Noah Street ’21

Icahn School of Medicine at Mount Sinai, New York, NY

Spending this summer as an intern at the Mount Sinai School of Medicine has been an incredible experience. I worked in the lab of Dr. Harm Van Bakel as part of the Pathogen Surveillance Project (PSP), an interdisciplinary team of epidemiologists, medical doctors, engineers, biologists and other researchers. The project’s focus is tracking the spread of infection through all of Mount Sinai’s hospitals, as well as a few partner hospitals. By studying hospital infections, the team hopes to better understand both how pathogens are transmitted and the pathogens themselves. During the month of August, there was an outbreak of Clostridium difficile within two of the neonatal units; the PSP traced the outbreak to inadequate sterilization of shared medical equipment. By investigating how pathogens are transmitted within the hospitals, the team can advise hospital staff on how to prevent further infection. The PSP also researches the pathogens themselves. For certain pathogens, such as Clostridium difficile and methicillin-resistant Staphylococcus aureus, samples are collected and sequenced using PacBio sequencing technology. After assembly, the result is the genome of the pathogen behind each infection, which is stored in a database. By comparing these genomes, for example by looking at single-nucleotide polymorphisms (single-base differences), the team can determine how closely related the infections are, down to the single base pair. For nosocomial (hospital-acquired) infections, this high-resolution analysis allows the PSP to determine which patients infected which other patients. Having the pathogen genome data also allows for long-term study of how the pathogens evolve, which is most relevant to the antibiotic susceptibility profiles of pathogen strains. Antibiotic resistance is encoded in specific segments of each pathogen’s genome; by comparing these segments over time, the team can understand how strains may be sharing genetic information through horizontal gene transfer (acquiring genetic material from other organisms rather than through random mutation) and thereby acquiring resistance to certain antibiotics.
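To make the idea of comparing genomes by single-base differences concrete, here is a minimal Python sketch. It is not the PSP’s actual pipeline: it assumes the genomes have already been aligned to the same length, and the function names and sample identifiers are hypothetical. It simply counts the positions at which two aligned sequences carry different bases, skipping gaps and ambiguous calls.

```python
from itertools import combinations


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences carry different bases.

    Assumption: both sequences come from the same alignment, so position i
    in one corresponds to position i in the other. Positions holding a gap
    or an ambiguous call (anything other than A, C, G or T) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid and a != b
    )


def pairwise_snp_distances(genomes: dict[str, str]) -> dict[tuple[str, str], int]:
    """Return the SNP distance for every pair of samples, keyed by sample IDs."""
    return {
        (x, y): snp_distance(genomes[x], genomes[y])
        for x, y in combinations(sorted(genomes), 2)
    }


# Made-up toy sequences, far shorter than a real genome.
toy_genomes = {
    "sample_1": "ACGTACGT",
    "sample_2": "ACGTACGA",
    "sample_3": "TCGTAC-A",
}
distances = pairwise_snp_distances(toy_genomes)
```

In a sketch like this, a small distance between two samples would suggest that the infections are closely related, while a large distance would suggest separate sources. Real analyses work on whole genomes and need to handle alignment and sequencing errors, which this example leaves out.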

The PSP has an immense amount of data, which is stored in a database named Pathogen DB. More specifically, the hospital has data on every patient, including their hospital stays, infections, medications, lab tests, diets and procedures, and, if their infections were sampled and sequenced, the genetic data of the pathogen with which they were infected. Data collection began in 2014, and 38,585 unique patients have been recorded since. My summer project was to visualize any user-selected subset of this data in a timeline format on the PSP’s website in real time. To complete my project, I had to learn several languages that I had not yet learned at Williams or in high school: SQL, JavaScript, PHP, HTML and CSS. I also used an R package called Shiny to create the first webpage, which prompts the user to select the data they wish to visualize.

The timeline visualization website first opens to a form that prompts the user for input. The Shiny script queries the database for all of the possible options, including patient IDs, hospitals, departments, medications, labs, diets, procedures and pathogens. The user can select from these and specify a time frame for the visualization. The user’s subset of data is stored in a temporary SQLite database, which the website can query further if the user selects different viewing options from the timeline itself. The user is then forwarded to another page, written in PHP, which calls a JavaScript script to render the timeline using D3, a powerful JavaScript data visualization library for websites. On this page, the user can switch between specific data types and different groupings of data: for example, medications for a specific group of patients, or every patient who was in a specific hospital unit during an outbreak of a specific bacterial infection. The visualization is a helpful tool for the PSP because it allows the team to view any desired subset of data very shortly after it is recorded. As soon as the data is uploaded to the database, it can be visualized. My mentor was able to use the tool to view all of the newborn patients who had been present in the two hospital units during the outbreak of Clostridium difficile in the month of August, and the visualization confirmed that the patients who had been infected had been in close proximity at the same time.
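The temporary-database step can be sketched in a few lines. The example below is written in Python for brevity, whereas the actual tool uses R Shiny, PHP and JavaScript; the table name, column names and rows are all hypothetical and do not reflect the real Pathogen DB schema. It copies a selected subset of stays into a temporary in-memory SQLite database and then asks which patients were in the same unit at overlapping times.

```python
import sqlite3


def build_subset_db(rows):
    """Load the user's selected stays into a temporary in-memory SQLite database.

    Each row is (patient_id, unit, start_date, end_date), with dates as ISO
    'YYYY-MM-DD' strings so that plain text comparison orders them correctly.
    The schema is a hypothetical stand-in for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stays ("
        "patient_id TEXT, unit TEXT, start_date TEXT, end_date TEXT)"
    )
    conn.executemany("INSERT INTO stays VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def overlapping_stays(conn, unit):
    """Return pairs of patients whose stays in the given unit overlapped in time."""
    query = """
        SELECT DISTINCT a.patient_id, b.patient_id
        FROM stays AS a
        JOIN stays AS b
          ON a.unit = b.unit
         AND a.patient_id < b.patient_id   -- list each pair only once
         AND a.start_date <= b.end_date    -- the two intervals intersect
         AND b.start_date <= a.end_date
        WHERE a.unit = ?
        ORDER BY a.patient_id, b.patient_id
    """
    return conn.execute(query, (unit,)).fetchall()


# Made-up rows; the IDs, unit names and dates are placeholders.
example_rows = [
    ("P1", "UNIT_A", "2000-08-01", "2000-08-10"),
    ("P2", "UNIT_A", "2000-08-05", "2000-08-12"),
    ("P3", "UNIT_A", "2000-08-20", "2000-08-25"),
    ("P4", "UNIT_B", "2000-08-02", "2000-08-06"),
]
conn = build_subset_db(example_rows)
pairs = overlapping_stays(conn, "UNIT_A")
```

Keeping the subset in its own small database means that follow-up questions, such as switching the grouping or the data type shown, can be answered without going back to the full patient database each time.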

Working in Dr. Van Bakel’s lab with members of the PSP team was a great experience. Very few hospitals have such an interdisciplinary team dedicated to tracking the spread of infection throughout their facilities. The PSP is a great example of how medical doctors, epidemiologists, researchers and engineers can all play an important role in a clinical setting. It is also a great example of how data collection and analysis can be a powerful tool in almost any professional setting. Dr. Van Bakel has invited me to continue working for his lab and the Pathogen Surveillance Project remotely, with compensation, to keep improving the timeline visualization tool and to implement additional functionality. I am very excited to accept his offer. I really enjoyed my project this summer and the people I had the pleasure of working with, and I am glad that my project might help the hospital’s incredible professionals further track and understand the spread of pathogens through the hospital.

I am a computer science and economics major, and I am very interested in data science. While I can see my future career going into several fields, I will always want to work with data, and my internship at the Mount Sinai School of Medicine has shown me how important this kind of work can be. I am excited to continue my studies with new skills in back-end database usage and management and in front-end web development. I want to thank Mr. Chapman and the ’68 Center for Career Exploration for sponsoring my internship. I would not have been able to have such an incredible experience and gain so many new skills without your support. Thank you so much. I am looking forward to continuing my timeline visualization project with the Pathogen Surveillance Project team, and to continuing to learn new skills.