MSKCC Facilities and Other Resources
Workspace and Personnel
The Memorial Sloan Kettering Cancer Center (MSKCC) Bioinformatics Core Facility occupies approximately 1,200 square feet of computer lab space on the 4th and 11th floors of the Zuckerman Research Center (417 East 68th Street, New York, NY 10065). The building is located in the heart of New York’s biomedical research district (Weill Cornell Medical Center, Rockefeller University, and the Hospital for Special Surgery are within five blocks), facilitating interactions with scientific experts in genomics and clinical medicine. The facility, led by Dr. Nicholas Socci (PI on subaward), comprises 13 bioinformatics engineers, research scientists, and software developers. It contains all resources necessary to ensure successful operation, including desk space for each staff member, computers, conference rooms, and access to a High Performance Compute resource (see below).
Services
The Bioinformatics Core is a proven leader in the management and analysis of Next Generation Sequencing (NGS) data. Since the Core's inception in 2003, Dr. Socci and his team have analyzed nearly 300 TB of sequence data (several thousand samples); this has resulted in over 300 publications by MSKCC researchers. The Bioinformatics Core also participates in several large-scale genomics consortia, including The Cancer Genome Atlas (TCGA), and is a core participant on several SPORE and Program Project (P01) grants.
The Bioinformatics Core provides NGS data analysis, training, and resource development to MSKCC investigators. Pipeline-based and custom bioinformatics analysis solutions are available for many NGS applications; these include variant calling (from exome/genome resequencing), differential expression and transcript characterization (from RNA-sequencing), methylation identification (from RRBS), and peak calling (from ChIP-Seq). Training offerings include clinics on portals, programming, and bioinformatics; these occur weekly, last 1 to 1.5 hours, and often take place in conjunction with the Genomics Core Lab. Lastly, Bioinformatics Core personnel have developed databases and web tools for the entire MSKCC research community; one example is a high-throughput mutation detection pipeline (developed in collaboration with the Genomics Core Lab and the Beene Translational Oncology Core).
Projects in the Bioinformatics Core frequently rely on genomics data repositories, including TCGA, TARGET, GenBank, and ENCODE. According to public records, MSKCC has made 21 requests to access TCGA data; this is the fourth-highest number of requests among 422 institutions. As another example, the facility downloaded GenBank onto local servers and uses it as the reference for an alignment pipeline called TAX-BLAST; this tool characterizes the species composition of NGS data files by percent alignment, which helps identify contaminants and quality-check results. These examples demonstrate the Bioinformatics Core’s reliance on public genomics repositories and its ability to utilize them.
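As an illustration of what "species composition by percent alignment" can mean in practice, the following is a minimal sketch and not the TAX-BLAST implementation itself. It assumes alignment hits are already available as (read ID, species, score) tuples; the function name `species_composition` and this input layout are hypothetical and chosen only for the example.

```python
from collections import Counter


def species_composition(hits, total_reads):
    """Return the percent of reads assigned to each species.

    hits: iterable of (read_id, species, score) tuples (hypothetical layout).
    total_reads: number of reads in the file, including unaligned reads.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")

    # Keep only the highest-scoring hit for each read.
    best = {}
    for read_id, species, score in hits:
        if read_id not in best or score > best[read_id][1]:
            best[read_id] = (species, score)

    # Count reads per species and convert the counts to percentages.
    counts = Counter(species for species, _ in best.values())
    composition = {sp: 100.0 * n / total_reads for sp, n in counts.items()}

    # Reads with no hit at all are reported separately.
    composition["unaligned"] = 100.0 * (total_reads - len(best)) / total_reads
    return composition


# Illustrative data only: four reads, one of which has no hit.
example_hits = [
    ("r1", "Homo sapiens", 200.0),
    ("r1", "Mus musculus", 150.0),
    ("r2", "Homo sapiens", 180.0),
    ("r3", "Escherichia coli", 90.0),
]
result = species_composition(example_hits, total_reads=4)
```

In a table of this kind, an unexpectedly large share of reads assigned to a species other than the expected one would be the sort of signal that flags possible contamination.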
Compute Resources
A robust High Performance Compute (HPC) infrastructure and staff support the Bioinformatics Core. Four full-time HPC specialists maintain five clusters, two of which are open to research: one running LSF with 2,048 cores and the other running Torque with over 3,000 cores. Nearly 2,500 TB of high-performance Isilon storage supports genomics data management and analysis. Over two dozen commonly used open-source and commercial tools are installed on the clusters to enable NGS data analysis.