Submitting your genomic data to NCBI’s database of Genotypes and Phenotypes (dbGaP)
Below are instructions and a collection of resources we’ve put together to help guide you through the dbGaP submission process. Please keep in mind that this is a lengthy process that may take several weeks.
We recommend that you obtain the IRB forms first. These forms are required to complete the first step of the submission process and can be a rate-limiting step, so we strongly advise that you request them as soon as you know you will be submitting.
In order to create a project in dbGaP and upload data files to accompany a publication, the IRB needs to review all cases to ensure everyone was properly consented. The MSK IRB will then send an institutional letter to dbGaP to officially register the project.
For all studies:
You must fill out the dbGaP submission form available from the MSK website. The form is under the ‘Genetic Research’ tab; from there, select ‘Genomic Data Sharing (GDS)’.
The dbGaP submission form is an IRB request to certify that the data sets being sent comply with NIH policy before our Institutional Official (IO) signs the NIH certification letter.
For reporting germline incidental findings uncovered in your lab:
To receive the IRB forms required for dbGaP submission, you may download the forms here under “Category: Protocol Development, GAP Submission Form” or send an email to IRBPB_Submissions_A@mskcc.org or IRBPB_Submissions_B@mskcc.org. For assistance, please call the IRB/PB main number: 212-639-7592.
For additional questions on the dbGaP submission process, from study registration to data submission to study release, please refer to the instructional chart here.
If your study is NIH funded: send an email to your program officer or genomic program administrator to let them know you will be making a submission.
If your study is NOT NIH funded: follow the information below.
For your dataset to be considered, complete the following steps:
- Complete the basic study information form. Instructions for completing the form can be found here.
- Identify the NIH IC that most closely aligns with your research and include the IC name with the basic study information. http://www.nih.gov/icd/
- Indicate whether your study has been published or accepted for publication. If so, please provide documentation (i.e., the publication citation or a copy of any correspondence indicating that an article about the study has been accepted for publication).
- In order for your request to be considered, we will need to receive a certification signed by you and an institutional official. Your institutional official is typically the person who signs your funding applications and is registered in eRA Commons as an authorized Signing Official for your institution. If no one from your institution or organization is registered in eRA Commons, information about this system is available from the eRA Commons website. Also, NIH has developed Points to Consider for IRBs and Institutions to assist institutions in their review and certification of an investigator’s plan for submission of data to dbGaP. The review of your request can be initiated without the certification, but the review process will be expedited if we receive the certification at time of submission. The specific information that needs to be provided is explained in the attached documents. To submit the required information, you may use the sample documents or any other format.
- Submit all information and the certification to GWAS@mail.nih.gov.
Once you have submitted this information, the GWAS Program Administrator Analyst (GPA Analyst) will review it to make sure that everything required has been provided. The GPA Analyst will then send the basic study information to the NCI’s Review Subcommittee. This subcommittee will have a week to review the information and comment on the value of making the data broadly available to the scientific community, as well as consider the costs and the administrative burden of accepting the study.
After the subcommittee makes its decision, the GPA Analyst will notify the submitting investigator of the decision. If the study is approved, the GPA Analyst will register the study in the dbGaP system, and notify dbGaP personnel of the new registration. dbGaP staff will contact the submitting investigator directly to guide the investigator through the rest of the submission process.
Below is a sample email notifying the investigator that their study has been registered in dbGaP:
Dear [investigator],
This email is to notify you that your study titled [study title] is successfully registered in dbGaP.
[link]
You are now invited to access the dbGaP submission portal to upload your data files or assign submitter(s) for this study. This invitation will expire in 7 days.
Important notes:
– Please click on the link below to read the instructions on how to log in to NCBI:
https://submit.ncbi.nlm.nih.gov/dbgap/instruction/
– Please click on the link below to login and accept this invitation:
[link]
– To log back in to the submission portal the second time and afterwards, use this link: https://submit.ncbi.nlm.nih.gov/dbgap/ (and not the one above)
– There is no expiration period on file submission. Once you accept the invitation, you’ll be able to submit your files or assign a submitter at any time after that.
– To confirm that your uploaded files were successfully transferred to dbGaP, you should get the message “Your submission is complete” after you click on the “Submit” button. Otherwise, you should get an error message indicating the reason for the failed submission.
– Please click on “View consent group” (located in the yellow box on the upper right corner of the study page) to verify that the consent groups are as expected and match the CONSENT variable’s coded values found in the Subject Consent Data Dictionary File. For example, if the consent is General Research Use (MDS), where MDS stands for methods, the coded value for the CONSENT variable is 1=General Research Use (MDS) (GRU-MDS).
To prepare your files, please see the dbGaP Submission guide packet: www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/GetZip.cgi?zip_name=dbGaP_SubmissionPackage.zip. This packet contains templates and instructions on how to format the data files for submission to dbGaP. You might want to start with the “Submission_Guide_Instructions.doc” file in the zip file, which explains the process in detail.
The required files for each single study are:
1_dbGaP_StudyConfig*
2a_dbGaP_SubjectPhenotypesDS
2b_dbGaP_SubjectPhenotypesDD
3a_dbGaP_SampleAttributesDS
3b_dbGaP_SampleAttributesDD
4a_dbGaP_SubjectDS*
4b_dbGaP_SubjectDD*
5a_dbGaP_SubjectSampleMappingDS*
5b_dbGaP_SubjectSampleMappingDD*
6a_dbGaP_PedigreeDS (only required if there are related individuals)
6b_dbGaP_PedigreeDD
Please use the Submission Portal to submit only Phenotype files (listed above) and Genotype files (e.g., .bed, .bim, .fam, .cel, .idat, .vcf). Please also submit any documents such as consents, forms, questionnaires, protocols, etc. (if available).
If your study involves SRA (Sequence Read Archive) files (e.g., .fastq, .bam), please do not submit SRA data through the Submission Portal. These data should be submitted through an aspera account. The SRA submitter needs to have the “Study Accession Number” in order to apply for an SRA submission account (aspera account). The “Study Accession Number” will be provided to you once your required files (marked with “*”) have been submitted to us through the dbGaP Submission Portal and have passed our QCs. The instructions on how to apply for the SRA submission account will be provided to you by one of our scientific curators.
We ask that you begin file submission as soon as possible so that the target data delivery and public release date can be met.
Questions regarding File submissions — Please contact dbgap-help: dbgap-sp-help@ncbi.nlm.nih.gov
Questions regarding study registration – Please contact:
Primary GPA (Genomic Program Administrator) Emilie Charlisse Forones Caga-Anan
+1 301 496 0603
charlisse.caga-anan@nih.gov
Kind Regards,
dbGaP staff
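The required-file list in the sample email above lends itself to a quick local check before you upload. The following is a minimal Python sketch, not an official dbGaP tool. It assumes that your files keep the template names from the submission package as filename prefixes; the function name `missing_templates`, its parameters, and the file extensions in the example are hypothetical.

```python
from pathlib import Path

# Template names copied from the required-file list in the email.
# True marks the files flagged with "*" (needed before an accession number is issued).
REQUIRED_TEMPLATES = {
    "1_dbGaP_StudyConfig": True,
    "2a_dbGaP_SubjectPhenotypesDS": False,
    "2b_dbGaP_SubjectPhenotypesDD": False,
    "3a_dbGaP_SampleAttributesDS": False,
    "3b_dbGaP_SampleAttributesDD": False,
    "4a_dbGaP_SubjectDS": True,
    "4b_dbGaP_SubjectDD": True,
    "5a_dbGaP_SubjectSampleMappingDS": True,
    "5b_dbGaP_SubjectSampleMappingDD": True,
}

# Only needed when the study includes related individuals.
PEDIGREE_TEMPLATES = ("6a_dbGaP_PedigreeDS", "6b_dbGaP_PedigreeDD")


def missing_templates(filenames, has_related_individuals=False, starred_only=False):
    """Return the template names that no file in `filenames` starts with.

    Assumption: each prepared file keeps its template name as a prefix,
    for example "4a_dbGaP_SubjectDS.txt".
    """
    names = [Path(f).name for f in filenames]
    wanted = [
        template
        for template, starred in REQUIRED_TEMPLATES.items()
        if starred or not starred_only
    ]
    if has_related_individuals and not starred_only:
        wanted.extend(PEDIGREE_TEMPLATES)
    return [t for t in wanted if not any(n.startswith(t) for n in names)]


# Hypothetical example: only two of the starred files have been prepared so far.
prepared = ["1_dbGaP_StudyConfig.xlsx", "4a_dbGaP_SubjectDS.txt"]
for template in missing_templates(prepared, starred_only=True):
    print("still missing:", template)
```

Passing `starred_only=True` limits the check to the files that must pass QC before the Study Accession Number is issued; leaving it at the default checks the full list.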
A dbGaP accession number will be assigned once dbGaP staff has received the completed study configuration files.
Dear [investigator]
1. This study has been assigned an accession number, [XXXXX]. Please visit the Preview Site of this study by following the instructions below, and let us know if there is anything you want to change.
========================================
Note: The dbGaP accession assigned to this study is [XXXXX]
INSTRUCTIONS FOR PREVIEWING YOUR SITE
1) Point your browser to [LINK]
2) Sign in to MyNCBI
- Type in your username. Your username is [XXXXX].
- Type in the password. Your password is [XXXXX].
- Be certain to check the box Keep me signed in.
- Click Sign In.
3) If after signing in, you are not automatically directed to your preview site, then either select one of the listed studies or point your browser to [LINK]
4) Note: The above url is for preview purposes only. When your website becomes publicly available, the url will be
[XXXXX]
You should reference this url in your publications. (This link will be activated when the study becomes publicly available.)
========================================
2. If you are ready, please submit the genotype MAF files now.
3. The subject IDs, sample IDs, consents, and SAMPLE_USE that you have submitted have been loaded into dbGaP and made available to BioSample and SRA. Please:
1) read this handbook on the SRA submitting process: http://www.ncbi.nlm.nih.gov/books/NBK242619/
2) Send an email to SRA and Adam Stine (sra@ncbi.nlm.nih.gov; stineaj@ncbi.nlm.nih.gov) to begin an SRA submission.
3) Let SRA know if you will be generating XML or will need a spreadsheet.
I have also included a sample status cgi for you to track when your sequences have been processed by SRA: [link].
Regards,
- Email the NIH (sra@ncbi.nlm.nih.gov; stineaj@ncbi.nlm.nih.gov) and tell them we will be doing a spreadsheet submission.
- Someone at the NIH will send back a spreadsheet for the PI to fill out.
- Fill it out and return it.
- This is the email we received in response:
The template you sent is being used to generate your SRA submission. If you haven’t submitted to dbGaP before, you will need to generate a private/public key pair to use with the command line utility ascp, which is bundled with aspera connect (http://downloads.asperasoft.com/connect2/). Once you have those, send the public key to me. Here is a link for instructions on generating the key pair:
http://www.ncbi.nlm.nih.gov/books/NBK180157/
If you are using a Linux/Unix or OS X computer, the easiest way to create these keys is to use the ssh-keygen utility:
ssh-keygen -f ./private.openssh
This will store a private key in the current working directory with the name ‘private.openssh’ as well as a public key with the name ‘private.openssh.pub’. Send me the private.openssh.pub key.
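Because only the public half of the key pair should be emailed, it may be worth checking the file before you attach it. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, and the check is only a heuristic. It relies on OpenSSH public keys being a single line that begins with a key type such as `ssh-rsa` or `ssh-ed25519`, and on private key files containing a “PRIVATE KEY” header.

```python
from pathlib import Path

# Key-type prefixes that OpenSSH writes at the start of a public key line.
PUBLIC_KEY_PREFIXES = ("ssh-rsa ", "ssh-ed25519 ", "ecdsa-sha2-", "ssh-dss ")


def public_key_path(private_key_path):
    """ssh-keygen writes the public key next to the private key with '.pub' appended."""
    return Path(str(private_key_path) + ".pub")


def looks_like_public_key(text):
    """Heuristic: True if `text` looks like a one-line OpenSSH public key."""
    stripped = text.strip()
    if "PRIVATE KEY" in stripped:
        # Private key files carry a "-----BEGIN ... PRIVATE KEY-----" header.
        return False
    is_single_line = len(stripped.splitlines()) == 1
    return is_single_line and stripped.startswith(PUBLIC_KEY_PREFIXES)


def safe_to_send(private_key_path):
    """Return the public key path if the file exists and passes the heuristic, else None."""
    candidate = public_key_path(private_key_path)
    if candidate.is_file() and looks_like_public_key(candidate.read_text()):
        return candidate
    return None
```

For the command in the email above, `safe_to_send("./private.openssh")` would return the path to `private.openssh.pub` once the key pair has been generated, and `None` if the file is missing or does not look like a public key.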