NCBI Sequence Read Archive (SRA) Submission (Workflow Tutorial)

Overview

This workflow enables CyVerse users to submit data to the NCBI Sequence Read Archive (SRA). A submission includes compressed sequence files (FASTQ.gz, SFF.gz, or BAM.gz) and an XML metadata file, organized into a submission package. If you need to submit a different file format (HD5, SOLiD, or SRF), please post a question to the CyVerse Ask forum listed under "How to get help".

How to get help

  • For help interpreting submission errors in SRA notification emails, email the SRA help desk at sra@ncbi.nlm.nih.gov.

  • For help with issues within the CyVerse Discovery Environment, or to provide feedback, visit the CyVerse Ask forum at http://ask.cyverse.org/questions/.

Before You Start

  • Carefully read this tutorial.

  • Review the example input and output data and metadata for this tutorial in the Discovery Environment Data window under Community Data -> iplantcollaborative -> example_data -> SRA_submission.

  • You must have an NCBI account to submit. You can obtain an NCBI account here.

  • To submit from CyVerse, you must have logged into the SRA submitter system with your NCBI account credentials at least once. To do so, go to the SRA homepage, click the 'Submit' tab at the top of the page, click the link 'NCBI PDA - NCBI Primary Data Submitters', and authenticate if needed.

  • Submission is not complete until you receive final notification from the SRA that your data have been received and processed and will be released on the specified date.

General Submission Steps and Important Information


Step 1 -  Upload compressed sequence files into the CyVerse Discovery Environment (DE).

  • For instructions on managing data / metadata and running analyses in the CyVerse Discovery Environment (DE), see the DE manual.

Step 2 -  Create submission package folders and add compressed sequence files.  The submission package is created using tools in the DE.  Submission packages have three levels: BioProject, BioSample, and Library.  Package organization is similar to the SRA organization detailed in the NCBI Quick Start Guide.  Within the DE, data and metadata for SRA-defined ‘Experiments’ and ‘Runs’ are part of the ‘Library’ level of the submission package.

  • Each submission can either create a BioProject or add BioSamples to an existing BioProject.

  • Only one BioProject can be created or updated per submission.

  • Sequence files (FASTQ, SFF, or BAM) must be compressed before submission (e.g., FASTQ.gz, SFF.gz, or BAM.gz).

  • Library folders in the submission package must contain only compressed sequence files.

  • Library folders may contain multiple compressed sequence files: one or more files for a single-end sequencing library, and two files for a paired-end library.
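The compression and folder-content rules above can be checked on your own computer before uploading to the DE. The following is a minimal sketch, not part of the CyVerse tools: the function names gzip_file and unexpected_entries are hypothetical, and the list of accepted extensions is taken from the file types named in this tutorial.

```python
import gzip
import shutil
from pathlib import Path

# Compressed extensions named in this tutorial (matched case-insensitively).
ALLOWED_SUFFIXES = (".fastq.gz", ".sff.gz", ".bam.gz")


def gzip_file(src: Path) -> Path:
    """Write a gzip-compressed copy of src next to it and return the new path."""
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def unexpected_entries(library_dir: Path) -> list[str]:
    """Return names in a Library folder that are not compressed sequence files.

    Subfolders and uncompressed files are both reported, because a Library
    folder should hold only compressed sequence files.
    """
    return sorted(
        p.name
        for p in library_dir.iterdir()
        if p.is_dir() or not p.name.lower().endswith(ALLOWED_SUFFIXES)
    )
```

An empty list from unexpected_entries suggests the local copy of a Library folder is ready to upload. Note that gzip_file keeps the uncompressed original, so remove it before uploading the folder.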

Step 3 -  Add metadata to every folder in the submission package.  BioProject, BioSample, and Library metadata are entered using metadata templates in the DE.  After all metadata has been added, save a single metadata file from the BioProject-level folder.

  • Only submission package folders have metadata. Do not add metadata to the compressed sequence files.

  • The ‘Metadata Copy’ function in the DE reduces the amount of metadata you need to enter by hand.

  • Use the Metadata Term Guide in the DE for explanations of each metadata term.  The guide is located within each template.

  • Three metadata templates are used to add metadata to the submission package: BioProject, BioSample, and Library.

    • For the BioProject Folder select one of these metadata templates: NCBI BioProject Creation, NCBI BioProject Update.

    • For the BioSample Folder(s) select one of these metadata templates: NCBI BioSample - Beta-lactamase, NCBI BioSample - Human, NCBI BioSample - Invertebrate, NCBI BioSample - Metagenome of Environmental, NCBI BioSample - Microbe, NCBI BioSample - Model Organism / Animal, NCBI BioSample - Pathogen Clinical / Host-Associated, NCBI BioSample - Pathogen Env / Food / Other, NCBI BioSample - Plant, NCBI BioSample - Virus.

    • For the Library Folder(s) use the metadata template: NCBI SRA Library.

  • If you plan to submit a large number of BioSamples and/or Libraries, see the documentation for adding metadata templates in bulk.

  • When entering a contact email on the BioProject metadata template, enter the email address associated with your NCBI account or you will not receive SRA email notifications on the status of your submission.

  • See http://www.ncbi.nlm.nih.gov/biosample/docs/packages/ for help determining the appropriate BioSample type for your data.

  • If you require BioSample templates for variants of MIMS, MIGS, or MIMARKS data, please make the request at http://ask.cyverse.org/questions/.

  • Any change to folder names, file names, or metadata requires that you save a new metadata file before submission.
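To plan which template goes on which folder, it can help to write the package out as a nested structure. The sketch below is only an illustration under stated assumptions: plan_templates and the folder names in the usage note are hypothetical, and it applies one BioSample template to every BioSample folder for simplicity. The template names themselves are the ones listed above.

```python
LIBRARY_TEMPLATE = "NCBI SRA Library"


def plan_templates(
    package: dict[str, dict[str, list[str]]],
    bioproject_template: str,
    biosample_template: str,
) -> list[tuple[str, str]]:
    """Return (folder path, template name) pairs, one per package folder.

    package maps one BioProject folder name to its BioSample folders, and
    each BioSample folder name to a list of Library folder names.
    """
    if len(package) != 1:
        raise ValueError("a submission may contain only one BioProject")
    plan = []
    for project, samples in package.items():
        plan.append((project, bioproject_template))
        for sample, libraries in samples.items():
            plan.append((f"{project}/{sample}", biosample_template))
            for library in libraries:
                plan.append((f"{project}/{sample}/{library}", LIBRARY_TEMPLATE))
    return plan
```

For example, plan_templates({"my_project": {"sample_A": ["lib_1", "lib_2"]}}, "NCBI BioProject Creation", "NCBI BioSample - Plant") yields four entries: one for the BioProject folder, one for the BioSample folder, and one for each Library folder.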

Step 4 -  Select the appropriate SRA Submission App and run it in two stages: first to validate the submission package and then, after successful validation, to submit to the SRA.  For validation, the App attempts to create a submission.xml metadata file for use by the SRA system based on the metadata entered into the templates, but does not transfer any files to the SRA.  For submission, the App creates the submission.xml metadata file and transfers it, together with all compressed sequence files, to the SRA.

  • The App chosen must match the BioProject metadata template for BioProject Creation or BioProject Update.

  • The same App will be run twice, once for validation, and once for submission.

  • If you have changed the submission package contents, file or folder names, or metadata since you last saved the BioProject metadata file, resave the BioProject metadata file before running an App.

  • The information buttons in the App (to the left of the app name in the Apps list) provide important details.

  • The validation stage is optional but can substantially reduce the number of errors detected by the SRA.  It is recommended for first-time users.

  • For either validation or submission, if the App fails and no submission.xml file is created, there are one or more errors in the submission package.  See the Analysis log files (especially condor-stderr-0) for information to assist with error correction. After you correct each error, be sure to recreate the metadata file and revalidate it. The analysis will fail with the first error it encounters, so this step may need to be run multiple times.

  • Successful validation within the DE does not guarantee that the SRA will not detect additional errors.  

  • No analyses are performed by the App.  Metadata will be aggregated into the submission.xml file (Validation and Submission stages) and the package will be transferred to the SRA (Submission Stage).

Step 5 -  The submission package will be validated by the SRA system and email notifications will be sent by the SRA to the contact email added in the BioProject metadata to confirm successful submission, or to communicate submission errors.

  • SRA processing may take 72 hours (or longer) depending on the load on their systems.

Step 6 -  If error correction and resubmission are needed, the SRA-generated error report can be retrieved with the 'NCBI SRA Submission Report Retrieval' App.  Corrections to the submission package can be made within the DE, and resubmission follows the same process.

  • During error correction, change only what is needed to fix SRA-detected errors.  All other changes will be ignored by the SRA during resubmission.  If additional changes are required, they can be made using the NCBI website after successful submission.

