SubmittingToSRA

From Wiki | Meyer Lab

Objective

Before publishing any research using high-throughput DNA sequences, the sequences have to be archived at NCBI's Sequence Read Archive (SRA).

The process is not at all straightforward. Here are instructions.

Step-by-step instructions

  • Locate the reads you'll be archiving.
  1. Make sure these are exactly the same set of reads you used to generate the results you're publishing. Triple check this.
  2. Decide whether to submit raw reads or processed (HQ) reads. Either one is OK, but make sure you know which you have archived and describe them appropriately in the SRA record.
  3. It will make your life easier if you copy your reads, in compressed (*.gz) form, to a new directory, ideally with simple names (e.g. "sample1a.fastq.gz") corresponding to the labels you'd use for each sample in the lab. Make sure to remove this temporary directory after you've finished uploading the reads to SRA.
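
The staging step above could be scripted along the following lines. This is only a sketch in Python: it copies each read file into a fresh directory under a simple sample-based name, compresses files that are not already gzipped, and compares checksums of the uncompressed content so you can confirm the staged copies match the originals. The function names, sample labels, and paths are hypothetical and not part of any lab tooling.

```python
import gzip
import hashlib
import shutil
from pathlib import Path


def md5sum(path, opener=open):
    """Return the MD5 hex digest of a file's contents, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with opener(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_reads(sample_to_source, staging_dir):
    """Copy each read file into staging_dir as <sample>.fastq.gz.

    sample_to_source maps a lab sample label to the path of its read file.
    Files that are not already gzipped are compressed during the copy.
    Returns a dict of staged path -> MD5 of the uncompressed reads.
    """
    staging_dir = Path(staging_dir)
    staging_dir.mkdir(parents=True, exist_ok=True)
    checksums = {}
    for sample, source in sample_to_source.items():
        source = Path(source)
        target = staging_dir / f"{sample}.fastq.gz"
        if source.suffix == ".gz":
            shutil.copyfile(source, target)
            source_md5 = md5sum(source, opener=gzip.open)
        else:
            with open(source, "rb") as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            source_md5 = md5sum(source)
        # Compare decompressed content so the check works in both cases.
        if md5sum(target, opener=gzip.open) != source_md5:
            raise ValueError(f"Staged copy of {source} does not match the original")
        checksums[target] = source_md5
    return checksums
```

For example, stage_reads({"sample1a": "/path/to/original_reads.fastq"}, "sra_upload_tmp") would create sra_upload_tmp/sample1a.fastq.gz (both paths are placeholders). Keeping the returned checksums makes it easier to confirm later that the archived reads are the ones used in the analysis.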


  • Log into SRA
  1. Go to the [SRA Submission Portal].
  2. Log in using the lab username and password.
  3. Navigate back to the [SRA Submission Portal].


  • Set up a BioProject
  1. First you need to set up a BioProject. There should be one BioProject for each publication or chapter. A BioProject can contain multiple experiments, each containing multiple samples, each represented by one or more files.
  2. Click the "BioProject" button from the SRA Submission Portal.
  3. Click "New Submission", and follow the prompts. Think carefully as you enter values for your BioProject, since most fields cannot be changed after it's submitted.
    • On the Submitter tab, change the name but NOT the contact information. Uncheck the box that says "update contact information". You don't want to update contact information.
    • On the General Info page, be sure to change the default Project Title to something more informative. Consider the same level of detail you'd use for the title of a research article.
    • Don't skip the public description. Give it 3-5 well-worded sentences. If you already have an abstract for the study, you might edit that or just paste it in directly.
    • Skip the BioSamples step, unless you really only have a single sample (this is rare).
    • Click submit and you'll be taken to [this page]. Refresh this until you see that the BioProject has finished processing and has been assigned an accession (PRJ*). Copy this accession for the next step.


  • Set up one or more BioSamples
  1. Next you'll need to set up one or more batches of BioSamples describing all biological samples used in your study.
  2. Go back to the [SRA Submission Portal]. Click on BioSamples.
  3. Download the batch submission template and complete it very carefully. Read and follow the instructions.
  4. Very important: Add a column called "code" to the template, and copy and paste your sample IDs into that column. This allows your file to pass a built-in check that ensures there is a column with unique information for each sample (besides the sample name, etc.).
  5. Save your file as text, tab delimited.
  6. Click "New submission" on the [BioSample] page, and follow the prompts. Think carefully as you enter values for your BioSamples, since most fields cannot be changed after submission.
    • On the Submitter tab, change the name but NOT the contact information. Uncheck the box that says "update contact information". You don't want to update contact information.
    • Follow the instructions carefully, and upload your spreadsheet (TSV) describing your samples. Pay careful attention to error messages and address them until the file passes all the built-in checks.
    • Once your spreadsheet is accepted, submit the BioSamples.
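
Adding the "code" column described above can also be done with a short script instead of copying and pasting by hand. The Python sketch below assumes that the completed template is tab-delimited, that any comment lines start with "#", and that sample IDs live in a column named "sample_name" (possibly written with a leading "*"). These are assumptions about the template layout, so check them against the file you actually downloaded. The function name is hypothetical.

```python
import csv


def add_code_column(template_path, output_path, id_column="sample_name"):
    """Write a copy of a tab-delimited sample sheet with an extra "code" column.

    Lines starting with "#" are passed through unchanged at the top of the
    output. The "code" column is filled from id_column. Raises ValueError if
    any sample ID is empty or duplicated. Returns the number of samples.
    """
    with open(template_path, newline="") as handle:
        lines = handle.read().splitlines()
    comments = [line for line in lines if line.startswith("#")]
    table = [line for line in lines if line.strip() and not line.startswith("#")]
    rows = list(csv.reader(table, delimiter="\t"))
    header = rows[0]
    # Pad short rows so every record has one value per header column.
    records = [row + [""] * (len(header) - len(row)) for row in rows[1:]]
    # Headers may mark mandatory fields with a leading "*" (assumed).
    names = [name.lstrip("*") for name in header]
    if "code" in names:
        raise ValueError('Sheet already has a "code" column')
    id_index = names.index(id_column)
    ids = [record[id_index].strip() for record in records]
    if "" in ids:
        raise ValueError("At least one sample has an empty ID")
    if len(set(ids)) != len(ids):
        raise ValueError("Sample IDs are not unique")
    with open(output_path, "w", newline="") as handle:
        for line in comments:
            handle.write(line + "\n")
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header + ["code"])
        for record, sample_id in zip(records, ids):
            writer.writerow(record + [sample_id])
    return len(records)
```

The uniqueness check here only mirrors the idea of the portal's built-in check; the portal's own validation remains the final word on whether the file is accepted.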


  • Upload your files to SRA
  1. To upload your files to SRA, log into the cluster and navigate to the temporary directory containing the files you want to upload.
  2. Make sure the files are compressed (*.gz) and you have a correct list of the file names.
  3. Go to the [SRA Submission Portal] and click on SRA then FTP upload.
  4. Now you're ready to upload your files.
    • Log into the cluster on the FILES server (files.cgrb.oregonstate.edu) and navigate to the directory containing the reads you want to archive.
    • Make sure the names of your files exactly match what you've described in the metadata file. Check case and extensions, etc.
    • It's ideal to compress these files first, and upload *.fastq.gz files.
    • It's easiest to do this if all your files are in a single temporary directory. That's what these instructions assume.
    • Follow the instructions shown to connect by FTP and navigate to the right directory for uploading your files to SRA.
    • Don't forget to make a new directory for this batch of uploads. Plan to make a single directory for each SRA record. Give it a short name that's meaningful to you.
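    • To upload your files, run these commands (inside the FTP prompt, from the correct directory on the cluster). The "prompt" command switches off the per-file confirmation that is on by default, so "mput" uploads every matching file without asking. This will take some time, so do it while you have a stable internet connection.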
prompt
mput *.fastq.gz
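
Before starting the upload, it can help to check mechanically that the file names on disk match the names listed in the metadata file, including case and extension. The Python sketch below assumes the metadata file is a tab-delimited table whose file-name columns have headers beginning with "filename"; that is an assumption about the template layout, so adjust it to match your file. The function name is hypothetical.

```python
import csv
from pathlib import Path


def check_filenames(metadata_path, reads_dir, pattern="*.fastq.gz"):
    """Compare file names listed in a metadata TSV with files on disk.

    Any column whose header starts with "filename" is treated as a
    file-name column (an assumption about the template layout).
    Returns (listed_but_missing, present_but_unlisted) as sorted lists.
    """
    listed = set()
    with open(metadata_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            for column, value in row.items():
                if column and column.startswith("filename") and value and value.strip():
                    listed.add(value.strip())
    on_disk = {path.name for path in Path(reads_dir).glob(pattern)}
    # Set comparison is case-sensitive: "Sample1.fastq.gz" != "sample1.fastq.gz".
    return sorted(listed - on_disk), sorted(on_disk - listed)
```

Both returned lists should be empty before you run mput; anything in the first list is named in the metadata but absent from the directory, and anything in the second is on disk but not described.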


  • Set up one or more SRA records
  1. Finally, you'll need to set up one SRA record for each of the datasets in your study. For example, if you conducted RNASeq and RAD genotyping on a set of samples, you should plan to make one SRA record for the RNASeq reads and one for the RAD reads.
  2. Avoid mixing read lengths within a single SRA record. If you have two sets of reads for your samples, and they are different lengths, ideally you should submit each set as a separate SRA record.
  3. To begin an SRA submission go back to the [SRA Submission Portal]. It is critical that you begin the submission from this page, not any other alternatives that may also allow you to start a new submission.
  4. Click on SRA, then "New Submission", and follow the prompts. Think carefully as you enter values for your SRA Record, since most fields cannot be changed after it's submitted.
    • On the Submitter tab, change the name but NOT the contact information. Uncheck the box that says "update contact information". You don't want to update contact information.
    • On General Info, enter the BioProject you previously established.
    • On BioSample Attributes, upload the TSV file you produced above for your collection of BioSamples. If you don't want to include all the samples in this SRA record, you can upload only the subset you want to include here.
    • On the Metadata tab, download an Excel template and complete it based on the instructions in the spreadsheet.
    • Save this as a tab-delimited text file (TSV) and upload it to the Metadata tab.
    • On the Files tab:
      1. Click "I have preloaded files". Choose the directory you made when you uploaded the files by FTP. Make sure the correct number of files is shown.
      2. Click "autocomplete submission" and "continue".
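
To follow the advice above about not mixing read lengths within one SRA record, you can tally the read lengths in each file before deciding how to group them. The Python sketch below samples the start of a gzipped FASTQ file and counts sequence lengths. It assumes the common four-line FASTQ layout (header, sequence, "+", quality), and the function name is hypothetical.

```python
import gzip
from collections import Counter


def read_length_counts(fastq_gz_path, max_reads=10000):
    """Count read lengths in the first max_reads records of a gzipped FASTQ.

    Assumes four lines per record: header, sequence, "+", quality.
    Returns a Counter mapping read length -> number of reads seen.
    """
    counts = Counter()
    with gzip.open(fastq_gz_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number // 4 >= max_reads:
                break
            if line_number % 4 == 1:  # sequence line of each record
                counts[len(line.rstrip("\r\n"))] += 1
    return counts
```

Files whose tallies show clearly different lengths are candidates for separate SRA records. Trimmed (HQ) reads will naturally vary in length within a file, so use judgment when interpreting the result.
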
  • Check the status of your record.
  1. Log onto the [SRA UI]
  2. Your record will initially appear in the "Attention" tab. Once it has finished processing, it will move to "Completed".
  3. Find your record and check it to make sure everything is correct. Unfortunately, if you discover errors at this stage, you typically have to email SRA to have them fixed.
  4. You aren't really done until SRA assigns your record an accession number. The SUB* number is NOT an accession number; it's a temporary ID for your submission while it is in progress. It may take one or more days to get an accession, depending on how many errors you've made.
  5. When your record is "released" it becomes publicly visible. Until then, only we can see it. By lab policy: we will make these files public as soon as we have confirmed the details in the record are correct (e.g. correct file names, all samples included, etc.). We do NOT wait until publication to release these.

History

Created 15:59 Aug 05, 2017 By: Admin

Last updated 08:55 Sep 23, 2019 By: Admin