ASPB Tutorial

  1. Go to http://preview.iplantcollaborative.org/ and log in to the Discovery Environment (DE) using either your own account or a guest account provided at the ASPB workshop.
  2. Let's import a file into the DE from the Sequence Read Archive (SRA). In the interest of time, we'll use a small file.
    1. Open a new window and log in to SRA. Go to record SRX010829 and follow the link 'Download data for this experiment SRX010829'. Copy the link address of the smaller file, SRR026996.fastq.gz.
    2. Return to the DE and open the My Data window. Choose File->Import from URL and paste the link to SRR026996.fastq.gz into the URL bar. You can provide a brief text summary of the file as well. When you click Import, the import will begin in the background. For large data sets this may take several hours, but we expect this file to import quickly. You will be notified when the import has finished, and the file will appear in the My Data window.
  3. While this file is importing, let's set up a pre-processing job on some sample data. Open the My Data window and navigate to 'TestData'. You will see a file named s_8_sequence-4.txt. This is an RNAseq lane from Arabidopsis consisting of ~6 million 100-base reads. We happen to know that, in order to preserve strandedness, a 3' adapter was ligated to the RNA before it was sequenced, so we must remove the adapter bases before we use these reads.
    1. Choose Investigation Type from the top menu bar and select QC Preprocessing (FASTQ files).
    2. Select Process Sequence Data.
      1. Under Select Files click 'Browse' to select a data file to process. Please select the aforementioned s_8_sequence-4.txt file.
      2. If the file had barcodes, we could choose to split the file based on them, but it does not.
      3. Under Trim 3' Adapters, uncheck 'I don't have 3' adapters'. You will now create a simple file containing an adapter sequence. Give the file an appropriate name, such as 'ASPB_3_prime'. Paste the following sequence into the text box: CTGTAGGCACCATCAATCGTATGCCGTCTTCTGCTTG. Set the output options to 'Output both clipped and nonclipped sequences'.
      4. Leave all subsequent steps in their default state. Select Launch Job. You will be asked to name your job. Give it a memorable, informative name like 'ASPB RNAseq Preproc 1'. You may also provide a richer text description for the job. Click OK.
      5. Your job will now launch in the background, and the My Jobs window will appear to show you that your job has been submitted. You will be notified when the job has completed, and a data folder containing the results will appear in the My Data window. Please note the numeric Job Id column in the My Jobs window: when you submit a job, it receives a numeric identifier, and the results folder in the My Data window is named after this job ID. For example, a job with ID 12345 will generate a result folder called Job_12345. After the job has completed, you may rename these folders in the My Data window.
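To make the adapter-trimming step more concrete, here is a minimal Python sketch of what clipping a 3' adapter means. It is not the DE's implementation, which this tutorial does not describe. It assumes exact matching only (no sequencing errors), and the names clip_3prime_adapter, split_reads and the min_overlap parameter are hypothetical, introduced purely for illustration. Only the adapter sequence comes from the step above.

```python
# Adapter sequence quoted in the tutorial step above.
ADAPTER = "CTGTAGGCACCATCAATCGTATGCCGTCTTCTGCTTG"


def clip_3prime_adapter(read, adapter=ADAPTER, min_overlap=5):
    """Return (trimmed_read, was_clipped) using exact matching only.

    min_overlap is a hypothetical parameter: the shortest adapter prefix
    at the end of a read that we are willing to treat as adapter.
    """
    # Case 1: the whole adapter occurs in the read; keep what precedes it.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos], True

    # Case 2: the read ends partway through the adapter, so only a prefix
    # of the adapter is present at the 3' end. Try the longest prefix first.
    max_len = min(len(adapter) - 1, len(read))
    for k in range(max_len, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k], True

    # No adapter found: return the read unchanged.
    return read, False


def split_reads(reads, adapter=ADAPTER):
    """Separate reads into clipped and nonclipped lists, in the spirit of
    the 'Output both clipped and nonclipped sequences' option."""
    clipped, nonclipped = [], []
    for read in reads:
        trimmed, was_clipped = clip_3prime_adapter(read, adapter)
        (clipped if was_clipped else nonclipped).append(trimmed)
    return clipped, nonclipped
```

With 100-base reads and short RNA inserts, the second case matters: a read often runs off its end before the full 37-base adapter has been sequenced, so only a leading part of the adapter appears in it. A production clipper would normally also tolerate mismatches, which this sketch deliberately leaves out.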
  4. Assuming the maize file SRR026996.fastq.gz has finished importing, let's set it up to align to the maize genome.
    1. Choose Investigation Type from the top menu bar and select Variant Detection.
    2. Select Align to Genome.
      1. Choose 'Zea mays B73 v2' as the Reference Genome (support for additional species is forthcoming).
      2. Under Select Reads choose 'SRR026996.fastq'. Please note that SRR026996.fastq.gz was decompressed automatically for you when the file was imported into the DE.
      3. Select Launch Job. You will be asked to name your job. Give it a memorable, informative name like 'ASPB Maize Mo17 1'. You may also provide a richer text description for the job. Click OK. As with the pre-processing task, this job will run in the background and you will be notified when it completes. It will deposit its alignment in the My Data window.
      4. Instead of waiting for this job to complete, let's take a look at a sample SAM alignment file. Open the My Data window, navigate to 'TestData', and select the file SRR026996-4.zmv2.sam. Click View. A window will open with two tabs: 'Description' and 'Preview'. Description can hold a textual description of the file. Preview shows you the first 8 KB of the file, so you can get a feel for its type and content.
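If you later download a SAM file and want a similar quick look outside the DE, the Preview tab's behaviour can be approximated in a few lines of Python. This is a sketch under stated assumptions, not how the DE builds its preview: it assumes '8kb' means 8 * 1024 bytes, and the function names preview_file and summarize_sam_preview are hypothetical. It relies only on two facts from the SAM format: header lines begin with '@', and each alignment line has at least 11 tab-separated fields.

```python
def preview_file(path, size=8 * 1024):
    """Return the first `size` bytes of a file as text.

    The 8 * 1024 default is an assumption about what '8kb' means.
    """
    with open(path, "rb") as handle:
        return handle.read(size).decode("utf-8", errors="replace")


def summarize_sam_preview(text):
    """Split previewed SAM text into header lines and alignment records."""
    lines = text.splitlines()
    # A fixed-size preview usually cuts the last line short, so drop it
    # unless the text happens to end exactly on a line break.
    if text and not text.endswith("\n"):
        lines = lines[:-1]
    headers = [line for line in lines if line.startswith("@")]
    records = [line.split("\t") for line in lines
               if line and not line.startswith("@")]
    return headers, records
```

For each record, fields[0] is the read name, fields[2] the reference sequence name, and fields[3] the 1-based leftmost mapping position, which is usually enough to confirm that the reads were aligned against the expected genome.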
  5. For details on how to use the iPlant Discovery Environment to accomplish SNP finding and RNAseq analyses, please refer to 'Help - User manual' found under the Menu item.