IUTA-1.0 in the Discovery Environment

Alert:

 

The iPlant App Store is currently being restructured, and apps are being moved to an HPC environment. During this transition, users may occasionally be unable to locate or use apps that are listed in our tutorials. In many cases, these apps can be found with the search bar at the top of the Apps window in the DE. To improve the chance of a successful search, enter only the portion of the app name that refers to its function or origin rather than the full name and version number (e.g. 'SOAPdenovo' instead of 'SOAPdenovo-Trans 1.01'). In critical cases, please report your concern to the iPlant Ask forum or to support@iplantcollaborative.org. Thank you for your patience.

The DE Quick Start tutorial provides an introduction to basic DE functionality and navigation.

Please work through the tutorial and add your comments at the bottom of this page, or send comments by email to upendra@cyverse.org. Thank you.

Rationale and background:

IUTA: Isoform Usage Two-step Analysis

Liang Niu, Weichun Huang, David M Umbach and Leping Li. BMC Genomics 2014, 15:862

IUTA is an analysis tool for detecting differential usage of gene transcript isoforms from Illumina paired-end RNA-seq data. It first uses the EM algorithm to estimate the usage of transcript isoforms for each gene, then tests for a difference in isoform usage between two groups using methods from compositional data analysis. IUTA takes RNA-Seq alignment files (in BAM format) from two groups of samples, together with a gene annotation file (in GTF format) for the related species, and tests for differential isoform usage (the set of relative abundances of isoforms) for each of the inquired genes. It outputs two tab-delimited files (with header): “estimates.txt” and “p_values.txt”.
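To make the term "isoform usage" concrete, the sketch below turns per-isoform abundance values for one gene into relative abundances that sum to 1. This is only an illustration of the quantity being compared: IUTA itself estimates usage from the aligned reads with the EM algorithm, not by this simple normalization, and the function name `isoform_usage` and the isoform labels are hypothetical.

```python
def isoform_usage(abundances):
    """Normalize per-isoform abundances of one gene to relative abundances.

    `abundances` maps an isoform label to a non-negative abundance value.
    This is an illustrative normalization, not IUTA's EM-based estimate.
    """
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return {isoform: value / total for isoform, value in abundances.items()}


# Hypothetical abundances for a gene with two isoforms.
usage = isoform_usage({"iso1": 30, "iso2": 10})
print(usage)
```

A gene shows differential isoform usage when these proportions differ between the two groups, even if the total expression of the gene is unchanged.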

Note

 This tool only works with Illumina paired-end RNA-Seq data; no results will be generated from Illumina single-end RNA-Seq data.

 

Pre-Requisites

  1. A CyVerse account. (Register for a CyVerse account here: user.cyverse.org)
  2. Inputs
    1. Mandatory
      1. GTF: Gene annotation file (in GTF format) for the related species
      2. Bam_1: Folder containing BAM files for the replicates of samples in group one
      3. Bam_2: Folder containing BAM files for the replicates of samples in group two
    2. Optional
      1. FLD: Whether to use the "empirical" or the "normal" fragment length distribution (FLD). With "empirical" (default), the empirical FLD (EFLD) is used, and it is estimated from the data. With "normal", a discrete normal distribution is used as the FLD; in that case the user can specify the mean and the standard deviation (sd) via mean.FL.normal and sd.FL.normal. If the mean and/or the standard deviation of the normal FLD is not specified, the corresponding estimate(s) from the raw EFLD (i.e., before smoothing) will be used
      2. Test.type: A character vector of the test types to use for testing differential isoform usage in IUTA. Three test types are available: "SKK" (default), "CQ" and "KY". The character vector is composed from these three test types, e.g., c("SKK","CQ") or c("CQ","SKK","KY")
  3. Mandatory output
    1. Output folder name: Output directory name (default is IUTA_output)
  4. Pie compare and Barplot charts
    1. Number of samples: Number of samples in the first group (default is 3)
    2. Gene name: Name of the gene to plot.
      1. Pie and bar plots will be generated based on the provided gene name. 
      2. If this option is left blank, all the genes in the estimates.txt file will be used for generating compressed pie and bar plots.
    3. Group name: A character vector of the names of the two groups. The first element is the name of the first group and the second element is the name of the second group. The default names are "1" and "2". Examples: 1,2; Sample1,Sample2
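The allowed values listed above can be checked before launching a job. The following is a minimal sketch that only encodes the options stated in this list (FLD is "empirical" or "normal"; test types are "SKK", "CQ" and "KY"; two group names). The function name `check_iuta_options` is hypothetical and is not part of IUTA or the DE.

```python
ALLOWED_FLD = ("empirical", "normal")
ALLOWED_TESTS = ("SKK", "CQ", "KY")


def check_iuta_options(fld="empirical", test_types=("SKK",), group_names=("1", "2")):
    """Return a list of problems with the chosen options (empty if none)."""
    problems = []
    if fld not in ALLOWED_FLD:
        problems.append(f"FLD must be 'empirical' or 'normal', got {fld!r}")
    for test in test_types:
        if test not in ALLOWED_TESTS:
            problems.append(f"unknown test type {test!r}")
    if len(group_names) != 2:
        problems.append("exactly two group names are expected")
    return problems


# The settings used in the test run of this tutorial pass the check.
print(check_iuta_options(fld="empirical", test_types=("SKK", "CQ", "KY")))
```

The defaults in the function signature mirror the defaults named in the list, so calling it without arguments reports no problems.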
       
Test/sample data

The test data for IUTA v1.0 is located at: /iplant/home/shared/iplantcollaborative/example_data/IUTA.sample.data/


Test run

  1. Open the IUTA-1.0 app in the DE
  2. Select/drag the input file (mm10_kg_sample_IUTA.gtf) and folders (Bam_list_1 and Bam_list_2) into the Inputs section of the IUTA-1.0 app
  3. Leave the default output folder name (IUTA_output)
  4. Leave the default parameter (empirical) for FLD. For Test type, use SKK,CQ,KY
  5. Leave the default number of samples (3) and group names ("1","2"), and enter Pcmtd1 as the Gene name, or leave the Gene name text box blank for IUTA to assess all genes in estimates.txt
  6. Click Launch Analysis

 

Test Results

Successful execution of the IUTA assessment pipeline will create the following output in the IUTA_output folder

estimates.txt: A tab-delimited text file (with header) containing 2 + n1 + n2 columns, where n1 is the number of samples in the first group and n2 is the number of samples in the second group. The first two columns are the gene name (column 1) and the isoform (column 2); the next n1 columns are the estimates of the relative abundance of the isoform in the samples of group one; the last n2 columns are the estimates of the relative abundance of the isoform in the samples of group two
p_values.txt: A table with 3+1+1+(m − 1)+1 columns, where m is the number of tests in test.type. The first three columns are “gene” (gene name), “number_of_isoform” (number of isoforms of the gene) and “test_sample_size” (number of samples of each group in which the isoform usage can be estimated, separated by a comma). The fourth column is “test”, which is the type of test used to calculate the next column, “p_value” (either the first test type in test.type, or NA when the test outputs NA). The fifth column is “p_value”, which is the p-value for the gene from the test named in column “test”. The next m − 1 columns contain the p-values from the remaining tests in test.type, i.e., all except the first test type
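The two output tables can be read with a few lines of Python. The sketch below is one possible way to do it, assuming the layouts described above: it splits the sample columns of estimates.txt into the two groups using n1, and lists genes in p_values.txt whose "p_value" falls below a threshold. The function names, the 0.05 threshold, and the treatment of "NA" or empty fields as missing values are assumptions for illustration; the demo rows are made up and are not IUTA results.

```python
import csv
import io


def parse_number(text):
    """Convert one field to float; "NA" or an empty field becomes None (assumption)."""
    text = text.strip()
    if text in ("", "NA"):
        return None
    return float(text)


def read_estimates(handle, n1):
    """Return {(gene, isoform): (group_one_values, group_two_values)}."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header line
    table = {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        values = [parse_number(v) for v in row[2:]]
        # The first n1 sample columns belong to group one, the rest to group two.
        table[(row[0], row[1])] = (values[:n1], values[n1:])
    return table


def genes_below(handle, alpha=0.05):
    """Return (gene, test, p_value) for rows whose p_value is below alpha."""
    hits = []
    for row in csv.DictReader(handle, delimiter="\t"):
        p = parse_number(row["p_value"])
        if p is not None and p < alpha:
            hits.append((row["gene"], row["test"], p))
    return hits


# Made-up rows for illustration only; real files come from the IUTA_output folder.
demo_estimates = io.StringIO(
    "gene\tisoform\ts1\ts2\ts3\ts4\n"
    "geneA\tiso1\t0.25\t0.5\tNA\t0.75\n"
)
print(read_estimates(demo_estimates, n1=2))
```

For a real run, the same functions can be given a file opened with `open("IUTA_output/estimates.txt", newline="")`, with n1 set to the number of samples in the first group (3 in the test run).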

And the following figures are generated...

 A more detailed explanation of the input types and outputs is provided in the IUTA manual.