Gfold 1.1.1 Count
Create a count file for use with Gfold 1.1.1 Difference Expression. GFOLD is useful when no replicates are available.
gfold - Generalized fold change for ranking differentially expressed genes from RNA-seq data.
"GFOLD is especially useful when no replicate is available. GFOLD generalizes the fold change by considering the posterior distribution of log fold change, such that each gene is assigned a reliable fold change. It overcomes the shortcoming of p-value that measures the significance of whether a gene is differentially expressed under different conditions instead of measuring relative expression changes, which are more interesting in many studies. It also overcomes the shortcoming of fold change that suffers from the fact that the fold change of genes with low read count are not so reliable as that of genes with high read count, even these two genes show the same fold change."
Source: Feng J, Meyer CA, Wang Q, Liu JS, Liu XS, Zhang Y. GFOLD: a generalized fold change for ranking differentially expressed genes from RNA-seq data. Bioinformatics 2012
Test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> Gfold
The following files will be found:
To use Gfold 1.1.1 Count, please have the following files available:
- A gene/genome annotation file in one of these formats: *.GTF, *.GPF, or *.BED
- A Sequence Alignment/Map file in SAM format
- If you have a BAM file (the binary version of a SAM file) from Tophat2 - Single End or Tophat2 - Paired End, please use SAMTOOLS-0.1.19 BAM-to-SAM to convert it into SAM format
Example Test Data
In this example, we will be using the Sample Data from the RNA-Seq Tutorial.
The RNA-Seq reads that we will be working with are from Arabidopsis thaliana.
A. Prior to using Gfold 1.1.1 Count:
1) Please follow Step 1 of /wiki/spaces/eot/pages/241585220 to align the RNA-Seq reads using Tophat2
2) Convert the BAM outputs to SAM format. The files are located within the Data window under:
Community Data -> iplant -> home -> shared -> iplant_training -> intro_rna-seq -> 02_tophat -> bam
(Please run SAMTOOLS-0.1.19 BAM-to-SAM on the following files)
2a) *Input file: WT_rep1.bam
Output file name: WT_rep1.sam
2b) *Input file: hy5_rep1.bam
Output file name: hy5_rep1.sam
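For readers working outside the Discovery Environment, the BAM-to-SAM conversion in step 2 can be sketched on the command line. The Python snippet below is a minimal sketch, assuming a `samtools` executable is available on the PATH; the helper name `bam_to_sam_command` is hypothetical, and the snippet only builds and prints the commands rather than running them.

```python
import shlex


def bam_to_sam_command(bam_path, sam_path):
    """Build a samtools command that converts a BAM file to SAM.

    Hypothetical helper for illustration; assumes `samtools` is on the PATH.
    """
    # -h keeps the header lines in the SAM output; -o names the output file.
    return ["samtools", "view", "-h", "-o", sam_path, bam_path]


# The two samples used in this example (steps 2a and 2b).
for sample in ("WT_rep1", "hy5_rep1"):
    cmd = bam_to_sam_command(f"{sample}.bam", f"{sample}.sam")
    print(shlex.join(cmd))
    # To actually run the conversion, one could use:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes it easy to inspect the exact command before any large file is written.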
B. Gfold 1.1.1 Count:
In this example, please use the following files to generate the count files:
1) The GTF (Gene Transfer Format) file for Arabidopsis thaliana.
The GTF file for Arabidopsis thaliana can be found in Discovery Environment in the Data window under:
Community Data -> iplant -> home -> shared -> iplant_training -> reference_genomes -> ensembl_14_67 -> GTF -> Arabidopsis_thaliana.TAIR10.14.gtf.
2a) Input SAM file: WT_rep1.sam (the output of part A, step 2a)
2b) Input SAM file: hy5_rep1.sam (the output of part A, step 2b); Gfold 1.1.1 Count must be run a second time for this file
3a) Output file name: WT_rep1.read_cnt
3b) Output file name: hy5_rep1.read_cnt (for the second run of Gfold 1.1.1 Count)
4) Use the generated count files (WT_rep1.read_cnt and hy5_rep1.read_cnt) in Gfold 1.1.1 Difference Expression
Whether is the sequencing data strand specific?
- The default option is False.
Please select True for this example.
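The two runs of Gfold 1.1.1 Count described above correspond roughly to two `gfold count` invocations. The following is a minimal sketch, assuming a `gfold` executable on the PATH and the `-ann`, `-tag`, `-s`, and `-o` flags as listed in the GFOLD command-line help; the helper name `gfold_count_command` is hypothetical, the exact command the app issues is not shown in this page, and the flags should be checked against the help text of the installed version.

```python
import shlex


def gfold_count_command(annotation, sam_path, out_path, strand_specific=False):
    """Build a `gfold count` command line.

    Hypothetical helper; the flag names are an assumption based on the
    GFOLD help text and should be verified for your installed version.
    """
    return [
        "gfold", "count",
        "-ann", annotation,                     # annotation file (GTF here)
        "-tag", sam_path,                       # aligned reads in SAM format
        "-s", "T" if strand_specific else "F",  # strand-specific data?
        "-o", out_path,                         # count file to write
    ]


GTF = "Arabidopsis_thaliana.TAIR10.14.gtf"

# One run per sample, as in steps 2a/3a and 2b/3b, with strand specific = True.
for sample in ("WT_rep1", "hy5_rep1"):
    cmd = gfold_count_command(
        GTF, f"{sample}.sam", f"{sample}.read_cnt", strand_specific=True
    )
    print(shlex.join(cmd))
```

Each sample needs its own invocation because every run takes a single SAM file and writes a single count file.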
Expect a file named file_name.read_cnt as output. For the test case, the output files generated will be WT_rep1.read_cnt and hy5_rep1.read_cnt.
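To take a quick look at a count file before passing it on, a small parser can help. The sketch below assumes the .read_cnt file is tab-delimited with five columns (gene symbol, gene name, read count, gene exon length, RPKM), which is how the GFOLD documentation describes the count output; this layout is an assumption here and should be confirmed against your own file. The function name `parse_read_cnt` and the sample rows are hypothetical placeholders, not real results.

```python
import csv
import io

# Assumed column layout of a .read_cnt file (verify against your own output).
COLUMNS = ("gene_symbol", "gene_name", "read_count", "exon_length", "rpkm")


def parse_read_cnt(handle):
    """Parse tab-delimited count lines into a list of dictionaries."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) < len(COLUMNS):
            continue  # skip blank or malformed lines
        symbol, name, count, length, rpkm = fields[:5]
        rows.append({
            "gene_symbol": symbol,
            "gene_name": name,
            "read_count": float(count),
            "exon_length": int(length),
            "rpkm": float(rpkm),
        })
    return rows


# Placeholder rows for illustration only; these are not real counts.
sample_text = "GENE_A\tNA\t120\t1500\t8.5\nGENE_B\tNA\t0\t900\t0\n"
records = parse_read_cnt(io.StringIO(sample_text))

# Example use: list genes with at least one read.
expressed = [r["gene_symbol"] for r in records if r["read_count"] > 0]
print(expressed)
```

For a real file, replace the `io.StringIO` object with `open("WT_rep1.read_cnt", newline="")`.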