InterProScan_Results_Function
The InterProScan_Results_Function app parses InterProScan XML result files and generates readable tables, functional data count files, and a gene association file (GAF) that can be used in subsequent GO enrichment analyses. For this app to work, InterProScan must have been run with the GO annotation and pathway annotation parameters checked (the default setting).
Quick Start
- To use InterProScan_Results_Function, import your XML output file from InterProScan.
Test Data
Test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> interproscan_results_function.
Output Tables
All tables are tab-separated, with multiple values within a column separated by a semicolon. Tables are txt files that may be opened in a text editor or loaded into Excel.
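The layout described above (tab-separated columns, semicolon-separated values within a column) can be read with a few lines of Python. This is a minimal sketch, not part of the app; the function name `parse_table_line` is hypothetical, and the sample row is written in the style of the *_acc_interpro_counts table with explicit tab characters.

```python
def parse_table_line(line):
    """Split one row into columns on tabs, then split each column on ';'."""
    columns = line.rstrip("\r\n").split("\t")
    # An empty column becomes an empty list rather than [""].
    return [column.split(";") if column else [] for column in columns]


# Sample row in the style of the *_acc_interpro_counts table.
example = (
    "ENSGALP00000004419\t2\tIPR016135;IPR017986\t"
    "DOMAIN:UBQ-conjugating_enzyme/RWD;DOMAIN:WD40_repeat_dom\n"
)
accession, count, interpro_ids, interpro_names = parse_table_line(example)
print(accession[0], int(count[0]), interpro_ids)
```

Every column comes back as a list, so single-value and multi-value columns can be handled the same way.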
The tables produced are:
1. *_acc_interpro_counts
This table includes input accessions, number of InterPro IDs for each accession, InterPro IDs assigned to each sequence and the InterPro ID name.
Example:
ENSGALP00000006626   1   IPR006121   DOMAIN:HeavyMe-assoc_HMA
ENSGALP00000004419   2   IPR016135;IPR017986   DOMAIN:UBQ-conjugating_enzyme/RWD;DOMAIN:WD40_repeat_dom
2. *_acc_go_counts
This table includes input accessions, the number of GO IDs assigned to each accession, the GO IDs and the GO ID names. GO IDs are split into BP (Biological Process), MF (Molecular Function) and CC (Cellular Component).
Example:
ENSGALP00000043106   1         GO:0008270   zinc ion binding
ENSGALP00000006626   2   GO:0030001   metal ion transport   GO:0046872   metal ion binding
ENSGALP00000034620   3   GO:0042773;GO:0055114   ATP synthesis coupled electron transport;oxidation-reduction process   GO:0016651   oxidoreductase activity, acting on NAD(P)H
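Because the GO columns are split by aspect, a row can be unpacked into BP, MF and CC groups. The sketch below assumes the column order is accession, count, BP IDs, BP names, MF IDs, MF names, CC IDs, CC names. That order is inferred from the example rows above and is not confirmed by the app, so check it against your own output. The function name `parse_go_counts_row` is hypothetical.

```python
ASPECTS = ("BP", "MF", "CC")  # assumed column order, inferred from the examples


def parse_go_counts_row(line):
    """Return (accession, count, {aspect: [(go_id, go_name), ...]})."""
    fields = line.rstrip("\r\n").split("\t")
    # Trailing empty columns may be absent, so pad to the assumed 8 columns.
    fields += [""] * (8 - len(fields))
    accession, count = fields[0], int(fields[1])
    terms = {}
    for i, aspect in enumerate(ASPECTS):
        ids = fields[2 + 2 * i]
        names = fields[3 + 2 * i]
        id_list = ids.split(";") if ids else []
        name_list = names.split(";") if names else []
        terms[aspect] = list(zip(id_list, name_list))
    return accession, count, terms


# Row with empty BP columns and one MF term, as in the first example above.
row = "ENSGALP00000043106\t1\t\t\tGO:0008270\tzinc ion binding\n"
accession, count, terms = parse_go_counts_row(row)
print(accession, count, terms["MF"])
```

Empty aspect columns yield empty lists, so an accession with only MF terms still parses cleanly.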
3. *_acc_pathway_counts
This table includes input accessions, the number of pathway IDs for each accession, the pathway IDs and the pathway names. Multiple values are separated by a semicolon.
Example:
ENSGALP00000002985   1   Reactome: REACT_14797   Signaling by GPCR
ENSGALP00000020373   2   KEGG: 00920+2.8.1.1;MetaCyc: PWY-5350   Sulfur metabolism;Thiosulfate disproportionation III (rhodanese)
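Each pathway ID in the examples above carries a database prefix (Reactome, KEGG or MetaCyc) followed by a colon. A minimal sketch for grouping the IDs of one row by database follows. The function names are hypothetical, and the code assumes every entry has the "Database: identifier" shape seen in the examples.

```python
def split_pathway_id(entry):
    """Split 'KEGG: 00920+2.8.1.1' into ('KEGG', '00920+2.8.1.1')."""
    database, _, identifier = entry.partition(":")
    return database.strip(), identifier.strip()


def pathways_by_database(id_field):
    """Group the semicolon-separated pathway IDs of one row by database."""
    grouped = {}
    for entry in id_field.split(";"):
        database, identifier = split_pathway_id(entry)
        grouped.setdefault(database, []).append(identifier)
    return grouped


grouped = pathways_by_database("KEGG: 00920+2.8.1.1;MetaCyc: PWY-5350")
print(grouped)
```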
4. *_gaf
This table follows the gene association file (GAF) format and can be used in GO enrichment analyses. However, the exact format that enrichment tools expect varies, so please check each tool's requirements prior to use. For more information about the GAF format, please see:
http://geneontology.org/page/go-annotation-file-gaf-format-21
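Many enrichment tools want a gene-to-GO mapping rather than the raw GAF. As a sketch, the code below collects GO IDs per gene, relying on two points of the GAF 2.1 specification: comment lines begin with "!", and the DB Object ID and GO ID are the second and fifth tab-separated columns. The function name `gaf_to_gene2go` is hypothetical. The sample rows are placeholders built for illustration and are not actual output from this app; only the accession and GO IDs are taken from the examples in this document.

```python
from collections import defaultdict


def gaf_to_gene2go(lines):
    """Map each DB Object ID (column 2) to the set of its GO IDs (column 5)."""
    gene2go = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and GAF comment/header lines
        columns = line.rstrip("\r\n").split("\t")
        if len(columns) < 5:
            continue  # not a complete annotation row
        gene2go[columns[1]].add(columns[4])
    return dict(gene2go)


def make_row(accession, go_id, aspect):
    """Build a 17-column placeholder GAF row (illustrative values only)."""
    return "\t".join([
        "UserDB", accession, accession, "", go_id, "GO_REF:0000002",
        "IEA", "", aspect, "", "", "protein", "taxon:9031",
        "20150101", "UserDB", "", "",
    ]) + "\n"


sample = [
    "!gaf-version: 2.1\n",
    make_row("ENSGALP00000006626", "GO:0030001", "P"),
    make_row("ENSGALP00000006626", "GO:0046872", "F"),
]
gene2go = gaf_to_gene2go(sample)
print(gene2go)
```

To use this on a real file, pass an open file handle instead of the `sample` list.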
5. *_go_counts
This table counts the numbers of sequences assigned to each GO ID so that the user can quickly identify all genes assigned to a particular function.
Example:
GO:0000381   regulation of alternative mRNA splicing, via spliceosome   Biological_Process   1   ENSGALP00000001460
GO:0006421   asparaginyl-tRNA aminoacylation   Biological_Process   2   ENSGALP00000004871;ENSGALP00000027851
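To look up all genes assigned to a particular function, the table can be loaded into a dictionary keyed by GO ID. This is a minimal sketch that assumes the five columns shown in the example (GO ID, name, aspect, count, accessions). The function name `load_go_counts` is hypothetical. Rows whose count column is not numeric, such as a possible header line, are skipped.

```python
def load_go_counts(lines):
    """Index *_go_counts rows by GO ID."""
    index = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 5 or not fields[3].isdigit():
            continue  # skip incomplete rows and any non-data line
        go_id, name, aspect, count, accessions = fields[:5]
        index[go_id] = {
            "name": name,
            "aspect": aspect,
            "count": int(count),
            "accessions": accessions.split(";"),
        }
    return index


# The two example rows above, with explicit tab characters.
rows = [
    "GO:0000381\tregulation of alternative mRNA splicing, via spliceosome"
    "\tBiological_Process\t1\tENSGALP00000001460\n",
    "GO:0006421\tasparaginyl-tRNA aminoacylation"
    "\tBiological_Process\t2\tENSGALP00000004871;ENSGALP00000027851\n",
]
go_index = load_go_counts(rows)
print(go_index["GO:0006421"]["accessions"])
```

The same approach should carry over to the *_interpro_counts and *_pathway_counts tables, although their examples show four columns instead of five, so the column positions would need adjusting.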
6. *_interpro_counts
This table counts the numbers of sequences assigned to each InterPro ID so that the user can quickly identify all genes with a particular motif.
Example:
IPR019495   FAMILY:EXOSC1   1   ENSGALP00000032597
IPR026622   FAMILY:Mxra7   2   ENSGALP00000002786;ENSGALP00000042423
7. *_pathway_counts
This table counts the numbers of sequences assigned to each Pathway ID so that the user can quickly identify all genes assigned to a pathway.
Example:
KEGG: 00232+1.17.3.2   Caffeine metabolism   1   ENSGALP00000014144
MetaCyc: PWY-6369   Inositol pyrophosphates biosynthesis   2   ENSGALP00000013649;ENSGALP00000007450
8. *.err
This file lists any sequences that InterProScan was unable to analyze. Examples of sequences that will cause an error are sequences with a large run of Xs and sequences >10,000 aa.