...
...
...
...
...
...
...
...
...
...
...
...
...
...
InterProScan 5
InterProScan 5 is an HPC-enabled app that runs on TACC computing resources.
...
- To test InterProScan using a single protein sequence, go here: http://www.ebi.ac.uk/interpro/
- More information about the InterProScan tool is here: http://www.ncbi.nlm.nih.gov/pubmed/?term=24451626
- Information about InterProScan releases is here: https://code.google.com/p/interproscan
Quick Start
To use InterProScan 5, import your data in fasta format.
NOTE: Some translation programs (e.g. EMBOSS transeq) insert asterisk characters to indicate a stop codon. These characters cause an error in InterProScan. You can use EMBOSS checkseq or a similar tool to remove them from your fasta file.
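If no EMBOSS tool is at hand, the asterisks can also be removed with a few lines of Python. The following is a minimal sketch, not part of the app: the function name `strip_stop_asterisks` is hypothetical, and it assumes the whole fasta file fits in memory as a string. Header lines (starting with `>`) are left untouched so that identifiers are not altered.

```python
def strip_stop_asterisks(fasta_text: str) -> str:
    """Return fasta text with '*' removed from sequence lines only.

    Hypothetical helper for illustration; header lines (starting with '>')
    are copied unchanged.
    """
    cleaned = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Header line: keep exactly as written.
            cleaned.append(line)
        else:
            # Sequence line: drop stop-codon asterisks.
            cleaned.append(line.replace("*", ""))
    return "\n".join(cleaned) + "\n"
```

The cleaned text can then be written to a new file and uploaded in place of the original.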
Inputs:
- Select fasta file (containing no more than 50,000 proteins)
- Perform look up of corresponding Gene Ontology annotation: check to return GO annotation mappings (default)
- Perform look up of corresponding pathways annotation: check to return pathways mappings (default)
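Before uploading, it may help to confirm that the input stays within the 50,000-protein limit. The sketch below counts fasta records by counting header lines; the names `MAX_PROTEINS`, `count_fasta_records`, and `within_limit` are hypothetical, and the check assumes every record begins with a `>` header line.

```python
MAX_PROTEINS = 50_000  # limit stated for the app's fasta input


def count_fasta_records(fasta_text: str) -> int:
    """Count records by counting header lines that start with '>'."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def within_limit(fasta_text: str, limit: int = MAX_PROTEINS) -> bool:
    """Return True if the number of records does not exceed the limit."""
    return count_fasta_records(fasta_text) <= limit
```

A file that exceeds the limit would need to be split into smaller fasta files and submitted as separate analyses.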
Test Data
Test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> InterproScan5-44.0.
...
This file contains 15 chicken protein sequences downloaded as a fasta file from NCBI.
Input File(s)
Use chick_test.fasta from the directory above as test input. It contains 15 chicken protein sequences downloaded from NCBI in fasta format.
Parameters Used in App
When the app is run in the Discovery Environment, use the following parameters with the above input file(s) to get the output provided in the next section below.
- Perform look up of corresponding Gene Ontology annotation: check to return GO annotation mappings (default)
- Perform look up of corresponding pathways annotation: check to return pathways mappings (default)
Output File(s)
In this version of InterProScan, you can retrieve output in any of the following formats:
- TSV: a simple tab-delimited file format
- XML: the new "IMPACT" XML format (XSD available here).
- GFF3: the GFF 3.0 format
- Reordered fasta: because InterProScan splits the submitted FASTA file into sections and reassembles them at the end, this file lists the sequences in the same order as the InterProScan results file
- JSON
- SVG
- HTML
Please note you can only trace protein match positions to the original nucleotide sequence with GFF3 and XML.
...
The InterProScan_Results_Function app parses InterProScan XML result files and generates a gene association file (GAF) that can be used in subsequent GO enrichment analyses. For this app to work, InterProScan must have been run with the GO annotation and pathways annotation parameters checked (the default setting).
...
Note: this app also parses the InterProScan XML output to provide additional outputs. For more information on these outputs, see the InterProScan Results Function documentation.
Tab-separated values format (TSV)
The TSV format presents the match data in columns as follows:
- Protein Accession (e.g. P51587)
- Sequence MD5 digest (e.g. 14086411a2cdf1c4cba63020e1622579)
- Sequence Length (e.g. 3418)
- Analysis (e.g. Pfam / PRINTS / Gene3D)
- Signature Accession (e.g. PF09103 / G3DSA:2.40.50.140)
- Signature Description (e.g. BRCA2 repeat profile)
- Start location
- Stop location
- Score - is the e-value of the match reported by member database method (e.g. 3.1E-52)
- Status - is the status of the match (T: true)
- Date - is the date of the run
- (InterPro annotations - accession (e.g. IPR002093) - optional column; only displayed if -iprlookup option is switched on)
- (InterPro annotations - description (e.g. BRCA2 repeat) - optional column; only displayed if -iprlookup option is switched on)
- (GO annotations (e.g. GO:0005515) - optional column; only displayed if --goterms option is switched on)
- (Pathways annotations (e.g. REACT_71) - optional column; only displayed if --pathways option is switched on)
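The column layout above can be read with Python's standard `csv` module. This is a minimal sketch under stated assumptions: the key names in `COLUMNS` and the function `parse_interproscan_tsv` are hypothetical labels chosen here, the file is assumed to have no header row, and the optional trailing columns are assumed to appear in the order listed above (missing ones are filled with empty strings).

```python
import csv
import io

# Hypothetical key names, one per column in the order listed above.
COLUMNS = [
    "protein_accession", "sequence_md5", "sequence_length", "analysis",
    "signature_accession", "signature_description", "start", "stop",
    "score", "status", "date",
    # Optional columns, present only when the matching option is on.
    "interpro_accession", "interpro_description", "go_terms", "pathways",
]


def parse_interproscan_tsv(tsv_text: str) -> list:
    """Parse TSV text into a list of dicts keyed by COLUMNS."""
    rows = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        # Pad rows that lack the optional trailing columns.
        padded = fields + [""] * (len(COLUMNS) - len(fields))
        row = dict(zip(COLUMNS, padded))
        # Convert the integer-valued columns; the score stays a string
        # because its content depends on the member database.
        for key in ("sequence_length", "start", "stop"):
            row[key] = int(row[key])
        rows.append(row)
    return rows
```

Each returned dict can then be filtered by `analysis` or grouped by `protein_accession` for downstream summaries.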