ParaAT 2.0
ParaAT (Parallel Alignment and back-Translation) is a parallel tool that constructs protein-coding DNA alignments for a large number of homologs. ParaAT is well suited for large-scale data analysis in the high-throughput era, providing good scalability and exhibiting high parallel efficiency for computationally demanding tasks.
- See http://bigd.big.ac.cn/tools/paraat for documentation and https://www.ncbi.nlm.nih.gov/pubmed/22390928 for reference and citation.
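The back-translation idea in the name can be illustrated with a minimal Python sketch. It assumes each aligned peptide residue corresponds to exactly one codon in the matching coding sequence and that gaps are written as "-". The function name back_translate and the toy sequences are hypothetical, and this is an illustration of the general technique rather than ParaAT's own implementation.

```python
def back_translate(aligned_peptide: str, cds: str, gap: str = "-") -> str:
    """Thread an unaligned coding sequence onto an aligned peptide.

    Each residue in the aligned peptide consumes one codon (3 nt) from the
    CDS; each gap character becomes a three-character codon gap.
    """
    residues = sum(1 for ch in aligned_peptide if ch != gap)
    # Assumption: tolerate one trailing stop codon that the peptide omits.
    if len(cds) not in (3 * residues, 3 * residues + 3):
        raise ValueError("CDS length does not match the peptide length")

    codons = []
    pos = 0
    for ch in aligned_peptide:
        if ch == gap:
            codons.append(gap * 3)
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


# Toy data (hypothetical): a two-sequence peptide alignment and its CDS.
aligned = {"seqA": "MK-L", "seqB": "MKTL"}
cds = {"seqA": "ATGAAACTG", "seqB": "ATGAAGACCCTG"}

codon_alignment = {
    name: back_translate(aligned[name], cds[name]) for name in aligned
}
for name, seq in codon_alignment.items():
    print(name, seq)
```

Because the peptide alignment decides where gaps go, every gap in the resulting DNA alignment is a whole codon and the reading frame is preserved.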
Quick Start
- To use ParaAT in the Discovery Environment you will need three files: a file of homolog clusters, a FASTA file of the corresponding nucleotide coding sequences, and a FASTA file of the corresponding peptide sequences. An additional file that specifies the number of processors to be used is provided in the Discovery Environment along with example data.
Example Data
Input test data for this app are available in the Discovery Environment, in the Data window under Community Data -> iplantcollaborative -> example_data -> paraat -> input
Input File(s)
You will need three files: a file of homolog clusters, a FASTA file of the corresponding nucleotide coding sequences, and a FASTA file of the corresponding peptide sequences. For the homolog file, ParaAT accepts a tab-delimited text file in which each row represents one homologous group. For testing, and for an example of the homolog file format, see Community Data -> iplantcollaborative -> example_data -> paraat -> input
Example format for homolog file:
NP_000005 NP_783327 XP_001139819
NP_000006 NP_032699 XP_001146758
NP_000008 NP_031409 XP_001162935
.......
An additional file that specifies the number of processors to be used is provided in the Discovery Environment along with example data.
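Before launching a job, it can help to confirm that every ID in the homolog file appears in both FASTA files. The following is a minimal Python sketch of such a check. It assumes that IDs in the homolog file match the first whitespace-delimited token of each FASTA header; the function names and the in-memory sample data are hypothetical, and the check is not part of ParaAT itself.

```python
import io


def read_homolog_groups(handle):
    """Return one list of IDs per non-empty, tab-delimited row."""
    groups = []
    for line in handle:
        ids = [field.strip() for field in line.rstrip("\n").split("\t")]
        ids = [seq_id for seq_id in ids if seq_id]
        if ids:
            groups.append(ids)
    return groups


def read_fasta_ids(handle):
    """Collect the first token of every FASTA header line."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def missing_ids(groups, nuc_ids, pep_ids):
    """Map each homolog ID to the FASTA files it is absent from."""
    missing = {}
    for group in groups:
        for seq_id in group:
            absent = []
            if seq_id not in nuc_ids:
                absent.append("nucleotide")
            if seq_id not in pep_ids:
                absent.append("peptide")
            if absent:
                missing[seq_id] = absent
    return missing


# In-memory stand-ins for the three input files (sequences are made up).
homologs = io.StringIO("NP_000005\tNP_783327\tXP_001139819\n")
nuc = io.StringIO(
    ">NP_000005 example\nATGAAA\n>NP_783327\nATGAAG\n>XP_001139819\nATGAAA\n"
)
pep = io.StringIO(">NP_000005\nMK\n>NP_783327\nMK\n")

groups = read_homolog_groups(homologs)
report = missing_ids(groups, read_fasta_ids(nuc), read_fasta_ids(pep))
print(report)
```

With real data, the io.StringIO objects would be replaced by files opened with open(). An empty report means every homolog ID was found in both FASTA files.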
Parameters Used in App
- Multiple Aligner - Select from options within App (tcoffee, mafft, muscle, clustalw2)
- Output Format - Select from options within App (fasta, codon, axt, clustal, paml)
- Genetic Code - Enter the number for the desired option from the list below. The Standard Code (1) is the default.
- 1. The Standard Code
- 2. The Vertebrate Mitochondrial Code
- 3. The Yeast Mitochondrial Code
- 4. The Mold, Protozoan, and Coelenterate Mitochondrial Code and the Mycoplasma/Spiroplasma Code
- 5. The Invertebrate Mitochondrial Code
- 6. The Ciliate, Dasycladacean and Hexamita Nuclear Code
- 9. The Echinoderm and Flatworm Mitochondrial Code
- 10. The Euplotid Nuclear Code
- 11. The Bacterial, Archaeal and Plant Plastid Code
- 12. The Alternative Yeast Nuclear Code
- 13. The Ascidian Mitochondrial Code
- 14. The Alternative Flatworm Mitochondrial Code
- 16. Chlorophycean Mitochondrial Code
- 21. Trematode Mitochondrial Code
- 22. Scenedesmus obliquus Mitochondrial Code
- 23. Thraustochytrium Mitochondrial Code
- 24. Pterobranchia Mitochondrial Code
- 25. Candidate Division SR1 and Gracilibacteria Code
- Remove Aligned Codons with Gaps - Select if you want ParaAT to remove these from the output.
- Remove Mismatched Codons - Select if you want ParaAT to remove these from the output.
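To illustrate what removing aligned codons with gaps means, here is a minimal Python sketch that drops every codon column in which at least one sequence contains a gap. It assumes the alignment is in frame, that all sequences have equal length, and that gaps are written as "-". The function name remove_gapped_codons and the toy alignment are hypothetical; this shows the general idea and is not ParaAT's own code.

```python
def remove_gapped_codons(alignment, gap="-"):
    """Drop codon columns where any sequence has a gap character."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must all have the same aligned length")
    length = lengths.pop()
    if length % 3 != 0:
        raise ValueError("aligned length must be a multiple of 3")

    # Start positions of the codon columns that are gap-free in every row.
    keep = [
        start
        for start in range(0, length, 3)
        if not any(gap in seq[start:start + 3] for seq in alignment.values())
    ]
    return {
        name: "".join(seq[start:start + 3] for start in keep)
        for name, seq in alignment.items()
    }


# Toy codon alignment (hypothetical): the third codon is gapped in seqA.
codon_alignment = {"seqA": "ATGAAA---CTG", "seqB": "ATGAAGACCCTG"}
trimmed = remove_gapped_codons(codon_alignment)
for name, seq in trimmed.items():
    print(name, seq)
```

Removing whole codon columns, rather than individual gap characters, keeps the remaining sequences aligned and in frame.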
Output File(s)
There are five output format options: axt, fasta, paml, codon, and clustal. By default, ParaAT is set to create output files according to its 'verbose' parameter, so multiple files are output for each homolog cluster that was successfully processed. The primary output file is cluster.cds_aln.output_format, where output_format is the selected output format (for example, cluster.cds_aln.fasta when fasta is selected).
- cluster.cds - Contains nucleotide coding sequences of homolog cluster.
- cluster.cds_aln.output_format - Primary output file. Contains CDS alignments of homolog cluster.
- cluster.pep - Contains amino acid sequences of homolog cluster.
- cluster.pep_aln - Contains peptide alignments of homolog cluster.
- cluster.dnd - Contains ClustalW2-generated tree data. Only produced if ClustalW2 is selected as the multiple aligner.
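When a run produces many clusters, the primary alignments can be picked out by file suffix. The sketch below is a minimal Python example that relies only on the suffixes listed above. The function name classify_output, the role labels, and the sample file names (with "group1" standing in for a cluster name) are hypothetical.

```python
def classify_output(filename: str, output_format: str = "fasta") -> str:
    """Label a ParaAT output file by the suffixes listed above."""
    suffixes = {
        f".cds_aln.{output_format}": "cds_alignment",
        ".pep_aln": "peptide_alignment",
        ".cds": "cds",
        ".pep": "peptide",
        ".dnd": "tree",
    }
    for suffix, role in suffixes.items():
        if filename.endswith(suffix):
            return role
    return "other"


# Hypothetical file names for a single cluster called "group1".
names = [
    "group1.cds",
    "group1.cds_aln.fasta",
    "group1.pep",
    "group1.pep_aln",
    "group1.dnd",
]
primary = [name for name in names if classify_output(name) == "cds_alignment"]
print(primary)
```

Passing a different output_format (for example "paml") selects the primary files for that format instead.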