ParaAT 2.0

ParaAT (Parallel Alignment and back-Translation) is a parallel tool that constructs protein-coding DNA alignments for a large number of homologs. ParaAT is well suited for large-scale data analysis in the high-throughput era, providing good scalability and exhibiting high parallel efficiency for computationally demanding tasks.

Notes:

See http://bigd.big.ac.cn/tools/paraat for documentation and https://www.ncbi.nlm.nih.gov/pubmed/22390928 for reference and citation.

Quick Start

To use ParaAT in the Discovery Environment you will need three files: a file of homolog clusters, a FASTA file of the corresponding nucleotide coding sequences, and a FASTA file of the corresponding peptide sequences. An additional file that specifies the number of processors to be used is provided in the Discovery Environment along with the example data.

Example Data

Input test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> paraat -> input

Output test data for this app, generated using App default parameters, appears directly in the Discovery Environment in the Data window under:
Community Data -> iplantcollaborative -> example_data -> paraat -> output

Input File(s)

You will need three files: a file of homolog clusters, a FASTA file of the corresponding nucleotide coding sequences, and a FASTA file of the corresponding peptide sequences. For the homolog file, ParaAT accepts a tab-delimited text file in which each row represents one homologous group. For testing, and for an example of the homolog file format, see Community Data -> iplantcollaborative -> example_data -> paraat -> input

Example format for homolog file:

NP_000005 NP_783327 XP_001139819
NP_000006 NP_032699 XP_001146758
NP_000008 NP_031409 XP_001162935
.......
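
Before launching a job, it can help to confirm that every ID in the homolog file is present in both FASTA files. The following is a minimal Python sketch of such a check, not part of ParaAT or the App. It assumes that each ID in the homolog file matches the first word of a FASTA header line; the function names are hypothetical and chosen for illustration only.

```python
import io


def read_fasta_ids(handle):
    """Return the set of sequence IDs (first word after '>') in a FASTA stream."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def read_homolog_groups(handle):
    """Return one list of IDs per non-empty, tab-delimited row."""
    groups = []
    for line in handle:
        line = line.strip()
        if line:
            groups.append(line.split("\t"))
    return groups


def find_missing_ids(groups, cds_ids, pep_ids):
    """Map each homolog ID to the input(s) ('cds', 'pep') it is absent from."""
    missing = {}
    for group in groups:
        for seq_id in group:
            absent = []
            if seq_id not in cds_ids:
                absent.append("cds")
            if seq_id not in pep_ids:
                absent.append("pep")
            if absent:
                missing[seq_id] = absent
    return missing


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the three input files (made-up sequences).
    homologs = io.StringIO("NP_000005\tNP_783327\tXP_001139819\n")
    cds = io.StringIO(">NP_000005\nATGGCC\n>NP_783327\nATGGCA\n")
    pep = io.StringIO(">NP_000005\nMA\n>NP_783327\nMA\n>XP_001139819\nMA\n")

    groups = read_homolog_groups(homologs)
    report = find_missing_ids(groups, read_fasta_ids(cds), read_fasta_ids(pep))
    for seq_id, absent in report.items():
        print(seq_id, "is missing from:", ", ".join(absent))
```

With real data, replace the in-memory examples with open() calls on the downloaded homolog, nucleotide, and peptide files.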

An additional file that specifies the number of processors to be used is provided in the Discovery Environment along with example data.

Parameters Used in App

Multiple Aligner - Select from options within App (tcoffee, mafft, muscle, clustalw2)
Output Format - Select from options within App (fasta, codon, axt, clustal, paml)
Genetic Code - Enter the number of the desired genetic code. The Standard Code is the default.
Remove Aligned Codons with Gaps - Select if you want ParaAT to remove these from the output.
Remove Mismatched Codons - Select if you want ParaAT to remove these from the output.
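
To illustrate what removing aligned codons with gaps means, the Python sketch below drops every codon column in which at least one sequence contains a gap character. It is an illustration only and is not ParaAT's own code. It assumes the alignment is already in codon frame (length a multiple of 3) and that gaps are written as '-'; the function name is hypothetical.

```python
def remove_gapped_codons(aligned_cds, gap="-"):
    """Drop every codon column in which at least one sequence has a gap.

    aligned_cds maps sequence names to aligned nucleotide strings that all
    have the same length, which must be a multiple of 3.
    """
    lengths = {len(seq) for seq in aligned_cds.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    length = lengths.pop()
    if length % 3 != 0:
        raise ValueError("alignment length must be a multiple of 3")

    # Start positions of the codon columns that are gap-free in every sequence.
    keep = [
        start
        for start in range(0, length, 3)
        if all(gap not in seq[start:start + 3] for seq in aligned_cds.values())
    ]
    return {
        name: "".join(seq[start:start + 3] for start in keep)
        for name, seq in aligned_cds.items()
    }


if __name__ == "__main__":
    example = {"seq1": "ATG---GCC", "seq2": "ATGAAAGCC"}
    for name, seq in remove_gapped_codons(example).items():
        print(name, seq)
```

In the example, the middle codon is gapped in seq1, so that column is removed from both sequences and the remaining codons stay aligned.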

Output File(s)

There are five output format options: axt, fasta, paml, codon, and clustal. By default, ParaAT is set to create output files according to its 'verbose' parameter. Multiple files will be output for each homolog cluster that was successfully processed. The primary output file is cluster.cds_aln.output_format, for example cluster.cds_aln.fasta when the fasta output format is selected.

cluster.cds - Contains the nucleotide coding sequences of the homolog cluster.
cluster.cds_aln.output_format - Primary output file. Contains the CDS alignment of the homolog cluster.
cluster.pep - Contains the amino acid sequences of the homolog cluster.
cluster.pep_aln - Contains the peptide alignment of the homolog cluster.
cluster.dnd - Contains ClustalW2-generated tree data. Only produced if ClustalW2 is selected as the multiple aligner.
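
After a job finishes, the output files can be grouped by cluster to see which clusters have a primary alignment file. The Python sketch below is one possible way to do this and is not part of ParaAT. It assumes that "cluster" in the file names above stands for a per-cluster name and that the extension of the primary file equals the selected output format; the function names are hypothetical.

```python
import os
from collections import defaultdict


def classify(filename, output_format="fasta"):
    """Return (cluster_name, kind) for a recognized output file, else None."""
    suffixes = {
        ".cds_aln." + output_format: "cds_aln",
        ".pep_aln": "pep_aln",
        ".cds": "cds",
        ".pep": "pep",
        ".dnd": "dnd",
    }
    for suffix, kind in suffixes.items():
        if filename.endswith(suffix) and len(filename) > len(suffix):
            return filename[:-len(suffix)], kind
    return None


def group_outputs(filenames, output_format="fasta"):
    """Map each cluster name to the set of output kinds found for it."""
    clusters = defaultdict(set)
    for name in filenames:
        hit = classify(name, output_format)
        if hit is not None:
            cluster, kind = hit
            clusters[cluster].add(kind)
    return dict(clusters)


def clusters_without_alignment(filenames, output_format="fasta"):
    """Return the sorted cluster names that lack a primary CDS alignment file."""
    groups = group_outputs(filenames, output_format)
    return sorted(c for c, kinds in groups.items() if "cds_aln" not in kinds)


if __name__ == "__main__":
    # Lists the current directory; point this at the downloaded output folder.
    files = os.listdir(".")
    for cluster in clusters_without_alignment(files, output_format="fasta"):
        print("No primary alignment found for:", cluster)
```

Suffix matching is used instead of splitting on the first period because sequence IDs can themselves contain periods. A cluster that produced no files at all will not appear in this listing, so compare the result against the homolog file if a complete accounting is needed.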