AAARF v 1.0.1
AAARF assembles pseudomolecules representing high-copy-number repeats from genomic sample sequence data. It is intended for small, random sample-sequence datasets, not large next-generation sequencing (NGS) datasets.
Notes:
- See https://github.com/jdebarry/AAARF and http://www.biomedcentral.com/1471-2105/9/235 for full documentation.
Quick Start
- To use AAARF, you need a FASTA file of genomic sample sequences, preferably sorted from most to least repetitive.
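The sorting step can be scripted. The Python sketch below assumes you have already run an all-by-all BLAST with tabular output (-m 8 in legacy blastall, -outfmt 6 in BLAST+) and uses the number of non-self hits per sequence as the measure of repetitiveness. That metric and the helper names parse_fasta, hit_counts, and order_by_repetitiveness are assumptions for illustration, not part of AAARF.

```python
from collections import Counter

def parse_fasta(text):
    """Return (header, sequence) tuples from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def hit_counts(blast_tabular):
    """Count non-self hits per query ID in tabular BLAST output (assumed metric)."""
    counts = Counter()
    for line in blast_tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        query, subject = line.split("\t")[:2]
        if query != subject:
            counts[query] += 1
    return counts

def order_by_repetitiveness(records, counts):
    """Sort records from most to fewest hits; sorted() is stable, so ties keep input order."""
    # BLAST typically reports the first word of each FASTA header as the sequence ID.
    return sorted(records, key=lambda rec: -counts[rec[0].split()[0]])

fasta = ">s1\nACGT\n>s2\nGGCC\n>s3\nTTAA\n"
blast = "s1\ts1\t100\ns2\ts1\t95\ns2\ts3\t92\ns3\ts2\t92\n"
ordered = order_by_repetitiveness(parse_fasta(fasta), hit_counts(blast))
print([h for h, _ in ordered])  # ['s2', 's3', 's1']
```

Writing the sorted records back out as FASTA gives an input file ordered from most to least repetitive.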
Example Data
Input test data for this app is available in the Discovery Environment's Data window under Community Data -> iplantcollaborative -> example_data -> aaarf -> input -> ZU_1000.fasta
Input File(s)
FASTA file of the sample sequences you are analyzing. Input sequences should be vector-trimmed where appropriate. The file must not contain empty sequences, and every sequence should be at least 20 nt long (the 20-nt cutoff is arbitrary; the point is that each sequence must be longer than the BLAST word size). Apart from the FASTA headers, sequences should contain only nucleotide characters (A, C, G, T). Optional: for best results, run an all-by-all BLAST of your sample sequences and order the input FASTA file from most to least repetitive before running AAARF.
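The input requirements above can be checked before submitting a job. The Python sketch below flags empty sequences, sequences shorter than 20 nt, and sequences containing anything other than A, C, G, or T (case-insensitive). The function names parse_fasta and check_inputs are hypothetical, and treating characters such as N as invalid is an assumption based on the "only nucleotide designations" rule.

```python
import re

VALID_SEQ = re.compile(r"[ACGTacgt]+")

def parse_fasta(text):
    """Return (header, sequence) tuples from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def check_inputs(records, min_len=20):
    """Return human-readable problems; an empty list means the file looks usable."""
    problems = []
    for header, seq in records:
        if not seq:
            problems.append(f"{header}: empty sequence")
        elif len(seq) < min_len:
            problems.append(f"{header}: only {len(seq)} nt (< {min_len})")
        if seq and not VALID_SEQ.fullmatch(seq):
            problems.append(f"{header}: contains characters other than A, C, G, T")
    return problems

text = ">ok\n" + "ACGT" * 10 + "\n>short\nACGTAC\n>bad\n" + "ACGN" * 10 + "\n>empty\n"
print(check_inputs(parse_fasta(text)))
# ['short: only 6 nt (< 20)', 'bad: contains characters other than A, C, G, T', 'empty: empty sequence']
```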
For testing, use Community Data -> iplantcollaborative -> example_data -> aaarf -> input -> ZU_1000.fasta
Parameters Used in App
BLAST AND MCS CONSTRUCTION PARAMETERS (shown with default values)
maxBlastHits = 100; maximum number of BLAST hits used to construct the coverage matrix
minBlastMatch = 150; minimum length for a BLAST hit
minBlastIdentity = 0.89; minimum identity for a BLAST hit
minBlastCoverDepth = 2; minimum coverage depth for MCS
minBlastConsenLen = 150; minimum length for MCS
BLAST_e = 1e-25; maximum E-value for a BLAST hit
BL2SEQ_e = 1e-10; maximum E-value for a BL2SEQ hit
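The BLAST and MCS parameters above can be read as a hit filter followed by a coverage calculation. The Python sketch below is one plausible interpretation, not AAARF's actual code: it assumes 12-column tabular BLAST output, ignores self-hits, keeps the first maxBlastHits hits that pass the minBlastMatch, minBlastIdentity, and BLAST_e cutoffs, and then reports query regions covered at least minBlastCoverDepth times over at least minBlastConsenLen bases. The names Hit, parse_tabular, filter_hits, and covered_regions are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    query: str
    subject: str
    identity: float  # fraction (0-1); tabular BLAST reports a percentage
    length: int      # alignment length
    qstart: int      # 1-based, inclusive
    qend: int
    evalue: float

def parse_tabular(text):
    """Parse 12-column tabular BLAST lines into Hit objects."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]) / 100, int(f[3]),
                        int(f[6]), int(f[7]), float(f[10])))
    return hits

def filter_hits(hits, max_hits=100, min_match=150, min_identity=0.89, max_e=1e-25):
    """Apply maxBlastHits, minBlastMatch, minBlastIdentity and BLAST_e cutoffs."""
    kept = [h for h in hits
            if h.query != h.subject  # assumption: self-hits are ignored
            and h.length >= min_match
            and h.identity >= min_identity
            and h.evalue <= max_e]
    return kept[:max_hits]

def covered_regions(hits, query_len, min_depth=2, min_len=150):
    """Return 1-based (start, end) query runs with depth >= min_depth and length >= min_len."""
    depth = [0] * (query_len + 1)  # index 0 unused so positions are 1-based
    for h in hits:
        lo, hi = sorted((h.qstart, h.qend))
        for pos in range(max(lo, 1), min(hi, query_len) + 1):
            depth[pos] += 1
    regions, start = [], None
    for pos in range(1, query_len + 2):  # one past the end closes a final run
        ok = pos <= query_len and depth[pos] >= min_depth
        if ok and start is None:
            start = pos
        elif not ok and start is not None:
            if pos - start >= min_len:
                regions.append((start, pos - 1))
            start = None
    return regions

blast = "\n".join([
    "q1\tq1\t100.00\t400\t0\t0\t1\t400\t1\t400\t0.0\t739",
    "q1\ts2\t95.00\t300\t15\t0\t1\t300\t10\t309\t1e-120\t450",
    "q1\ts3\t92.50\t250\t18\t1\t101\t350\t1\t250\t1e-80\t300",
    "q1\ts4\t80.00\t300\t60\t0\t1\t300\t1\t300\t1e-30\t200",
])
hits = filter_hits(parse_tabular(blast))
print([h.subject for h in hits])             # ['s2', 's3']
print(covered_regions(hits, query_len=400))  # [(101, 300)]
```

In this toy example, s4 fails the 0.89 identity cutoff, and only positions 101 to 300 are covered by two hits, a 200-base run that clears the 150-base minimum.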
EXTENSION PARAMETERS (shown with default values)
maxExtendHits = 1000; maximum number of BLAST hits used to extend build
minExtendHits = 1; minimum number of BLAST hits used to extend build; must be greater than or equal to 1
maxExtendLen = 50; maximum extension length (step size)
minExtendLen = 0; minimum extension length (step size)
minCoverLen = 150; controls (1) the required size of the overlap between the MCS and the NQ, (2) the minimum coverage for extension (overlap between sample sequences and the MCS during NQ construction), and (3) the minimum NQ length
minOverlapLen = 90; minimum required overlap between MCS and New Query Sequence for BL2SEQ, based on 50% of minCoverLen
times_used = 13; maximum number of times that a sequence is used in each direction
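If you drive AAARF from your own scripts, it can help to keep the defaults above in one place. The Python sketch below is a hypothetical container (the class name AaarfParams and its problems method are not part of AAARF); field names mirror the app's parameter names, hence the camelCase. Only the minExtendHits >= 1 rule comes from this documentation; the other checks are assumed sanity checks.

```python
from dataclasses import dataclass

@dataclass
class AaarfParams:
    """Hypothetical holder for AAARF app parameters, with the documented defaults."""
    # BLAST and MCS construction parameters
    maxBlastHits: int = 100
    minBlastMatch: int = 150
    minBlastIdentity: float = 0.89
    minBlastCoverDepth: int = 2
    minBlastConsenLen: int = 150
    BLAST_e: float = 1e-25
    BL2SEQ_e: float = 1e-10
    # Extension parameters
    maxExtendHits: int = 1000
    minExtendHits: int = 1
    maxExtendLen: int = 50
    minExtendLen: int = 0
    minCoverLen: int = 150
    minOverlapLen: int = 90
    times_used: int = 13

    def problems(self):
        """Return a list of constraint violations (empty if none)."""
        issues = []
        # Documented constraint.
        if self.minExtendHits < 1:
            issues.append("minExtendHits must be >= 1")
        # Assumed sanity checks, not stated in the AAARF documentation.
        if self.minExtendHits > self.maxExtendHits:
            issues.append("minExtendHits should not exceed maxExtendHits")
        if self.minExtendLen > self.maxExtendLen:
            issues.append("minExtendLen should not exceed maxExtendLen")
        if not 0 < self.minBlastIdentity <= 1:
            issues.append("minBlastIdentity should be a fraction in (0, 1]")
        return issues

print(AaarfParams().problems())                 # []
print(AaarfParams(minExtendHits=0).problems())  # ['minExtendHits must be >= 1']
```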
Output File(s)
inputFile_AAARF.fasta - main output file with assemblies of high-copy number repeats
formatdb_log - log file generated when inputFile.fasta is formatted for BLAST searching
inputFile.fasta.nhr - BLAST database file generated when inputFile.fasta is formatted for BLAST searching
inputFile.fasta.nin - BLAST database file generated when inputFile.fasta is formatted for BLAST searching
inputFile.fasta.nsq - BLAST database file generated when inputFile.fasta is formatted for BLAST searching
AAARF_log - detailed log of all AAARF activities, formatted using the Log4perl Perl module
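After a run, a quick summary of the main output can help you judge the result. The Python sketch below counts the assemblies in an AAARF output FASTA and reports their lengths. It also includes a hypothetical expected_outputs helper that assumes "inputFile" in the names above stands for the input file name without its .fasta extension (so ZU_1000.fasta would give ZU_1000_AAARF.fasta); that naming rule is inferred from the list, not confirmed.

```python
from pathlib import Path

def expected_outputs(input_fasta):
    """Hypothetical: derive output file names from the patterns listed above."""
    p = Path(input_fasta)
    return [p.stem + "_AAARF.fasta", "formatdb_log",
            p.name + ".nhr", p.name + ".nin", p.name + ".nsq", "AAARF_log"]

def summarize_assemblies(fasta_text):
    """Return (number of assemblies, list of assembly lengths) from FASTA text."""
    lengths, current = [], None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return len(lengths), lengths

print(expected_outputs("ZU_1000.fasta")[0])           # ZU_1000_AAARF.fasta
print(summarize_assemblies(">a1\nACGT\nAC\n>a2\nGGG\n"))  # (2, [6, 3])
```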