AAARF v 1.0.1

Assembles pseudomolecules representing high-copy number repeats from genomic sample sequence data. AAARF is intended for use with small, random, sample sequence datasets, not large NGS datasets.

Notes:

Quick Start

  • To use AAARF you will need a fasta file of genomic sample sequences, preferably sorted in order of sample repetitiveness.
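One possible way to do this ordering is sketched below. It is not part of AAARF. It assumes you already have tab-delimited all-by-all BLAST output with the query ID in the first column and the subject ID in the second, and it uses the number of non-self hits as a stand-in for "repetitiveness", which is an assumption rather than a definition from this page. The function names are hypothetical.

```python
from collections import Counter


def seq_id(header):
    """Return the first whitespace-delimited token of a FASTA header (no '>')."""
    tokens = header.split()
    return tokens[0] if tokens else ""


def count_hits(blast_lines):
    """Count non-self hits per query from tab-delimited BLAST lines.

    Assumption: column 1 is the query ID and column 2 is the subject ID.
    """
    counts = Counter()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        query, subject = fields[0], fields[1]
        if query != subject:
            counts[query] += 1
    return counts


def read_fasta(lines):
    """Return a list of (header, sequence) pairs; headers exclude '>'."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def order_by_repetitiveness(records, counts):
    """Sort records from most to fewest hits; ties keep their input order."""
    return sorted(records, key=lambda rec: counts[seq_id(rec[0])], reverse=True)
```

To use it, pass the lines of the BLAST table to count_hits, the lines of the fasta file to read_fasta, and write the reordered records back out in fasta format.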

Example Data

Input test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> aaarf -> input -> ZU_1000.fasta


Output test data for this app appears directly in the Discovery Environment in the Data window under:
Community Data -> iplantcollaborative -> example_data -> aaarf -> output

Input File(s)

A fasta file of the sample sequences that you are analyzing. Input sequences should be vector trimmed if appropriate. The input fasta file must not contain empty sequences. All input sequences should be at least 20 nt long (the value 20 is arbitrary; the sequences only need to be longer than the BLAST word size). Apart from the fasta headers, sequences should contain only nucleotide designations (a, c, t, g). Optional: for best results, perform an all-by-all BLAST of your sample sequences and order the input fasta file from most to least repetitive before running AAARF.
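The requirements above can be checked before submitting a job. The following is a minimal sketch, not part of AAARF. The function names are hypothetical, the 20 nt threshold is the arbitrary value quoted above, and treating upper- and lower-case letters alike is an assumption.

```python
import re

MIN_LENGTH = 20  # arbitrary per the text; must only exceed the BLAST word size
VALID_BASES = re.compile(r"[ACGTacgt]+")  # assumption: case does not matter


def _check_record(header, sequence, min_length):
    """Return a list of problems found in one fasta record."""
    if not sequence:
        return [f"{header}: empty sequence"]
    problems = []
    if len(sequence) < min_length:
        problems.append(f"{header}: length {len(sequence)} is below {min_length}")
    if not VALID_BASES.fullmatch(sequence):
        problems.append(f"{header}: contains characters other than a, c, g, t")
    return problems


def check_fasta(lines, min_length=MIN_LENGTH):
    """Scan fasta lines and return a list of human-readable problems."""
    problems = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                problems.extend(_check_record(header, "".join(chunks), min_length))
            header, chunks = line[1:], []
        elif line:
            if header is None:
                problems.append("sequence data found before the first header")
            else:
                chunks.append(line)
    if header is not None:
        problems.extend(_check_record(header, "".join(chunks), min_length))
    return problems
```

An empty list means the file passed these three checks. It does not confirm vector trimming or ordering by repetitiveness.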

For testing, use:

Community Data -> iplantcollaborative -> example_data -> aaarf -> input -> ZU_1000.fasta

Parameters Used in App

BLAST AND MCS CONSTRUCTION PARAMETERS (shown with default values)

maxBlastHits = 100; maximum number of hits used to construct coverage matrix

minBlastMatch = 150; minimum length for a BLAST hit

minBlastIdentity = 0.89; minimum identity for a BLAST hit

minBlastCoverDepth = 2; minimum coverage depth for MCS

minBlastConsenLen = 150; minimum length for MCS

BLAST_e = 1e-25; maximum evalue for BLAST hit

BL2SEQ_e = 1e-10; maximum evalue for BL2SEQ hit
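To make the hit thresholds concrete, here is a rough sketch of how maxBlastHits, minBlastMatch, minBlastIdentity and BLAST_e could act together as a filter. It is an illustration only and not AAARF's own code. The Hit type and select_hits function are hypothetical, identity is taken as a fraction to match the 0.89 default, the comparisons are assumed to be inclusive, and hits are assumed to arrive best first.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One BLAST hit (hypothetical structure for this sketch)."""
    subject: str
    length: int      # alignment length in nucleotides
    identity: float  # fraction between 0 and 1, not a percentage
    evalue: float


def select_hits(hits, max_hits=100, min_match=150,
                min_identity=0.89, max_evalue=1e-25):
    """Keep hits that pass every threshold, up to max_hits, in input order."""
    kept = []
    for hit in hits:
        if len(kept) >= max_hits:
            break
        if (hit.length >= min_match
                and hit.identity >= min_identity
                and hit.evalue <= max_evalue):
            kept.append(hit)
    return kept
```

With the defaults, a 200 nt hit at 0.95 identity and an e-value of 1e-40 is kept, while a 100 nt hit is dropped regardless of its identity or e-value.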


EXTENSION PARAMETERS (shown with default values)

maxExtendHits = 1000; maximum number of BLAST hits used to extend build

minExtendHits = 1; minimum number of BLAST hits used to extend build, must be greater than or equal to 1

maxExtendLen = 50; maximum extension length (step size)

minExtendLen = 0; minimum extension length (step size)

minCoverLen = 150; controls: 1) the required size of the overlap between MCS and NQ; 2) the minimum coverage for extension (overlap between sample sequences and MCS during NQ construction); 3) the minimum length of NQ

minOverlapLen = 90; minimum required overlap between MCS and New Query Sequence for BL2SEQ, based on 50% of minCoverLen

times_used = 13; maximum number of times that a sequence is used in each direction

Output File(s)

inputFile_AAARF.fasta - main output file with assemblies of high-copy number repeats

formatdb_log - log file generated when inputFile.fasta is formatted for BLAST searching

inputFile.fasta.nhr - BLAST database file generated when inputFile.fasta is formatted for BLAST searching

inputFile.fasta.nin - BLAST database file generated when inputFile.fasta is formatted for BLAST searching

inputFile.fasta.nsq - BLAST database file generated when inputFile.fasta is formatted for BLAST searching

AAARF_log - detailed log of all AAARF activities, formatted by the Log4perl Perl module
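As a quick check that a run produced everything listed above, the expected file names can be derived from the input file name. This sketch assumes the naming pattern shown in the list (inputFile.fasta gives inputFile_AAARF.fasta and inputFile.fasta.nhr, .nin, .nsq) holds for any input name. The helper names are hypothetical.

```python
from pathlib import Path


def expected_outputs(input_name):
    """List the output file names expected for a given input fasta file."""
    name = Path(input_name).name  # e.g. inputFile.fasta
    stem = Path(input_name).stem  # e.g. inputFile
    return [
        f"{stem}_AAARF.fasta",  # main output with the assemblies
        "formatdb_log",         # log from formatting the BLAST database
        f"{name}.nhr",          # BLAST database files
        f"{name}.nin",
        f"{name}.nsq",
        "AAARF_log",            # detailed AAARF activity log
    ]


def missing_outputs(input_name, output_dir):
    """Return the expected output files that are absent from output_dir."""
    out = Path(output_dir)
    return [f for f in expected_outputs(input_name) if not (out / f).exists()]
```

An empty list from missing_outputs means every file named above is present in the output folder.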

