LASTAL-8.69

Alert:

 

The CyVerse App Store is currently being restructured, and apps are being moved to an HPC environment. During this transition, users may occasionally be unable to locate or use apps that are listed in our tutorials. In many cases, these apps can be found by using the search bar at the top of the Apps window in the DE. To improve the chance of a successful search, do not enter the entire app name and version number; enter only the portion that refers to the app's function or origin (e.g. 'SOAPdenovo' instead of 'SOAPdenovo-Trans 1.01').

Also, as part of the 2.8 app categorization, a number of apps were deprecated and are no longer available, and there is no longer an Archive category. You can search for a suitable replacement in the List of Applications in this window, or search on an app name or tool used for an app in the Apps window search field. If you need an app reinstated, please contact support@cyverse.org.

Please work through the documentation and add your comments at the bottom of this page, or email comments to support@cyverse.org. Thank you.

Rationale and Background

LASTAL (local alignment search tool aligner) is designed to perform large sequence alignments such as whole genome alignments (WGAs). This installation is configured specifically for WGAs, with options enabled to produce high-quality alignments.

Inputs

  • Query: Path to the query file. Nucleotide sequences in FASTA format.
  • Database: Path to the database folder prepared by LASTDB.

Parameters

  • E-value: Expect value (E) threshold for saving hits (Default is 0.05). Note that this may differ slightly from BLAST E-values. Per the LAST e-values manual page, E is the expected number of alignments with greater or equal score between a random sequence with the same length as the query sequence and a random sequence with the same length as the database. An E of 0.05 is fairly standard for WGAs.
  • Multiplicity: Maximum multiplicity for initial matches. Each initial match is lengthened until it occurs at most this many times in the reference. A larger number is slower but more sensitive. Default is 20, recommended maximum is 100. For more details about LAST parameters, please see the LAST documentation page.

Generated output files

LASTAL will generate two .maf (multiple alignment format) files. The first (query_name.out.maf) contains the search results that give the unique best fit in the subject genome for each part of the query. In essence, each query base pair will be aligned to at most one subject base pair (keeping only the best alignments). This is performed within the app using LAST's last-split algorithm. The second output file (query_name.out2.maf) is generated by swapping the sequences and obtaining a 1-to-1 alignment by ensuring that only one copy of each query base pair is kept. This is performed using LAST's maf-swap followed by a second round of last-split. These two files can be kept to generate contiguous WGAs.
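As a rough way to inspect such output, the following Python sketch parses MAF text and counts how many alignment blocks cover each query base, so the "at most one subject base pair per query base pair" property can be checked on a small file. The embedded example is invented for illustration and is not real LASTAL output, the function names are hypothetical, and the sketch assumes the query is the second "s" line of each block.

```python
from collections import Counter

# Hypothetical two-block MAF text; sequence names and coordinates are made up.
EXAMPLE_MAF = """\
# invented example, not real LASTAL output
a score=27
s subjChr1 10 6 + 1000 ACGTAC
s queryGene1 0 6 + 20 ACGTAC

a score=15
s subjChr1 500 4 + 1000 GGCA
s queryGene1 6 4 + 20 GGCA
"""


def parse_maf(text):
    """Return a list of blocks; each block is a list of parsed 's' lines."""
    blocks, current = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if fields[0] == "a":
            current = []  # an 'a' line starts a new alignment block
            blocks.append(current)
        elif fields[0] == "s" and current is not None:
            name, start, size, strand, src_size = fields[1:6]
            current.append({
                "name": name,
                "start": int(start),      # zero-based start on the given strand
                "size": int(size),        # aligned bases, excluding gaps
                "strand": strand,
                "src_size": int(src_size),
            })
    return blocks


def coverage(blocks, row=1):
    """Count the blocks covering each base of the sequence in the given row.

    row=1 assumes the query is the second 's' line (an assumption here).
    Minus-strand starts are converted to forward-strand coordinates.
    """
    depth = Counter()
    for block in blocks:
        if len(block) <= row:
            continue
        rec = block[row]
        start = rec["start"]
        if rec["strand"] == "-":
            start = rec["src_size"] - rec["start"] - rec["size"]
        for pos in range(start, start + rec["size"]):
            depth[(rec["name"], pos)] += 1
    return depth


blocks = parse_maf(EXAMPLE_MAF)
depth = coverage(blocks)
print(len(blocks), "blocks;", len(depth), "query bases covered")
print("max coverage per query base:", max(depth.values()))
```

Counting per base keeps the logic easy to follow but uses memory proportional to the number of aligned bases, so for genome-scale files an interval-based check would be a more practical choice.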

Test Run

All files are located in the Community Data directory of the CyVerse Discovery Environment at the following path:

Community Data > iplantcollaborative > example_data > LAST > LASTAL (/iplant/home/shared/iplantcollaborative/example_data/LAST/LASTAL)

Mandatory arguments: 

  • Input fasta file: Fves_PC_genes.fasta
  • Database: Gmax_Chr1_DB

Parameters:

  • E-value: 0.05 (Default)
  • Sensitivity (Multiplicity): 20 (Default)