Select contigs
Retrieve a group of contigs from a fasta file.
Quick Start
- To use Select contigs, import your data in fasta format.
USAGE: select_contigs.pl [-n select_file] [-c contig_prefix] [-d] [-e]
  [-g] [-i] [-l output_line_length] [-m min_size] [-M max_qual]
  [-p] [-q] [-Q qual_value] [-s] [-t type_to_remove] [-u]
  [-v] [-V] [-x] fasta_input_file  fasta_output_file
       or
    select_contigs.pl -h
OPTIONS:
 -c Contig prefix to be added to output contig names.
 -d Input contigs are assumed to be dna. Filter out any degenerate
   contigs that do not contain at least one each of A, C, G, and T.
 -e Empty output files are OK and do not result in an error.
 -g Remove runs of Ns from ends of contigs. Minimum contig length
   is enforced after trimming the ends.
 -i Ignore contigs specified in the 'select_file' (-n), and output
   the input contigs whose names are NOT listed. May only be used
   with -n.
 -l Specify output line length.
 -m Specify minimum contig length to be written.
 -M Specify a maximum quality score for all bases in the output
   quality file.
 -n Specifies the name of a 'select_file', which contains a list of
   contig names to be output. Each line of 'select_file' is a tab
   separated list. The fields in the list are: contig_name,
   direction, begin_base, and length. If -i is specified, then
   the list is the list of contigs to be ignored, and only the
   contig_name field is used.
 -p Preserve contig header comments. May not be used with -u.
 -q Also process a fasta quality file ('fasta_input_file'.qual), as
   well as the sequence file and create an output quality file
   ('fasta_output_file'.qual) in addition to the output sequence
   file.
 -Q Specify a constant quality value to be applied to all bases in
   the output quality file or a modifier to be applied to all
   qualities from the input Fasta quality file. If 'qual_value' is
   a simple one- or two-digit positive integer, then that value is
   used for the quality scores and the input Fasta quality file is
   not needed. If 'qual_value' is not just a simple one- or
   two-digit integer, then it specifies a modifier to be applied to
   the values from the input Fasta quality file.
 -s The contig name may be shortened by removing any prefix before
   the word "Contig", i.e., "gono.fasta.screen.Contig26" becomes
   "Contig26".
 -t Specify a filetype to be removed before adding ".qual" to create
   the output quality filename.
 -u Use universal accession numbers (uaccno) as contig names for 454
   reads. May not be used with -p.
 -v Verbose mode - print to STDERR the number of contigs copied. If
   both -v and -V are specified, then -V will be used.
 -V Verbose mode - print out some statistics to STDERR while running.
   If both -v and -V are specified, then -V will be used.
 -x Create new or append to existing (extend) output files.
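A few of the filtering options above can be illustrated with a short Python sketch. This is not the Perl source of select_contigs.pl; it only mirrors the behavior described for -s, -g, -d, -m, -n, and -i. The function names, the handling of lowercase "n", and the order in which the filters are applied are assumptions made for illustration.

```python
def shorten_name(name):
    """-s: drop any prefix before the word "Contig"."""
    idx = name.find("Contig")
    return name[idx:] if idx > 0 else name


def trim_n_ends(seq):
    """-g: remove runs of N from both ends (treating lowercase n alike is an assumption)."""
    return seq.strip("Nn")


def is_degenerate(seq):
    """-d: True if the sequence lacks at least one of A, C, G, T."""
    return not set("ACGT").issubset(seq.upper())


def select(records, names=None, ignore=False, min_size=0,
           dna=False, trim=False, shorten=False):
    """Yield (name, sequence) pairs that pass the chosen filters.

    records: iterable of (name, sequence) tuples (hypothetical in-memory form)
    names:   set of contig names taken from a select file (-n), or None
    ignore:  invert the name selection (-i)
    """
    for name, seq in records:
        # -n / -i: keep listed names, or everything except listed names
        if names is not None and ((name in names) == ignore):
            continue
        if trim:
            seq = trim_n_ends(seq)
        # -m: the minimum length is checked after trimming, as stated for -g
        if len(seq) < min_size:
            continue
        if dna and is_degenerate(seq):
            continue
        yield (shorten_name(name) if shorten else name), seq


records = [
    ("gono.fasta.screen.Contig26", "NNACGTACGTNN"),
    ("gono.fasta.screen.Contig27", "AAAAAAAA"),
    ("gono.fasta.screen.Contig28", "NNNACGN"),
]
kept = list(select(records, min_size=5, dna=True, trim=True, shorten=True))
print(kept)  # [('Contig26', 'ACGTACGT')]
```

In this made-up example, Contig27 is dropped because it contains only A, and Contig28 is dropped because only three bases remain after the N runs are trimmed.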
Test Data
Test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> select_contigs.
Input File(s)
Use testtranscripts.fasta and namelist.txt from the directory above as test input.
Parameters Used in App
When the app is run in the Discovery Environment, use the following parameters with the input files above to get the output described in the next section.
Enter testtranscripts.fasta in the Input fasta window and namelist.txt in the contig list window. Set the option for minimum contig size to 400.
Output File(s)
The app produces a fasta file as output. For the test case, the expected output file in the example_data directory is named testtranscripts_select.txt.
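For readers running the script outside the Discovery Environment, the test case plausibly corresponds to a command line like the one assembled below. This is a sketch under assumptions: it presumes the contig list window maps to -n and the minimum contig size option maps to -m, and build_command is a hypothetical helper, not part of the tool.

```python
import shlex


def build_command(fasta_in, fasta_out, select_file=None, min_size=None):
    """Assemble an argument list following the USAGE line of select_contigs.pl."""
    argv = ["select_contigs.pl"]
    if select_file is not None:
        argv += ["-n", select_file]    # assumed mapping of the contig list window
    if min_size is not None:
        argv += ["-m", str(min_size)]  # assumed mapping of the minimum contig size option
    argv += [fasta_in, fasta_out]
    return argv


cmd = build_command(
    "testtranscripts.fasta",
    "testtranscripts_select.txt",
    select_file="namelist.txt",
    min_size=400,
)
print(shlex.join(cmd))
# select_contigs.pl -n namelist.txt -m 400 testtranscripts.fasta testtranscripts_select.txt
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a system where select_contigs.pl is installed and on the PATH.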
Tool Source for App
# Written by: James D. White, University of Oklahoma, Advanced Center for
#   Genome Technology
#
# Date Written: Aug 5, 2009