Select contigs

Retrieve a group of contigs from a fasta file.

Quick Start

  • To use Select contigs, import your data in fasta format.

USAGE: [-n select_file] [-c contig_prefix] [-d] [-e]
    [-g] [-i] [-l output_line_length] [-m min_size] [-M max_qual]
    [-p] [-q] [-Q qual_value] [-s] [-t type_to_remove] [-u]
    [-v] [-V] [-x] fasta_input_file  fasta_output_file
              or -h

  -c  Contig prefix to be added to output contig names.
  -d  Input contigs are assumed to be DNA.  Filter out any degenerate
      contigs that do not contain at least one each of A, C, G, and T.
  -e  Empty output files are OK and do not result in an error.
  -g  Remove runs of Ns from ends of contigs.  Minimum contig length
      is enforced after trimming the ends.
  -i  Ignore contigs specified in the 'select_file' (-n), and output
      the input contigs whose names are NOT listed.  May only be used
      with -n.
  -l  Specify output line length.
  -m  Specify minimum contig length to be written.
  -M  Specify a maximum quality score for all bases in the output
      quality file.
  -n  Specifies the name of a 'select_file', which contains a list of
      contig names to be output.  Each line of 'select_file' is a tab
      separated list.  The fields in the list are:  contig_name,
      direction, begin_base, and length.  If -i is specified, then
      the list is the list of contigs to be ignored, and only the
      contig_name field is used.
  -p  Preserve contig header comments.  May not be used with -u.
  -q  Also process a fasta quality file ('fasta_input_file'.qual), as
      well as the sequence file and create an output quality file
      ('fasta_output_file'.qual) in addition to the output sequence
      file.
  -Q  Specify a constant quality value to be applied to all bases in
      the output quality file or a modifier to be applied to all
      qualities from the input Fasta quality file.  If 'qual_value' is
      a simple one- or two-digit positive integer, then that value is
      used for the quality scores and the input Fasta quality file is
      not needed.  If 'qual_value' is not just a simple one- or
      two-digit integer, then it specifies a modifier to be applied to
      the values from the input Fasta quality file.
  -s  The contig name may be shortened by removing any prefix before
      the word "Contig", e.g., "gono.fasta.screen.Contig26" becomes
      "Contig26".
  -t  Specify a filetype to be removed before adding ".qual" to create
      the output quality filename.
  -u  Use universal accession numbers (uaccno) as contig names for 454
      reads.  May not be used with -p.
  -v  Verbose mode - print to STDERR the number of contigs copied.  If
      both -v and -V are specified, then -V will be used.
  -V  Verbose mode - print out some statistics to STDERR while running.
      If both -v and -V are specified, then -V will be used.
  -x  Create new or append to existing (extend) output files.
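
The 'select_file' layout described under -n can be illustrated with a short Python sketch. It is not part of the tool: the helper name format_select_lines, the contig names, and the direction value "F" are placeholders chosen for illustration, and the usage text does not say how direction is encoded or whether begin_base counts from 0 or 1. The sketch also assumes that trailing fields may be left out, which is likely safe only with -i, where only the contig_name field is used.

```python
def format_select_lines(entries):
    """Return select_file text: one tab-separated line per entry.

    Each entry is a tuple of 1 to 4 fields in the documented order:
    contig_name, direction, begin_base, length.
    Assumption: fields after contig_name may be omitted.
    """
    lines = []
    for entry in entries:
        fields = [str(field) for field in entry]
        if not 1 <= len(fields) <= 4:
            raise ValueError(f"expected 1 to 4 fields, got {len(fields)}")
        if any("\t" in field or "\n" in field for field in fields):
            raise ValueError("fields must not contain tabs or newlines")
        lines.append("\t".join(fields))
    return "".join(line + "\n" for line in lines)


# Placeholder values only: "F" is NOT a documented direction code.
example_text = format_select_lines([
    ("Contig26", "F", 1, 500),  # all four fields
    ("Contig31",),              # name only, as used with -i
])
```

Writing example_text to a file and passing that file with -n would then select (or, with -i, exclude) the listed contigs.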
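
How -g, -m, and -d interact can be sketched as follows. This is an illustration of the behaviour described in the option list, not the tool's own code. The function names are hypothetical, and three details are assumptions: matching is case-insensitive, a contig whose length equals min_size is kept, and the -d test runs on the trimmed sequence.

```python
def trim_n_ends(sequence):
    """Remove runs of N from both ends of a sequence (the -g behaviour)."""
    return sequence.strip("Nn")


def is_degenerate(sequence):
    """True if the sequence lacks at least one of A, C, G, T (the -d test)."""
    return not set("ACGT") <= set(sequence.upper())


def keep_contig(sequence, min_size=0, trim_ends=False, drop_degenerate=False):
    """Return the sequence to write, or None if the contig is filtered out."""
    if trim_ends:
        sequence = trim_n_ends(sequence)
    # The minimum length is checked after trimming, as stated for -g.
    if len(sequence) < min_size:
        return None
    if drop_degenerate and is_degenerate(sequence):
        return None
    return sequence
```

Because the length check follows the trim, a contig that passes -m untrimmed can still be dropped once its flanking Ns are removed.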
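
The rule that decides how -Q treats 'qual_value' can be sketched as a simple pattern test. This is a reading of the usage text, not the tool's code: the function name is hypothetical, the syntax of a modifier is not documented here, and whether a value such as "0" counts as a "positive integer" is an assumption (this sketch treats any one or two digits as a constant).

```python
import re


def classify_qual_value(qual_value):
    """Classify a -Q argument as 'constant' or 'modifier'.

    Assumption: any string of exactly one or two digits is a constant;
    everything else is treated as a modifier of the input qualities.
    """
    if re.fullmatch(r"[0-9]{1,2}", qual_value):
        return "constant"
    return "modifier"
```

Under this reading, a value of 20 would be written for every base without reading the input quality file, while anything longer or containing non-digit characters would require the input quality file.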

Test Data

Test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> select_contigs.

Input File(s)

Use testtranscripts.fasta and namelist.txt from the directory above as test input.

Parameters Used in App

When running the app in the Discovery Environment, use the following parameters with the input files above to reproduce the output described in the next section.

Enter testtranscripts.fasta in the Input fasta window and namelist.txt in the contig list window. Set the minimum contig size option to 400.
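
For readers running the underlying tool directly, the settings above appear to correspond to the -n and -m options in the usage text. The Python sketch below only builds the argument list under that assumption. The helper name build_args is hypothetical, the mapping from the app's windows to flags is inferred rather than documented, and the program name is omitted because the usage text does not give it.

```python
def build_args(fasta_in, fasta_out, select_file=None, min_size=None):
    """Build an option list in the order shown in the USAGE line."""
    args = []
    if select_file is not None:
        args += ["-n", select_file]    # contig list
    if min_size is not None:
        args += ["-m", str(min_size)]  # minimum contig length
    args += [fasta_in, fasta_out]
    return args


# Test-case values from this page; the output name matches the example file.
test_args = build_args(
    "testtranscripts.fasta",
    "testtranscripts_select.txt",
    select_file="namelist.txt",
    min_size=400,
)
```

The resulting list could be passed to subprocess.run after the path of the installed program has been added at the front.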

Output File(s)

Expect a fasta file as output. For the test case, the corresponding output file in the example_data directory is named testtranscripts_select.txt.

Tool Source for App

# Written by: James D. White, University of Oklahoma, Advanced Center for

#   Genome Technology


# Date Written: Aug 5, 2009