Retrieve a group of contigs from a fasta file.
- To use Select contigs, import your data in fasta format.
USAGE: select_contigs.pl [-n select_file] [-c contig_prefix] [-d] [-e]
[-g] [-i] [-l output_line_length] [-m min_size] [-M max_qual]
[-p] [-q] [-Q qual_value] [-s] [-t type_to_remove] [-u]
[-v] [-V] [-x] fasta_input_file fasta_output_file
-c Contig prefix to be added to output contig names.
-d Input contigs are assumed to be dna. Filter out any degenerate
contigs that do not contain at least one each of A, C, G, and T.
-e Empty output files are OK and do not result in an error.
-g Remove runs of Ns from ends of contigs. Minimum contig length
is enforced after trimming the ends.
-i Ignore contigs specified in the 'select_file' (-n), and output
the input contigs whose names are NOT listed. May only be used
with -n.
-l Specify output line length.
-m Specify minimum contig length to be written.
-M Specify a maximum quality score for all bases in the output quality file.
-n Specifies the name of a 'select_file', which contains a list of
contig names to be output. Each line of 'select_file' is a
tab-separated list. The fields in the list are: contig_name,
direction, begin_base, and length. If -i is specified, then
the list is the list of contigs to be ignored, and only the
contig_name field is used.
-p Preserve contig header comments. May not be used with -u.
-q Also process a fasta quality file ('fasta_input_file'.qual) in
addition to the sequence file, and create an output quality file
('fasta_output_file'.qual) alongside the output sequence file.
-Q Specify a constant quality value to be applied to all bases in
the output quality file or a modifier to be applied to all
qualities from the input Fasta quality file. If 'qual_value' is
a simple one- or two-digit positive integer, then that value is
used for the quality scores and the input Fasta quality file is
not needed. If 'qual_value' is not just a simple one- or
two-digit integer, then it specifies a modifier to be applied to
the values from the input Fasta quality file.
-s The contig name may be shortened by removing any prefix before
the word "Contig", e.g., "gono.fasta.screen.Contig26" becomes "Contig26".
-t Specify a filetype to be removed before adding ".qual" to create
the output quality filename.
-u Use universal accession numbers (uaccno) as contig names for 454
reads. May not be used with -p.
-v Verbose mode - print to STDERR the number of contigs copied. If
both -v and -V are specified, then -V will be used.
-V Verbose mode - print out some statistics to STDERR while running.
If both -v and -V are specified, then -V will be used.
-x Create new or append to existing (extend) output files.
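The per-contig options above (-g, -m, -d, -s, -c and -l) can be illustrated with a minimal Python sketch. It is not the Perl code of select_contigs.pl: the function names are hypothetical, the order in which the filters run is an assumption (the help text only states that -m is enforced after -g trimming), the default line length of 60 is assumed, and header comments (-p), 454 accession names (-u) and quality files (-q, -Q, -M) are not modeled.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str]  # (contig name, sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (name, sequence); the name is the first word of the header line."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def trim_n_ends(seq: str) -> str:
    """Like -g: remove runs of N (either case) from both ends."""
    return seq.strip("Nn")


def is_degenerate_dna(seq: str) -> bool:
    """Like -d: degenerate unless at least one each of A, C, G and T is present."""
    upper = seq.upper()
    return not all(base in upper for base in "ACGT")


def shorten_name(name: str) -> str:
    """Like -s: drop any prefix before the word "Contig"."""
    idx = name.find("Contig")
    return name[idx:] if idx > 0 else name


def filter_contigs(records: Iterable[Record], min_size: int = 0,
                   trim_ns: bool = False, dna_only: bool = False,
                   shorten: bool = False, prefix: str = "") -> Iterator[Record]:
    """Apply the per-contig options in one assumed order."""
    for name, seq in records:
        if trim_ns:
            seq = trim_n_ends(seq)
        if len(seq) < min_size:  # -m is enforced after -g trimming
            continue
        if dna_only and is_degenerate_dna(seq):
            continue
        if shorten:
            name = shorten_name(name)
        yield prefix + name, seq  # -c: prefix added to the output name (order assumed)


def format_fasta(records: Iterable[Record], line_length: int = 60) -> str:
    """Render records with sequence lines wrapped at line_length (like -l)."""
    out: List[str] = []
    for name, seq in records:
        out.append(f">{name}")
        out.extend(seq[i:i + line_length] for i in range(0, len(seq), line_length))
    return "\n".join(out) + "\n" if out else ""


if __name__ == "__main__":
    demo = [">gono.fasta.screen.Contig26 note\n", "NNACGTACGTNN\n",
            ">Contig27\n", "AAAAAAAA\n"]
    kept = filter_contigs(read_fasta(demo), min_size=6, trim_ns=True,
                          dna_only=True, shorten=True, prefix="x_")
    print(format_fasta(kept, line_length=4), end="")
```

Running the demo prints a single record, ">x_Contig26" followed by "ACGT" on two lines: the Ns are trimmed before the length check, the prefix before "Contig" is removed, and Contig27 is dropped by the -d style check because it contains only A.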
Test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> select_contigs.
Use testtranscripts.fasta and namelist.txt from the directory above as test input.
Parameters Used in App
When the app is run in the Discovery Environment, use the following parameters with the above input file(s) to get the output described in the next section.
Enter testtranscripts.fasta in the Input fasta window and namelist.txt in the contig list window. Set the minimum contig size option to 400.
Expect a fasta file as output. For the test case, the output file in the example_data directory is named testtranscripts_select.txt.
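To sanity-check the test case outside the Discovery Environment, the selection step (-n, or -i to invert it) combined with the 400-base minimum from the parameters above can be approximated in Python. This is a hypothetical sketch, not the app itself: it assumes namelist.txt follows the 'select_file' format described under -n, matches only the contig_name field against the first word of each fasta header, and reads but does not apply the direction, begin_base and length fields. Its result is therefore only comparable to testtranscripts_select.txt if namelist.txt lists bare contig names.

```python
import os
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str]  # (contig name, sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (first word of header, sequence) for each fasta record."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def read_select_file(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map contig_name -> its remaining tab-separated fields.

    The remaining fields (direction, begin_base, length) are kept but not applied.
    """
    entries: Dict[str, List[str]] = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if fields[0].strip():
            entries[fields[0].strip()] = fields[1:]
    return entries


def select_contigs(records: Iterable[Record], selected: Dict[str, List[str]],
                   ignore: bool = False, min_size: int = 0) -> List[Record]:
    """Keep listed contigs (like -n), or unlisted ones when ignore=True (like -i)."""
    kept = []
    for name, seq in records:
        listed = name in selected
        if listed != ignore and len(seq) >= min_size:
            kept.append((name, seq))
    return kept


if __name__ == "__main__":
    # Test-data file names from the example_data/select_contigs directory.
    fasta_path, list_path = "testtranscripts.fasta", "namelist.txt"
    if os.path.exists(fasta_path) and os.path.exists(list_path):
        with open(fasta_path) as fa, open(list_path) as nl:
            kept = select_contigs(read_fasta(fa), read_select_file(nl), min_size=400)
        print(f"{len(kept)} contigs selected")
```

The function names (read_fasta, read_select_file, select_contigs) are illustrative only; the script's own option parsing, sub-sequence extraction and output formatting are not reproduced.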
# Written by: James D. White, University of Oklahoma, Advanced Center for
# Genome Technology
# Date Written: Aug 5, 2009