...

Please work through the tutorial and add your comments at the bottom of this page, or send comments by email to support@cyverse.org. Thank you.

Warning
Although this version of the app works, NCBI recommends using the more recent version of tbl2asn, which is available in the DE as the tbl2asn (gapped)-25.3 app.

Rationale and background: 

...

Test data for tbl2asn (gapped)-22.9 are provided in /iplant/home/shared/iplantcollaborative/example_data/tbl2asn.sample.data.

Use the following inputs, outputs, and parameters to test tbl2asn (gapped)-22.9:

1. All gaps are of estimated length: every run of 5 or more N's represents a gap of estimated length, and the linkage evidence is paired-ends:

Info

Note that you should only include an assembly_gap for runs of N's that represent gaps.  Do not add assembly_gaps for single or short runs of N's that represent ambiguous bases. You will need to check your assembly parameters to determine what the N's represent.
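
Before choosing a gap setting, it can help to see which run lengths of N's actually occur in the assembly. The following is a minimal sketch, not part of the app: the function names and the toy record are hypothetical, and the run-length table it prints only suggests a threshold; what the N's mean still has to be confirmed against your assembly parameters.

```python
import re
from collections import Counter

def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def n_run_lengths(sequence):
    """Return the length of every run of N (or n) in a nucleotide string."""
    return [len(match.group()) for match in re.finditer(r"[Nn]+", sequence)]

# Hypothetical toy record, not taken from the sample data.
example = [">contig1", "ACGTNNACGT", "NNNNNNNACGTNN"]

histogram = Counter()
for _, seq in read_fasta(example):
    histogram.update(n_run_lengths(seq))

# One line per distinct run length, shortest first.
for length, count in sorted(histogram.items()):
    print(f"{count} run(s) of {length} N")
```

To use it on a real file, pass an open file handle to `read_fasta` instead of the toy list.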

  1. Mandatory arguments

    1. Template file - template_BP_BS.sbt

    2. Fasta file - sample.gapped.unknown.fsa
    3. Linkage evidence - paired-ends (i.e., for paired ends or mate pairs)
    4. Output file - out.gapped.sqn
  2. Optional arguments 
    1. Annotation file - multiple.tbl
    2. Structured comment file - assembly.cmt
  3. Gap details
    1. Estimated Gap length - r5k (Runs of 5 or more N's are estimated gaps and shorter runs of N's are ambiguous bases). 
  4. Parameters
    1. Organism name - [organism=Helicobacter pylori ABC1] [strain=ABC1] [host=Homo sapiens] [isolation-source=blood]
    2. Master Genome Flag - n (default)
    3. Run Discrepancy report - checked (Recommended)
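
For readers who also run tbl2asn outside the DE, the settings above can be sketched as a command line. This is an illustration under assumptions: the flags are those listed in the tbl2asn usage text (`-t`, `-i`, `-o`, `-f`, `-w`, `-a`, `-l`, `-j`, `-M`, `-Z`), but how the DE app maps its fields onto them is assumed here and is not confirmed by this page. The helper `build_tbl2asn_command` is hypothetical, and the script only prints the command; it does not run it.

```python
import shlex

def build_tbl2asn_command(template, fasta, output, gap_spec, linkage,
                          source_qualifiers, table=None, comment=None,
                          master_flag="n", discrepancy_report="discrep"):
    """Assemble an argument list for tbl2asn (assumed field-to-flag mapping)."""
    args = [
        "tbl2asn",
        "-t", template,           # Template file
        "-i", fasta,              # Fasta file
        "-o", output,             # Output file
        "-a", gap_spec,           # Gap details, e.g. r5k
        "-l", linkage,            # Linkage evidence
        "-j", source_qualifiers,  # Organism name and other source qualifiers
        "-M", master_flag,        # Master Genome Flag
    ]
    if table:
        args += ["-f", table]                # Annotation file
    if comment:
        args += ["-w", comment]              # Structured comment file
    if discrepancy_report:
        args += ["-Z", discrepancy_report]   # Discrepancy report output file
    return args

cmd = build_tbl2asn_command(
    template="template_BP_BS.sbt",
    fasta="sample.gapped.unknown.fsa",
    output="out.gapped.sqn",
    gap_spec="r5k",
    linkage="paired-ends",
    source_qualifiers="[organism=Helicobacter pylori ABC1] [strain=ABC1] "
                      "[host=Homo sapiens] [isolation-source=blood]",
    table="multiple.tbl",
    comment="assembly.cmt",
)

# Print a shell-quoted version for inspection (shlex.join needs Python 3.8+).
print(shlex.join(cmd))
```

Building the arguments as a list rather than a single string keeps the bracketed source qualifiers intact as one argument if the list is later passed to `subprocess.run`.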

2. All gaps are of unknown length: every gap is represented by exactly 100 N's, and the linkage evidence is alignment to another genome of the same genus:

Info

Note that all of the unknown length gaps must be 100 N's. An assembly_gap will be added for every run of 100 N's.  All other N's will be ignored.  Please contact us for additional instructions if there are unknown length gaps of other sizes. Note that you must know the order and orientation of the contigs.  You cannot randomly link contigs using unknown (or known) length gaps.  If you do not have linkage evidence, submit the sequences as individual contigs.

  1. Mandatory arguments

    1. Template file - template_BP_BS.sbt

    2. Fasta file - sample.gapped.known.fsa
    3. Linkage evidence - align-genus
    4. Output file - out.gapped.sqn
  2. Optional arguments 
    1. Annotation file - multiple.tbl
    2. Structured comment file - assembly.cmt
  3. Gap details
    1. Estimated Gap length - r100u (Every run of 100 N's is a gap of unknown length; all other N's are ignored).
  4. Parameters
    1. Organism name - [organism=Helicobacter pylori ABC1] [strain=ABC1] [host=Homo sapiens] [isolation-source=blood]
    2. Master Genome Flag - n (default)
    3. Run Discrepancy report - checked (Recommended)
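
Because only runs of exactly 100 N's become gaps in this case, it may be worth checking the FASTA file for runs of other lengths before submitting. The sketch below is a hypothetical helper, not part of the app; it follows the rule stated in the note above (100 N's become an assembly_gap, everything else is ignored) and uses a made-up sequence.

```python
import re

def split_runs_for_unknown_gaps(sequence, unknown_length=100):
    """Separate N runs into gaps (exactly unknown_length N's) and ignored runs.

    Each run is reported as (zero-based start position, run length).
    """
    gaps, ignored = [], []
    for match in re.finditer(r"[Nn]+", sequence):
        run = (match.start(), len(match.group()))
        if run[1] == unknown_length:
            gaps.append(run)
        else:
            ignored.append(run)
    return gaps, ignored

# Hypothetical toy sequence: one 100-N gap, one short run, one long run.
seq = "ACGT" + "N" * 100 + "ACGT" + "N" * 3 + "AC" + "N" * 250 + "GT"
gaps, ignored = split_runs_for_unknown_gaps(seq)

print("unknown-length gaps:", gaps)
print("runs that would be ignored:", ignored)
```

A long run in the ignored list is a hint that the assembly contains gaps of another size, which is the situation the note says needs additional instructions.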

3. There are both estimated-length and unknown-length gaps: runs of 10 or more N's are estimated gaps, shorter runs of N's are ambiguous bases, all runs of exactly 100 N's are unknown gaps, and the linkage evidence is paired-ends:

Info

Note that all of the unknown length gaps must be 100 N's. The # in the gap setting (for example, the 10 in r10u) is the minimum number of N's that is converted to an estimated length gap. If some runs of 100 N's are unknown length and others are estimated length, please contact us for more information.

  1. Mandatory arguments

    1. Template file - template_BP_BS.sbt

    2. Fasta file - sample.gapped.unknown.fsa
    3. Linkage evidence - paired-ends (i.e., for paired ends or mate pairs)
    4. Output file - out.gapped.sqn
  2. Optional arguments 
    1. Annotation file - multiple.tbl
    2. Structured comment file - assembly.cmt
  3. Gap details
    1. Estimated Gap length - r10u  
  4. Parameters
    1. Organism name - [organism=Helicobacter pylori ABC1] [strain=ABC1] [host=Homo sapiens] [isolation-source=blood]
    2. Master Genome Flag - n (default)
    3. Run Discrepancy report - checked (Recommended)
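
The mixed rule in this case can be sketched as a small classifier. It is a hypothetical illustration of the rule as described on this page (exactly 100 N's is an unknown-length gap, 10 or more N's is an estimated-length gap, anything shorter is ambiguous bases); it is not tbl2asn's own code, and the sequence is made up.

```python
import re

def classify_n_runs(sequence, min_estimated=10, unknown_length=100):
    """Label every run of N's as (start, length, kind) under the mixed rule."""
    labelled = []
    for match in re.finditer(r"[Nn]+", sequence):
        length = len(match.group())
        if length == unknown_length:
            kind = "unknown-length gap"
        elif length >= min_estimated:
            kind = "estimated-length gap"
        else:
            kind = "ambiguous bases"
        labelled.append((match.start(), length, kind))
    return labelled

# Hypothetical toy sequence with one run of each kind.
seq = "AC" + "N" * 4 + "GT" + "N" * 12 + "AC" + "N" * 100 + "GT"
runs = classify_n_runs(seq)

for start, length, kind in runs:
    print(f"position {start}: {length} N -> {kind}")
```

The check for exactly 100 N's comes first because 100 also satisfies the "10 or more" test; reversing the order would label every unknown-length gap as estimated.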

Output Reports:

  1. out.gapped.sqn - sqn file for submission to WGS
  2. multiple.val - verification report
  3. discrep - discrepancy report
  4. errorsummary.val - Summary file showing the number, severity and type of errors found in all the .val files.
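
To get a quick tally from errorsummary.val, a short script can total the counts per severity. The sketch below assumes each line has the form "count SEVERITY: type", which matches the description above but is not confirmed by this page; check a real file before relying on it. The sample lines and error type names are invented placeholders.

```python
import re
from collections import Counter

# Assumed line layout: "<count> <SEVERITY>: <type>" (not confirmed by this page).
LINE_PATTERN = re.compile(r"^\s*(\d+)\s+(\w+):?\s+(\S+)")

def tally_by_severity(lines):
    """Sum the error counts per severity, skipping lines that do not match."""
    totals = Counter()
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match:
            count, severity, _error_type = match.groups()
            totals[severity] += int(count)
    return totals

# Invented placeholder lines, not real tbl2asn output.
sample = [
    "     3 ERROR:   SEQ_FEAT.ExampleCheck",
    "    12 WARNING: SEQ_DESCR.ExampleCheck",
    "     1 ERROR:   SEQ_INST.ExampleCheck",
    "",
]

totals = tally_by_severity(sample)
for severity, count in sorted(totals.items()):
    print(f"{severity}: {count}")
```

For a real run, replace `sample` with the lines of the file, for example `open("errorsummary.val")`.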
 

More information about tbl2asn (gapped)-22.9 can be found at http://www.ncbi.nlm.nih.gov/genbank/tbl2asn2/ and http://www.ncbi.nlm.nih.gov/genbank/wgs_gapped/

...