Tallymer-mkindex

Tallymer-mkindex counts and indexes k-mers for a specified value of k (e.g. all 20-mers) in a set of sequences. It requires an enhanced suffix array (ESA) generated with Suffixerator. The output is a Tallymer index that can be used to search FASTA sequences with the Tallymer-Search app.


Test Data

Test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> Tallymer.

Input File(s)

Specify the directory containing the ESA files. For example, if you use the example_data directory above, enter "/iplant/home/shared/iplantcollaborative/example_data/Tallymer/".

Then specify the root name of the ESA. Using example_data, you would enter "maize_BAC100".

Parameters Used in App

  • Use these parameters within the DE app interface:
    • Desired k-mer length: specify the length k of the k-mers to be indexed.
    • Minimum occurrence of k-mer to report: specify the minimum number of times a k-mer must occur in the original set of sequences (the one used to generate the Suffixerator ESA) in order to be indexed. For example, if you specify '5', only k-mers found 5 or more times in those sequences will be indexed.
    • Give a name to the index you are creating (optional): provide a root name for the index to be generated; otherwise the app generates a name automatically. In the example, we provided the root name "maize_BACS100_20mer_minocc5" to indicate the source of the sequences, the k-mer length, and the minimum occurrence count.
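
To make the effect of the k-mer length and minimum occurrence parameters concrete, here is a minimal Python sketch that counts k-mers in a toy set of sequences and keeps only those that reach the threshold. It is an illustration only: Tallymer works on the Suffixerator ESA rather than scanning sequences like this, the sketch ignores strand and reverse complements, and the function names and toy sequences are hypothetical.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every overlapping k-mer across all sequences (forward strand only)."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # A sequence shorter than k yields an empty range and contributes nothing.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def filter_by_min_occurrence(counts, min_occ):
    """Keep only k-mers that occur at least min_occ times."""
    return {kmer: n for kmer, n in counts.items() if n >= min_occ}


# Toy input, not taken from the example data.
sequences = ["ACGTACGT", "TACGTA"]
counts = count_kmers(sequences, k=4)
indexed = filter_by_min_occurrence(counts, min_occ=2)
print(sorted(indexed.items()))
```

With k set to 4 and a minimum occurrence of 2, the k-mers GTAC (seen once) is dropped, while ACGT, CGTA and TACG are kept.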

Output File(s)

Output will be four files:

maize_BACS100_20mer_minocc5.mbd
maize_BACS100_20mer_minocc5.mct
maize_BACS100_20mer_minocc5.mer
mer20distribution

The first three files listed above together constitute the Tallymer index. They share a common root name, and each has a unique three-letter suffix.
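
As a small sketch of this naming scheme, the following Python helper derives the three index file names from a root name. The helper name is hypothetical; the suffixes are the ones listed above.

```python
def expected_index_files(index_root):
    """Return the three Tallymer index file names for a given root name."""
    # Suffixes taken from the output listing on this page.
    return [f"{index_root}.{suffix}" for suffix in ("mbd", "mct", "mer")]


print(expected_index_files("maize_BACS100_20mer_minocc5"))
```

A check like this can be useful before running Tallymer-Search, to confirm that all three parts of the index are present in the output folder.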

The mer20distribution file is a text file that gives summary information about the distribution of k-mers.
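
For orientation, the inputs and parameters above can be mapped onto a GenomeTools command line roughly as follows. This is a sketch under assumptions: the exact command the DE app runs is not documented on this page, and the option names (-mersize, -minocc, -indexname, -counts, -pl, -esa) are recalled from the GenomeTools Tallymer manual and should be checked against `gt tallymer mkindex -help` for your version. The helper function is hypothetical and only builds the argument list; it does not run anything.

```python
from pathlib import PurePosixPath


def build_mkindex_command(esa_dir, esa_root, mersize, minocc, index_name):
    """Assemble an assumed `gt tallymer mkindex` argument list (not executed)."""
    # The ESA is addressed by directory plus root name, without a file suffix.
    esa_path = str(PurePosixPath(esa_dir) / esa_root)
    return [
        "gt", "tallymer", "mkindex",
        "-mersize", str(mersize),    # desired k-mer length
        "-minocc", str(minocc),      # minimum occurrence of a k-mer to index
        "-indexname", index_name,    # root name of the index files
        "-counts",                   # assumed to produce the .mct file
        "-pl",                       # assumed to produce the .mbd file
        "-esa", esa_path,
    ]


cmd = build_mkindex_command(
    "/iplant/home/shared/iplantcollaborative/example_data/Tallymer/",
    "maize_BAC100",
    mersize=20,
    minocc=5,
    index_name="maize_BACS100_20mer_minocc5",
)
print(" ".join(cmd))
```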

Tool Source for App

...

docs:_DE_archived_apps_blurb