RepeatModeler

Please work through the tutorial and add your comments at the bottom of this page, or send comments by email to kchougul@cshl.edu. Thank you.

Rationale and background:

RepeatModeler is a de-novo repeat family identification and modeling package. At the heart of RepeatModeler are two de-novo repeat finding programs (RECON and RepeatScout) which employ complementary computational methods for identifying repeat element boundaries and family relationships from sequence data. RepeatModeler assists in automating the runs of RECON and RepeatScout given a genomic database and uses the output to build, refine and classify consensus models of putative interspersed repeats.

Version: 1.0.11

Pre-Requisites

A CyVerse account. (Register for a CyVerse account here - user.cyverse.org)
Mandatory arguments -
1. Sequence fasta file: the sequence database containing the genomic sequence, in FASTA format
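
Before submitting a job, it can help to confirm that the input file really is in FASTA format. The following is a minimal sketch of such a check and is not part of the app; the function name summarize_fasta is hypothetical, and the check only looks at '>' header lines and sequence line lengths.

```python
def summarize_fasta(path):
    """Return (number_of_records, total_sequence_characters) for a FASTA file.

    Raises ValueError if sequence data appears before the first '>' header
    or if the file contains no records at all.
    """
    records = 0
    bases = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                records += 1  # a header line starts a new record
            elif records == 0:
                raise ValueError("sequence data found before the first '>' header")
            else:
                bases += len(line)  # count sequence characters
    if records == 0:
        raise ValueError("no FASTA records found")
    return records, bases
```

For example, summarize_fasta("test.fasta") would return the record count and total sequence length of a local copy of the test file.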

Test/sample data

The following test data are provided for testing RepeatModeler in /iplant/home/shared/iplantcollaborative/example_data/repeatmodeler:

test.fasta: sequence fasta file

Run RepeatModeler on the test.fasta file.

Results

A successful run of RepeatModeler produces several files and directories. The raw output is directed to a working directory whose name begins with RM_ (e.g. "RM_5098.MonMar141305172005"), which remains after each run for debugging purposes. At the completion of the run, two files are generated:

-families.fa : Consensus sequences

-families.stk : Seed alignments
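
Once the results have been downloaded, the two output files can be located and the consensus sequences counted with a short script. This is a sketch under assumptions: it supposes the file names end in -families.fa and -families.stk (possibly with a prefix), and the function names find_family_files and count_consensus are hypothetical.

```python
from pathlib import Path


def find_family_files(results_dir):
    """Search results_dir recursively for files ending in the two output suffixes."""
    root = Path(results_dir)
    return {
        "consensus": sorted(root.rglob("*-families.fa")),
        "seed_alignments": sorted(root.rglob("*-families.stk")),
    }


def count_consensus(fasta_path):
    """Count consensus sequences by counting '>' header lines."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

Calling find_family_files on the downloaded results folder returns the matching paths, and count_consensus on a consensus file gives the number of repeat families reported.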

Warning

This app runs on a node with 4 CPUs, so any input sequence file larger than 300Mb would take 5-6 days to complete. Further development to scale the app will be available soon.
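
To see in advance whether an input file falls under this warning, its size can be checked locally. The sketch below assumes that "300Mb" refers to file size and reads it as 300 * 1024 * 1024 bytes; the names SIZE_LIMIT_BYTES and exceeds_size_limit are hypothetical.

```python
import os

# Assumption: the 300Mb threshold in the warning is a file size of 300 MiB.
SIZE_LIMIT_BYTES = 300 * 1024 * 1024


def exceeds_size_limit(path, limit=SIZE_LIMIT_BYTES):
    """Return True if the file at path is larger than limit bytes."""
    return os.path.getsize(path) > limit
```

If exceeds_size_limit returns True for the input file, expect a run time of several days, as described in the warning.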

More information on the tool can be found here - http://www.repeatmasker.org/RepeatModeler/