CD-HIT-est 4.6.8

Clusters the contigs in a FASTA file of assembled transcripts.


CD-HIT-EST groups the sequences of a nucleotide dataset into clusters whose members meet a user-defined similarity threshold, usually a sequence identity.
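The clustering idea can be sketched in a few lines of Python. This is a simplified illustration, not CD-HIT-EST's actual implementation. It follows CD-HIT's greedy incremental approach: sequences are processed longest first, and each one joins the first existing cluster whose representative it matches at or above the threshold (CD-HIT's default fast mode), otherwise it becomes the representative of a new cluster. Identity is computed as in CD-HIT's definition (identical bases divided by the length of the shorter sequence). However, Python's difflib matching stands in for CD-HIT's real word filtering and alignment, so the identity values are only approximate. The names `identity` and `greedy_cluster` are hypothetical.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Approximate identity: matched bases / length of the shorter sequence.

    difflib's matching blocks stand in for a real alignment here; this is an
    assumption made for illustration, not how CD-HIT-EST aligns sequences.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def greedy_cluster(seqs: dict[str, str], threshold: float) -> list[list[str]]:
    """Greedy incremental clustering; the first name in each cluster is its representative."""
    # Process the longest sequences first, as CD-HIT does.
    order = sorted(seqs, key=lambda name: len(seqs[name]), reverse=True)
    clusters: list[list[str]] = []
    for name in order:
        # Join the first cluster whose representative is similar enough...
        for cluster in clusters:
            if identity(seqs[cluster[0]], seqs[name]) >= threshold:
                cluster.append(name)
                break
        else:
            # ...otherwise start a new cluster with this sequence as representative.
            clusters.append([name])
    return clusters


seqs = {
    "t1": "ACGTACGTACGTACGTACGT",  # 20 nt
    "t2": "ACGTACGTACGTACGTACG",   # 19 nt, identical to the start of t1
    "t3": "TTTTGGGGCCCCAAAATTTT",  # 20 nt, unrelated
}
print(greedy_cluster(seqs, 0.94))  # [['t1', 't2'], ['t3']]
```

Here t2 is fully contained in t1, so its identity to t1 is 1.0 and it joins t1's cluster, while t3 falls far below 0.94 and starts its own cluster.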

Quick Start

Test Data


Test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> CD-HIT.

Input File(s)

Use testranscripts.fasta from the directory above as test input.

Parameters Used in App

When running the app in the Discovery Environment, use the following parameters with the input file above to reproduce the output described in the next section.

    • Set global sequence identity to 0.94.
    • Leave all other settings at their defaults.
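For reference, a run with these settings should correspond roughly to the command line built below. This is a sketch under assumptions: it assumes the app's global sequence identity field maps to cd-hit-est's `-c` option (CD-HIT's sequence identity threshold) and that the app passes no other non-default options. The helper `cd_hit_est_command` is hypothetical and not part of the app.

```python
import shlex


def cd_hit_est_command(input_fasta: str, output_fasta: str,
                       identity: float = 0.94) -> list[str]:
    """Build an equivalent cd-hit-est argument list (assumed mapping).

    Assumption: the app's "Global sequence identity" value is passed as -c,
    and every other option is left at the cd-hit-est default.
    """
    # Basic sanity check only; cd-hit-est applies its own limits.
    if not 0.0 < identity <= 1.0:
        raise ValueError("identity must be in (0, 1]")
    return ["cd-hit-est", "-i", input_fasta, "-o", output_fasta,
            "-c", str(identity)]


cmd = cd_hit_est_command("testranscripts.fasta", "CD-HITout.fa")
print(shlex.join(cmd))  # cd-hit-est -i testranscripts.fasta -o CD-HITout.fa -c 0.94
```

If cd-hit-est is installed locally, `subprocess.run(cmd, check=True)` would run it; cd-hit-est writes the `-o` file plus a companion file with `.clstr` appended to its name.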

Output File(s)

Expect CD-HITout.fa and CD-HITout.fa.clstr as output.

CD-HITout.fa contains the representative sequence of each cluster in FASTA format.

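CD-HITout.fa.clstr lists the clusters and their member sequences. In each cluster, the representative sequence is marked with an asterisk (*), and every other member is shown with its percent identity to that representative.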
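To work with the cluster file programmatically, a small parser like the following may help. It assumes the usual CD-HIT .clstr layout: a ">Cluster N" header followed by one line per member giving its index, length, name (possibly truncated and followed by "..."), and either "*" for the representative or "at" with a percent identity, optionally prefixed by the strand as cd-hit-est writes it. The `Member` and `parse_clstr` names and the sample records are made up for illustration; check the pattern against your own output.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Member:
    name: str                  # sequence name as written in .clstr (may be truncated)
    length: int                # sequence length in nt (or aa for protein CD-HIT)
    is_representative: bool    # True for the line ending in "*"
    identity: Optional[float]  # percent identity to the representative, else None


# Assumed member-line layout, e.g. "1\t1320nt, >contig_7... at +/96.21%"
MEMBER_RE = re.compile(
    r"^\d+\s+(\d+)(?:nt|aa),\s+>(.+?)\.\.\.\s+(?:(\*)|at\s+(?:[+-]/)?([\d.]+)%)"
)


def parse_clstr(lines):
    """Return {cluster_number: [Member, ...]} from the lines of a .clstr file."""
    clusters: dict[int, list[Member]] = {}
    current: Optional[int] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">Cluster"):
            current = int(line.split()[1])
            clusters[current] = []
        elif line.strip():
            match = MEMBER_RE.match(line)
            if match is None or current is None:
                raise ValueError(f"unrecognized line: {line!r}")
            length, name, star, pct = match.groups()
            clusters[current].append(
                Member(name, int(length), star is not None,
                       None if pct is None else float(pct))
            )
    return clusters


# Made-up sample in the assumed format (not output from the test data).
sample = """>Cluster 0
0\t1500nt, >contig_1... *
1\t1320nt, >contig_7... at +/96.21%
>Cluster 1
0\t980nt, >contig_3... *
"""
clusters = parse_clstr(sample.splitlines())
for number, members in clusters.items():
    rep = next(m.name for m in members if m.is_representative)
    print(f"Cluster {number}: {len(members)} member(s), representative {rep}")
```

For a real run, `with open("CD-HITout.fa.clstr") as fh: clusters = parse_clstr(fh)` reads the file directly. Since CD-HITout.fa holds one representative per cluster, the number of parsed clusters should match the number of records in that FASTA file.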

Tool Source for App