appendUnclustered

A utility that adds unclustered sequences to OrthoMCL output.  By default, OrthoMCL omits from its output any sequence that was not detected to be part of a homolog cluster.  This app adds those sequences back, representing each one as an additional 'cluster' that contains a single sequence.  Note that these single-sequence clusters do not represent biological homolog clusters, but they are useful for determining the number and IDs of species-specific sequences with no homologs for each analyzed species.
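The idea of representing each unclustered sequence as its own single-sequence cluster can be sketched as follows.  This is only an illustration under stated assumptions, not the app's actual implementation: the function name `append_unclustered`, the in-memory list-of-lists representation of clusters, and the sequence IDs are all hypothetical and do not reflect the real OrthoMCL file formats.

```python
def append_unclustered(clusters, all_sequence_ids):
    """Return the clusters plus one singleton cluster per unclustered sequence.

    clusters: list of lists of sequence IDs (hypothetical in-memory form).
    all_sequence_ids: every sequence ID that was given to the clustering step.
    """
    # Collect every ID that already belongs to some cluster.
    clustered = {seq_id for cluster in clusters for seq_id in cluster}
    # Each remaining ID becomes a 'cluster' of exactly one sequence.
    singletons = [[seq_id] for seq_id in all_sequence_ids
                  if seq_id not in clustered]
    return list(clusters) + singletons


# Made-up example data: two species (sp1, sp2), two homolog clusters.
clusters = [["sp1_g1", "sp2_g4"], ["sp1_g2", "sp2_g7", "sp2_g8"]]
all_ids = ["sp1_g1", "sp1_g2", "sp1_g3",
           "sp2_g4", "sp2_g7", "sp2_g8", "sp2_g9"]

result = append_unclustered(clusters, all_ids)
for cluster in result:
    print(cluster)
```

In this made-up example, `sp1_g3` and `sp2_g9` belong to no cluster, so each is appended as its own single-sequence cluster after the two original clusters.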

Quick Start

Test Data

Input test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> homolog_clustering -> 4_Concatenate_Multiple_Files_output

and

Community Data -> iplantcollaborative -> example_data -> homolog_clustering -> 8_OrthoMCL_output -> Nov_14/

Output test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> homolog_clustering -> 9_appendUnclustered_output/

Input File(s)

Use Community Data -> iplantcollaborative -> example_data -> homolog_clustering -> 4_Concatenate_Multiple_Files_output -> GG_Combined.txt

and

Community Data -> iplantcollaborative -> example_data -> homolog_clustering -> 8_OrthoMCL_output -> Nov_14 -> mcl/

as the test input file and the mcl folder, respectively.

Parameters Used in App

There are no parameters for this app.

Output File(s) and Folder(s)

  • 'logs' directory: Contains the job submission standard output and standard error files generated by CyVerse systems.  Usually this will only be important for troubleshooting if your job does not run.
  • 'unclustered_Added' directory: Contains three output files.
    • appendUnclustered.log records the number of clusters and sequences that the app processed and, for each species, the total number of sequences, the number that were clustered, and the number that were not clustered.
    • orthomcl.index and orthomcl.mclout contain the updated MCL output, including the unclustered sequences.  These files are not meant to be used directly at this stage of the Cluster Orthologs and Paralogs and Assemble Custom Gene Sets app.
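The per-species counts described for appendUnclustered.log can be illustrated with a short sketch.  It does not reproduce the real log format or the app's code: the function name `summarize_by_species`, the dictionary mapping species to sequence IDs, and the IDs themselves are assumptions made only for this example.

```python
def summarize_by_species(species_to_ids, clusters):
    """Count total, clustered, and unclustered sequences for each species.

    species_to_ids: dict mapping a species label to its sequence IDs
                    (hypothetical in-memory form of the input file).
    clusters: list of lists of sequence IDs found by the clustering step.
    """
    # Every ID that appears in at least one cluster.
    clustered = {seq_id for cluster in clusters for seq_id in cluster}
    summary = {}
    for species, ids in species_to_ids.items():
        n_clustered = sum(1 for seq_id in ids if seq_id in clustered)
        summary[species] = {
            "total": len(ids),
            "clustered": n_clustered,
            "unclustered": len(ids) - n_clustered,
        }
    return summary


# Made-up example data: two species and two homolog clusters.
species_to_ids = {
    "sp1": ["sp1_g1", "sp1_g2", "sp1_g3"],
    "sp2": ["sp2_g4", "sp2_g7", "sp2_g8", "sp2_g9"],
}
clusters = [["sp1_g1", "sp2_g4"], ["sp1_g2", "sp2_g7", "sp2_g8"]]

summary = summarize_by_species(species_to_ids, clusters)
for species, counts in summary.items():
    print(species, counts)
```

The unclustered count for each species corresponds to the number of single-sequence clusters that would be added for that species.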