  • 'logs' directory: Contains the job submission standard output and standard error files generated by CyVerse systems.  Usually this is only important for troubleshooting if your job does not run.
  • cluster_Report directory: Contains 1 log file and multiple species-specific files.
    • 0_clusterReport.log contains 2 tables.  Each uses the 2-letter abbreviation for each species created in the Cluster Orthologs and Paralogs and Assemble Custom Gene Sets workflow (Tax column).  The table headers are: 
      • Tax = 2-letter abbreviation for each species
      • nSeq = number of sequences
      • nGroup = number of clusters
      • +(UPre) = number of clusters where sequences from this species are uniquely present
      • -(UAbs) = number of clusters where sequences from this species are uniquely absent
      • The first table shows, for each species, the number of sequences input into OrthoMCL (nSeq column), the number of clusters produced by OrthoMCL for that species (nGroup column), the number of clusters that contain sequences only from that species (+(UPre) column), and the number of clusters that contain sequences from every other species but not from that species (-(UAbs) column). +(UPre) and -(UAbs) can be thought of as clusters where sequences from that species are either 'Uniquely Present' or 'Uniquely Absent'.
      • The second table contains the number of clusters that contain sequences from each pairwise combination of species, again labeled with the 2-letter abbreviation for each species (Tax).  Note that in the 'with unclustered added' scenario, the number of +(UPre) clusters can rise significantly because each unclustered sequence is added as a single-sequence cluster.
    • species-specific files:  The remaining files are named for the input species, using the 2-letter abbreviation for each species created in the Cluster Orthologs and Paralogs and Assemble Custom Gene Sets workflow.  This explanation uses the files in Community Data -> iplantcollaborative -> example_data -> homolog_clustering -> 10_clusterReport_output -> without_unclustered_added as an example.
      • 'Shared' files contain all OrthoMCL-detected clusters that contain sequences from both of the two named species, regardless of the copy number in other species.
      • '+' files contain all OrthoMCL-detected clusters that contain sequences from only that species.  These are the 'Uniquely Present' clusters from the first table in 0_clusterReport.log.
      • '-' files contain all OrthoMCL-detected clusters that contain sequences from every examined species except that species.  These are the 'Uniquely Absent' clusters from the first table in 0_clusterReport.log.
    • Each species-specific file contains one cluster per line.  Each line follows the format: ortholog cluster#(#species in cluster:#sequences in cluster, comma-delimited list of the # of sequences per species), followed by a tab-delimited list of protein-encoding gene IDs (each followed by its 2-letter species abbreviation in parentheses).  For example, consider the file Community Data -> iplantcollaborative -> example_data -> homolog_clustering -> 10_clusterReport_output -> without_unclustered_added -> Shared_NC_TG.group.  Line 3 shows OrthoMCL cluster ID 1000.  This cluster contains 4 sequences from 4 species, 1 per species, so there are 4 sequence IDs:  1000(4:4,NC:1,PF:1,TA:1,TG:1) NC03976(NC) PF04397(PF) TA00754(TA) TG00634(TG).
    • Notes
      • Remember that the full OrthoMCL output was made by the OrthoMCL v1.4 app.  All '.group' files generated here are derived directly from the all_orthomcl.out file.  See Community Data -> iplantcollaborative -> example_data -> homolog_clustering -> 8_OrthoMCL_output -> Nov14 -> all_orthomcl.out for an example.  This file should be considered part of the overall output.
      • OrthoMCL cluster IDs are arbitrary.
      • If there is more than one sequence ID from a species in the same cluster, this indicates that paralogs were detected.
      • In the 'with unclustered added' scenario, the added single-sequence clusters are counted as Uniquely Present in 0_clusterReport.log and in the '+' files.
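
If you want to work with the '.group' files in a script, the line format described above can be read with a short parser.  The following is only a minimal sketch in Python, not part of the app: the function names (parse_group_line, species_with_paralogs, unique_labels) are made up for illustration, and the code assumes that fields are separated by tabs or other whitespace and that the header and gene IDs contain no spaces.  It parses one line, lists the species with more than one sequence in the cluster (paralogs), and reports whether the cluster would count as 'Uniquely Present' (+) or 'Uniquely Absent' (-) for a species, given the full list of examined species.

```python
import re
from collections import Counter

# Header field, e.g. "1000(4:4,NC:1,PF:1,TA:1,TG:1)"
HEADER_RE = re.compile(
    r"^(?P<cluster_id>[^(\s]+)\((?P<n_species>\d+):(?P<n_seqs>\d+),(?P<counts>[^)]*)\)$"
)
# Member field, e.g. "NC03976(NC)"
MEMBER_RE = re.compile(r"^(?P<gene_id>.+)\((?P<tax>[^()]+)\)$")


def parse_group_line(line):
    """Parse one line of a '.group' file into a dictionary.

    Assumption: fields are separated by tabs or other whitespace.
    """
    fields = line.split()
    if not fields:
        raise ValueError("empty line")
    header = HEADER_RE.match(fields[0])
    if header is None:
        raise ValueError("unrecognized cluster header: " + fields[0])

    # Per-species sequence counts, e.g. {"NC": 1, "PF": 1, ...}
    per_species = {}
    for item in header["counts"].split(","):
        tax, count = item.split(":")
        per_species[tax] = int(count)

    # (gene ID, species abbreviation) pairs
    members = []
    for field in fields[1:]:
        member = MEMBER_RE.match(field)
        if member is None:
            raise ValueError("unrecognized gene ID field: " + field)
        members.append((member["gene_id"], member["tax"]))

    return {
        "cluster_id": header["cluster_id"],
        "n_species": int(header["n_species"]),
        "n_seqs": int(header["n_seqs"]),
        "per_species": per_species,
        "members": members,
    }


def species_with_paralogs(cluster):
    """Return species that contribute more than one sequence ID to the cluster."""
    counts = Counter(tax for _, tax in cluster["members"])
    return sorted(tax for tax, n in counts.items() if n > 1)


def unique_labels(cluster, all_species):
    """Return labels such as '+NC' (uniquely present) or '-TG' (uniquely absent).

    all_species must list every examined species; it is not stored in the file.
    """
    present = {tax for tax, n in cluster["per_species"].items() if n > 0}
    absent = set(all_species) - present
    labels = []
    if len(present) == 1:
        labels.append("+" + next(iter(present)))
    if len(absent) == 1:
        labels.append("-" + next(iter(absent)))
    return labels


# The example line from Shared_NC_TG.group quoted above
example = "1000(4:4,NC:1,PF:1,TA:1,TG:1)\tNC03976(NC)\tPF04397(PF)\tTA00754(TA)\tTG00634(TG)"
cluster = parse_group_line(example)
print(cluster["cluster_id"], cluster["n_species"], cluster["n_seqs"])
print(species_with_paralogs(cluster))
print(unique_labels(cluster, ["NC", "PF", "TA", "TG"]))
```

For the example cluster, every species contributes exactly one sequence, so no paralogs are reported, and because all four species are present the cluster is neither Uniquely Present nor Uniquely Absent for any of them.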