vContact_0.1.49 and vContact-PCs

vContact is a tool that performs automatic guilt-by-contig-association classification of viral contigs.

Complete documentation can be found at either the source tool's original site or its new location. This app is under active development and will undergo a major revision in the near future. This version will remain available, and the revised version will be added as a separate app in the HPC.

Quick Start

  • The quickest way to use vContact is to run vContact-PCs on a BLAST file and provide a contig info file. vContact-PCs will create the appropriate input files for use with vContact.
  • Because the app runs at the Texas Advanced Computing Center (TACC), there will be a queue time before it starts. Once running, a job has a time limit of 1 hour, which should be sufficient for datasets of nearly any size.

Test Data

Test data for this app is available directly in the Discovery Environment, in the Data window under Community Data -> iplantcollaborative -> example_data -> vContact.

Input File(s)

There are three required input files. Each must be in either TSV (tab-separated values) or CSV (comma-separated values) format. Formats can differ *between files* (for example, one TSV file and two CSV files), but a single file must not mix delimiters *within* it.

Protein Clusters Info file: Contains protein cluster information. Must contain the headers "id" and "size".

  • If using the test data, this is the file pc_info.tsv

Contig Info file: Contains contig information. At the very least, it must have the contig name and the number of proteins associated with the contig. Must contain the headers "id", "proteins", and "size".

  • If using the test data, this is the file contigs.tsv

Protein Clusters Profiles file: Contains the association of contigs with protein clusters. Must contain the headers "contig_id" and "pc_id".

  • If using the test data, this is the file pcprofiles.tsv
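
Because a missing or misspelled header is an easy mistake to make, it can help to check each file before submitting a job. The following is a minimal sketch, not part of vContact itself. The function name `missing_headers` is hypothetical, and the sketch assumes that the delimiter follows the file extension (.tsv means tab, anything else means comma).

```python
import csv
from pathlib import Path


def missing_headers(path, required):
    """Return the required header names that are absent from the first row of `path`.

    Assumption: files ending in .tsv are tab-separated; all others are
    treated as comma-separated.
    """
    delimiter = "\t" if Path(path).suffix.lower() == ".tsv" else ","
    with open(path, newline="") as handle:
        # The first row is taken to be the header row; an empty file yields [].
        header = next(csv.reader(handle, delimiter=delimiter), [])
    return set(required) - {name.strip() for name in header}
```

For a contig info file, calling `missing_headers("contigs.tsv", {"id", "proteins", "size"})` returns an empty set when all three headers are present, and otherwise returns the names that still need to be added.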

Parameters Used in App

The app exposes several parameters. Change the defaults only if you know what you're doing.

In general, changing them won't substantially affect the results. For a detailed guide to what each of these options does, please check the documentation.

  • Significativity threshold: Significance threshold in the contig similarity network
  • Use permissive: Use permissive affiliation (set this flag to increase the number of contigs retained in the network)
  • Inflation: Inflation parameter used to define contig clusters with MCL
  • Module inflation: Inflation parameter used to define protein modules with MCL
  • Module significativity: Significance threshold in the protein cluster similarity network
  • Module shared min: Minimum number (inclusive) of contigs a PC must appear in to be taken into account when computing modules
  • Link significativity: Significance threshold to link a cluster and a module
  • Link proportion: Proportion of a module's PCs a contig must have to be considered as displaying this module
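
To make the "Module shared min" and "Link proportion" descriptions concrete, here is one possible reading of them as a short sketch. It is illustrative only and is not taken from the vContact source. The function names are hypothetical, and treating the "Link proportion" bound as inclusive is an assumption (the page only states that "Module shared min" is inclusive).

```python
def pcs_for_modules(pc_to_contigs, shared_min):
    """Keep the PCs that appear in at least `shared_min` distinct contigs (inclusive)."""
    return {
        pc
        for pc, contigs in pc_to_contigs.items()
        if len(set(contigs)) >= shared_min
    }


def displays_module(contig_pcs, module_pcs, link_proportion):
    """True if the contig carries at least `link_proportion` of the module's PCs.

    Assumption: the comparison is inclusive (>=).
    """
    module_pcs = set(module_pcs)
    if not module_pcs:
        return False  # an empty module cannot be displayed
    shared = len(module_pcs & set(contig_pcs))
    return shared / len(module_pcs) >= link_proportion
```

Under this reading, a contig holding two of a module's four PCs displays the module at a link proportion of 0.5 but not at 0.75.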

Output File(s)

The output directory created by vContact contains a number of files.

  • cc(*) files are contig cluster files
  • mod(*) files are module files
  • (*).clusters are TSV-formatted files containing the clusters, 1 cluster per line
  • (*).ntw are TSV-formatted *edge-list* files, with source, target and edge weight. These files can be used as input for a variety of graph visualization tools.
  • (*).pandas are pandas-formatted tables, generated by the pandas Python package
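
The (*).ntw and (*).clusters files are plain text, so they can be read without any special tooling. The sketch below shows one way to load them in Python. The function names are hypothetical, and it assumes that neither file type has a header row and that cluster members are separated by tabs, as the TSV description suggests.

```python
from collections import defaultdict


def read_edge_list(path):
    """Parse a .ntw edge list into a nested dict: {source: {target: weight}}."""
    graph = defaultdict(dict)
    with open(path) as handle:
        for line in handle:
            # split() accepts any whitespace, so tab- or space-separated lines both parse.
            fields = line.split()
            if len(fields) != 3:
                continue  # skip blank or malformed lines
            source, target, weight = fields
            graph[source][target] = float(weight)
    return dict(graph)


def read_clusters(path):
    """Parse a .clusters file into a list of clusters, one list of members per line."""
    with open(path) as handle:
        return [line.rstrip("\n").split("\t") for line in handle if line.strip()]
```

The nested dictionary returned by `read_edge_list` can then be passed on to a graph library or exported to whichever visualization tool is preferred.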

Tool Source for App

This app was created from the project's original source and is now forked at its new location.