vContact 0.1.49 and vContact-PCs

vContact is a tool that performs automatic, guilt-by-contig-association classification of viral contigs.

Complete documentation can be found at either the source tool's original site or its new location. This app is under ongoing development and will undergo a major revision in the near future. This version will remain available, and the revised version will be added as a separate HPC app.

Quick Start

  • The quickest way to use vContact is to run vContact-PCs on a BLAST file and provide a contig info file. vContact-PCs will create the appropriate input files for use with vContact.
  • Because the app runs at the Texas Advanced Computing Center (TACC), there will be a queue time before it starts. Once it has started, the time limit is 1 hour, which should be sufficient for datasets of nearly any size.

Test Data

Test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> vContact

Input File(s)

There are three required input files. Each must be in either TSV (tab-separated values) or CSV (comma-separated values) format. Different files may use different formats, but a single file must not mix tabs and commas as delimiters.
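The delimiter rule above can be checked before uploading. The Python sketch below is an illustration, not part of vContact: `read_table` is a hypothetical helper that picks the delimiter once from the header line, rejects any row that does not split into the same number of fields as the header (which is what happens when a row uses the other delimiter), and confirms that the required headers are present.

```python
import csv
import io


def read_table(text, required):
    """Hypothetical helper: load one vContact input table given as a string.

    The delimiter is chosen once from the header line (tab for TSV, comma
    for CSV) and then applied to every row, so a file that mixes the two
    delimiters produces rows with the wrong number of fields.
    """
    lines = text.splitlines()
    if not lines:
        raise ValueError("empty file")
    header_line = lines[0]
    if "\t" in header_line and "," not in header_line:
        delimiter = "\t"
    elif "," in header_line and "\t" not in header_line:
        delimiter = ","
    else:
        raise ValueError("cannot tell whether the header is TSV or CSV")

    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    missing = [h for h in required if h not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required headers: {missing}")

    rows = []
    for row in reader:
        # DictReader stores extra fields under the key None and fills
        # missing fields with the value None; either means a bad row.
        if None in row or None in row.values():
            raise ValueError(f"line {reader.line_num} does not match the header")
        rows.append(row)
    return rows
```

For example, `read_table(open("pc_info.tsv").read(), ["id", "size"])` would load the test-data PC info file, assuming it has the headers listed below.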

Protein Clusters Info file: Contains information about the protein clusters (PCs). Must contain the headers "id" and "size".

  • If using the test data, this is the file pc_info.tsv

Contig Info file: Contains contig information. At a minimum, it must list each contig's name and the number of proteins associated with that contig. Must contain the headers "id", "proteins", and "size".

  • If using the test data, this is the file contigs.tsv

Protein Clusters Profiles file: Contains the association of contigs with protein clusters. Must contain the headers "contig_id" and "pc_id".

  • If using the test data, this is the file pcprofiles.tsv
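Because the profiles file refers to IDs defined in the other two files, a quick cross-check can catch mismatched IDs before a run. This is a minimal sketch under the assumption that every `contig_id` in the profiles file should appear in the contig info file's "id" column and every `pc_id` should appear in the PC info file's "id" column. `check_consistency` is a hypothetical helper, not a vContact function, and it expects each table already loaded as a list of dicts keyed by header name (for example with `csv.DictReader`).

```python
def check_consistency(pc_info, contigs, profiles):
    """Hypothetical check: report profile rows that reference unknown IDs."""
    pc_ids = {row["id"] for row in pc_info}
    contig_ids = {row["id"] for row in contigs}
    problems = []
    for row in profiles:
        if row["contig_id"] not in contig_ids:
            problems.append(f"unknown contig {row['contig_id']}")
        if row["pc_id"] not in pc_ids:
            problems.append(f"unknown PC {row['pc_id']}")
    return problems
```

An empty list means every profile row points at a known contig and a known PC.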

Parameters Used in App

The app exposes a number of parameters. Change the defaults only if you understand what they do.

In general, changing them won't substantially affect the results. For a detailed guide to what each option does, see the documentation.

  • Significativity threshold: Significativity threshold in the contig similarity network
  • Use permissive: Use permissive affiliation (Flag this option to increase the number of contigs retained in the network)
  • Inflation: Inflation parameter to define contig clusters with MCL
  • Module inflation: Inflation parameter to define protein modules with MCL
  • Module significativity: Significativity threshold in the protein cluster similarity network
  • Module shared min: Minimal number (inclusive) of contigs a PC must appear in to be taken into account in the modules computing
  • Link significativity: Significativity threshold to link a cluster and a module
  • Link proportion: Proportion of a module's PCs a contig must have to be considered as displaying this module
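To illustrate what the significativity thresholds measure, here is a sketch of the hypergeometric test used by the guilt-by-association approach vContact builds on. It gives the probability P that two contigs with n_a and n_b PCs, drawn from N PCs in total, share at least c PCs by chance, and converts it to a score of -log10(P × T), where T is the number of contig pairs compared. Treat the exact formula and the multiple-testing correction as assumptions and check them against the vContact documentation; under this reading, edges scoring at or above the threshold are kept. `displays_module` sketches the Link proportion rule in the same hedged way. All function names here are hypothetical.

```python
import math


def shared_pc_pvalue(n_a, n_b, shared, total_pcs):
    """Hypergeometric P(X >= shared): the chance that contigs with n_a and
    n_b PCs, drawn from total_pcs PCs, share at least `shared` PCs."""
    upper = min(n_a, n_b)
    if shared > upper:
        raise ValueError("cannot share more PCs than the smaller contig has")
    hits = sum(
        math.comb(n_a, i) * math.comb(total_pcs - n_a, n_b - i)
        for i in range(shared, upper + 1)
    )
    return hits / math.comb(total_pcs, n_b)


def significance(n_a, n_b, shared, total_pcs, n_comparisons):
    """Assumed score: -log10(P * T), T = number of contig pairs compared."""
    p = shared_pc_pvalue(n_a, n_b, shared, total_pcs)
    return -math.log10(p * n_comparisons)


def displays_module(contig_pcs, module_pcs, link_proportion):
    """Assumed Link proportion rule: the contig must carry at least this
    fraction of the module's PCs."""
    return len(contig_pcs & module_pcs) / len(module_pcs) >= link_proportion
```

Raising the significativity threshold under this scheme keeps only pairs whose PC overlap is less likely to be a coincidence, which thins the network.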

Output File(s)

The output directory created by vContact contains a number of files.

  • cc(*) files are contig cluster files
  • mod(*) are module files
  • (*).clusters are TSV formatted files containing the clusters, 1 cluster per line
  • (*).ntw are TSV-formatted *edge-list* files with source, target, and edge weight columns. These files can be used as input for a variety of graph visualization tools.
  • (*).pandas are pandas-formatted tables, generated by the pandas Python package
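The TSV outputs can also be read directly for quick checks outside a visualization tool. This is a sketch under stated assumptions: `parse_ntw` assumes three tab-separated columns (source, target, weight) and skips any row whose weight is not numeric, such as a possible header line; `parse_clusters` assumes each line of a .clusters file lists the member IDs of one cluster separated by tabs. Both helpers, and `weighted_degree`, are hypothetical and not part of vContact.

```python
import csv
from collections import defaultdict


def parse_ntw(lines):
    """Parse a .ntw edge list into (source, target, weight) tuples."""
    edges = []
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 3:
            continue  # blank or truncated line
        try:
            weight = float(row[2])
        except ValueError:
            continue  # assumed header row: weight column is not numeric
        edges.append((row[0], row[1], weight))
    return edges


def parse_clusters(lines):
    """Parse a .clusters file, assuming one cluster of tab-separated IDs per line."""
    clusters = []
    for row in csv.reader(lines, delimiter="\t"):
        members = [field for field in row if field]
        if members:
            clusters.append(members)
    return clusters


def weighted_degree(edges):
    """Sum of edge weights touching each node, treating edges as undirected."""
    degree = defaultdict(float)
    for source, target, weight in edges:
        degree[source] += weight
        degree[target] += weight
    return dict(degree)


# Hypothetical usage with a file from the output directory:
# with open("example.ntw", newline="") as handle:
#     edges = parse_ntw(handle)
```

Nodes with a high weighted degree are the contigs most strongly connected in the similarity network.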

Tool Source for App

This app was created from the project's original source and is now forked at its new location.
