MaxBin 2.2 in the Discovery Environment

Alert:

 

The iPlant App Store is currently being restructured, and apps are being moved to an HPC environment. During this transition, users may occasionally be unable to locate or use apps that are listed in our tutorials. In many cases, these apps can be found with the search bar at the top of the Apps window in the DE. To improve the chance of a successful search, search only for the part of the name that refers to the app's function or origin rather than the full app name and version number (e.g. 'SOAPdenovo' instead of 'SOAPdenovo-Trans 1.01'). In critical cases, please report your concern to the iPlant Ask forum or to support@iplantcollaborative.org. Thank you for your patience.

The DE Quick Start tutorial provides an introduction to basic DE functionality and navigation.

Please work through the tutorial and add your comments at the bottom of this page, or send comments by email to upendra@cyverse.org. Thank you.

 

Rationale and background:

MaxBin 2.2 is software for binning assembled metagenomic sequences using an Expectation-Maximization (EM) algorithm. By providing assembled metagenomic sequences together with either read coverage information or the sequencing reads themselves, users can recover the bins (genomes) of the individual microbes in their metagenomes. For convenience, MaxBin reports genome-related statistics, including estimated completeness, GC content, and genome size, on the binning summary page.
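To make the role of the EM algorithm more concrete, here is a deliberately simplified Python sketch. It is not MaxBin's implementation, whose model is more elaborate: it clusters contigs by coverage alone, treating each bin's coverage as a Poisson distribution. Each iteration alternates an E-step (the posterior probability of each bin for each contig) with an M-step (re-estimated mean coverage and mixing weight per bin). Each contig is then assigned to its most probable bin only if that probability reaches a threshold, mirroring the role of the Prob_threshold option (default 0.8) described in the optional arguments below. The function names, the coverage-only Poisson model, and the starting means are illustrative assumptions.

```python
import math
from typing import List, Optional, Tuple


def poisson_logpmf(x: float, lam: float) -> float:
    """Log-probability of coverage x under a Poisson with mean lam."""
    # math.lgamma(x + 1) equals log(x!) for non-negative integers
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def em_coverage_bins(
    coverage: List[float], init_means: List[float], n_iter: int = 50
) -> Tuple[List[float], List[List[float]]]:
    """Fit a K-component Poisson mixture to contig coverages with EM.

    Returns the fitted mean coverage of each bin and, for every contig,
    its posterior probability of belonging to each bin.
    """
    k = len(init_means)
    means = list(init_means)
    weights = [1.0 / k] * k  # start with equal mixing proportions
    resp: List[List[float]] = []
    for _ in range(n_iter):
        # E-step: posterior of each bin for each contig, computed in log
        # space and normalized with the log-sum-exp trick for stability
        resp = []
        for x in coverage:
            logs = [math.log(weights[j]) + poisson_logpmf(x, means[j]) for j in range(k)]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate mixing weight and mean coverage of each bin
        for j in range(k):
            n_j = max(sum(r[j] for r in resp), 1e-12)  # guard against empty bins
            weights[j] = max(n_j / len(coverage), 1e-12)
            means[j] = max(sum(r[j] * x for r, x in zip(resp, coverage)) / n_j, 1e-9)
    return means, resp


def assign_bins(resp: List[List[float]], prob_threshold: float = 0.8) -> List[Optional[int]]:
    """Assign each contig to its most probable bin, or None if below the threshold."""
    bins: List[Optional[int]] = []
    for r in resp:
        best = max(range(len(r)), key=lambda j: r[j])
        bins.append(best if r[best] >= prob_threshold else None)
    return bins


if __name__ == "__main__":
    # Two well-separated groups of contigs: roughly 10x and 50x coverage
    means, resp = em_coverage_bins([10, 12, 11, 9, 50, 52, 48, 51], init_means=[5, 40])
    print(assign_bins(resp))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Contigs left as None correspond to sequences the sketch cannot place confidently; raising the threshold leaves more contigs unbinned in exchange for purer bins.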

After binning is finished, users can run MEGAN or similar software on the MaxBin bins to determine the taxonomy of each bin.

Wu YW, Tang YH, Tringe SG, Simmons BA, and Singer SW, "MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm", Microbiome, 2:26, 2014.


Prerequisites
  1. A CyVerse account. (Register for a CyVerse account at user.cyverse.org.)
  2. Mandatory arguments 
    1. Contig file name
    2. Output file name
  3. At least one of the following
    1. Contig abundance files / a list file of all contig abundance files.

    2. Reads file in fasta or fastq format / a list file of all reads files.

  4. Optional arguments
    1. Reassembly (The reassembly option is still highly experimental. To use it, you need to provide MaxBin with an "interleaved paired-end" fastq or fasta reads file.)

    2. Prob_threshold (minimum probability for the EM algorithm; default 0.8)

    3. Markerset (By default, MaxBin looks for 107 marker genes present in >95% of bacteria. Alternatively, you can choose a set of 40 marker genes that are universal among bacteria and archaea (Wu et al., PLoS ONE 2013). This option may be better suited for environments dominated by archaea; however, it tends to split genomes into more bins. You can try the different marker gene sets and see which one works better.)
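The DE app collects these arguments through its interface. For readers who want to check a parameter combination before launching a job, or to reproduce a run with the standalone MaxBin scripts, the Python sketch below encodes the rules listed above: a contig file and output name are mandatory, at least one abundance or reads input is required, reassembly needs reads, and the marker set is 107 or 40. The flag names (-contig, -out, -abund, -abund_list, -reads, -reads_list, -reassembly, -prob_threshold, -markerset) are assumed from the upstream run_MaxBin.pl documentation and should be checked against its usage message. The function names and the one-path-per-line list-file format are likewise assumptions.

```python
from typing import List, Optional


def list_file_text(paths: List[str]) -> str:
    """Contents of a list file: one file path per line (assumed format)."""
    return "".join(p + "\n" for p in paths)


def build_maxbin_command(
    contig: str,
    out: str,
    abund: Optional[str] = None,
    abund_list: Optional[str] = None,
    reads: Optional[str] = None,
    reads_list: Optional[str] = None,
    reassembly: bool = False,
    prob_threshold: float = 0.8,
    markerset: int = 107,
) -> List[str]:
    """Build an argument vector for run_MaxBin.pl after checking the tutorial's rules."""
    # At least one abundance file, reads file, or list file of either is required
    if not (abund or abund_list or reads or reads_list):
        raise ValueError("provide a contig abundance file, a reads file, or a list file of either")
    # Reassembly needs (interleaved paired-end) reads
    if reassembly and not (reads or reads_list):
        raise ValueError("reassembly requires an interleaved paired-end reads file")
    if not 0.0 <= prob_threshold <= 1.0:
        raise ValueError("prob_threshold must be between 0 and 1")
    if markerset not in (107, 40):
        raise ValueError("markerset must be 107 or 40")

    cmd = ["run_MaxBin.pl", "-contig", contig, "-out", out]
    # Flag names are assumed from the upstream MaxBin documentation
    optional_inputs = (
        ("-abund", abund),
        ("-abund_list", abund_list),
        ("-reads", reads),
        ("-reads_list", reads_list),
    )
    for flag, value in optional_inputs:
        if value:
            cmd += [flag, value]
    if reassembly:
        cmd.append("-reassembly")
    # Pass the EM threshold and marker set explicitly, even at their defaults
    cmd += ["-prob_threshold", str(prob_threshold), "-markerset", str(markerset)]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_maxbin_command("contigs.fa", "maxbin_out", reads="reads.fastq")))
    # run_MaxBin.pl -contig contigs.fa -out maxbin_out -reads reads.fastq -prob_threshold 0.8 -markerset 107
```

Returning an argument list rather than a single string keeps file names with spaces intact if the list is later passed to a process launcher such as subprocess.run.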


Test/sample data 

The following test data are provided for testing MaxBin 2.2 at /iplant/home/shared/iplantcollaborative/e