MaxBin 2.2 in the Discovery Environment
The DE Quick Start tutorial provides an introduction to basic DE functionality and navigation.
Rationale and background:
MaxBin-2.2 is software for binning assembled metagenomic sequences based on an Expectation-Maximization algorithm. Users can recover the underlying bins (genomes) of the microbes in their metagenomes by providing assembled metagenomic sequences together with either read coverage information or the sequencing reads. For convenience, MaxBin reports genome-related statistics, including estimated completeness, GC content, and genome size, in the binning summary page.
Users can use MEGAN or similar software on MaxBin bins to find the taxonomy of each bin after the binning process is finished.
- A CyVerse account. (Register for a CyVerse account at user.cyverse.org)
- Mandatory arguments
  - Contig file name
  - Output file name
  - At least one of the following:
    - Contig abundance files, or a list file of all contig abundance files.
    - Reads file in fasta or fastq format, or a list file of all reads files.
- Optional arguments
  - Reassembly (this option is still highly experimental. To use it, you must provide MaxBin with the reads as an "interleaved paired-end" fastq or fasta file.)
  - Prob_threshold (minimum probability for EM algorithm; default 0.8)
  - Markerset (by default, MaxBin looks for 107 marker genes present in >95% of bacteria. Alternatively, you can choose the set of 40 marker genes that are universal among bacteria and archaea (Wu et al., PLoS ONE 2013). This option may be better suited to environments dominated by archaea; however, it tends to split genomes into more bins. You can try both marker gene sets and see which one works better for your data.)
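The input rules above (two mandatory names, at least one of abundance or reads, and the two marker set choices) can be checked before a job is submitted. The following is a minimal sketch only: the function name `build_maxbin_args` is hypothetical, and the flag spellings (`-contig`, `-out`, `-abund`, `-reads`, `-prob_threshold`, `-markerset`) are assumed from MaxBin's command-line interface. In the DE these values are entered as form fields, not typed as flags.

```python
from typing import List, Optional


def build_maxbin_args(
    contig: str,
    out: str,
    abund: Optional[str] = None,
    reads: Optional[str] = None,
    prob_threshold: float = 0.8,
    markerset: int = 107,
) -> List[str]:
    """Validate the inputs and return a MaxBin-style argument list.

    Flag names are assumptions based on the MaxBin command line;
    check them against the version you are running.
    """
    # Contig file and output name are mandatory.
    if not contig or not out:
        raise ValueError("contig file and output name are both mandatory")
    # At least one source of coverage information is required.
    if abund is None and reads is None:
        raise ValueError("provide an abundance file, a reads file, or both")
    # Only the 107-gene and 40-gene marker sets are described above.
    if markerset not in (107, 40):
        raise ValueError("markerset must be 107 or 40")
    if not 0.0 < prob_threshold <= 1.0:
        raise ValueError("prob_threshold must be in the interval (0, 1]")

    args = ["-contig", contig, "-out", out]
    if abund is not None:
        args += ["-abund", abund]
    if reads is not None:
        args += ["-reads", reads]
    args += ["-prob_threshold", str(prob_threshold)]
    args += ["-markerset", str(markerset)]
    return args
```

For example, `build_maxbin_args("20x.scaffold", "out", abund="20x.abund")` passes validation, while omitting both `abund` and `reads` raises a `ValueError`.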
The following test data are provided for testing MaxBin-2.2 at /iplant/home/shared/iplantcollaborative/example_data/Maxbin.sample.data:
- Contigs file (20x.scaffold)
- Abundance file (20x.abund)
- Reads file (20x.reads)
Results
Assuming your output file header is (out), MaxBin generates the following files using that header:
- (out).0XX.fasta -- the XXth bin, where XX is a number, e.g. out.001.fasta
- (out).summary -- a summary file describing which contigs are classified into which bin.
- (out).log -- a log file recording the core steps of the MaxBin algorithm.
- (out).marker -- marker gene presence numbers for each bin. This table is ready to be plotted by R or other 3rd-party software.
- (out).marker.pdf -- visualization of the marker gene presence numbers using R. Will only appear if -plotmarker is specified.
- (out).noclass -- this file stores all sequences that pass the minimum length threshold but are not classified successfully.
- (out).tooshort -- this file stores all sequences that do not meet the minimum length threshold.
- (out).marker_of_each_gene.tar.gz -- this tarball file stores all markers predicted from the individual bins. Use "tar -zxvf (out).marker_of_each_gene.tar.gz" to extract the markers [(out).0XX.marker.fasta].
- (out)_reassem/(out).reads.0xx -- the collected reads for the 0xx bin. Only present if -reassembly is given.
- (out)_reassem/(out).reads.noclass -- reads that cannot be assigned to any bin. Only present if -reassembly is given.
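Once the output files have been downloaded, the bin FASTA files can be inspected directly. The sketch below is an independent illustration, not part of MaxBin: the helper names `fasta_stats` and `summarize_bins` are hypothetical, and the numbers it produces are a simple recomputation of total length and GC fraction that may differ slightly from the values MaxBin reports (for example, ambiguous bases such as N are counted in the length here).

```python
import os
import re
from typing import Dict, Tuple


def fasta_stats(path: str) -> Tuple[int, float]:
    """Return (total bases, GC fraction) for one FASTA file."""
    total = 0
    gc = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                continue  # skip sequence header lines
            seq = line.strip().upper()
            total += len(seq)
            gc += seq.count("G") + seq.count("C")
    return total, (gc / total if total else 0.0)


def summarize_bins(directory: str, prefix: str) -> Dict[str, Tuple[int, float]]:
    """Collect stats for every (prefix).0XX.fasta bin file in a directory.

    The pattern deliberately excludes files such as
    (prefix).0XX.marker.fasta extracted from the marker tarball.
    """
    pattern = re.compile(rf"{re.escape(prefix)}\.\d+\.fasta")
    result = {}
    for name in sorted(os.listdir(directory)):
        if pattern.fullmatch(name):
            result[name] = fasta_stats(os.path.join(directory, name))
    return result
```

Calling `summarize_bins("results", "out")` on a hypothetical `results` folder returns one entry per bin file, keyed by file name, which can be compared against the values listed in (out).summary.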