Metagenomic-bin-processor

This is a collection of tools that should help with downstream analysis of data obtained from metagenomic binning.

MAGinator

Introduction

This is a collection of tools that should help with downstream analysis of data obtained from metagenomic binners such as Vamb or MSPMiner. Vamb provides not only the binning clusters of all related contigs but also the bins obtained from individual samples and, like many other programs, offers many options. Your research question should dictate your workflow, but I describe how I run Vamb so that my workflows are easier to reproduce. The provided scripts are all written for high-performance computing servers using the TORQUE Resource Manager and have been tested on Computerome 2.0. The goal is to provide a Snakemake workflow incorporating conda environments for a user-friendly and reproducible workflow.

Binning using Vamb

This example binning workflow was used for fecal samples. It is provided mainly for reproducibility, since VAMB now comes with an excellent Snakemake workflow, but it can of course still be used. Remember to adjust it to your data and hypothesis. You should also check the input and output of every step with FastQC to ensure optimal quality. For VAMB documentation, see Vamb's GitHub repository, and use their Snakemake workflow!

Binning workflow we use

  1. This script removes adapter sequences from raw reads using BBDuk.
  2. This script trims low-quality sequence using Sickle.
  3. This script removes host contamination using BBMap.
  4. This script performs assembly using SPAdes.
  5. We need to set a lower cut-off for contig size. You can use this script.
  6. This script indexes contigs for Minimap2.
  7. This script maps reads using Minimap2.
  8. This script analyses the contig coverage using JGI's jgi_summarize_bam_contig_depths, which is actually part of the MetaBAT binner.
  9. Now we are ready to bin using this script, which runs the GPU-accelerated VAMB.
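
The contig size cut-off in step 5 can be illustrated with a minimal Python sketch. This is not the script linked above; the function names (`read_fasta`, `filter_contigs`) and the 10 bp toy cut-off are hypothetical and only show the idea of dropping contigs shorter than a chosen length.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an open FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(handle: TextIO, min_length: int) -> List[Tuple[str, str]]:
    """Keep only contigs whose sequence has at least min_length bases."""
    return [(h, s) for h, s in read_fasta(handle) if len(s) >= min_length]


# Toy input with a hypothetical 10 bp cut-off; real cut-offs are far larger.
example = io.StringIO(">contig_1\nACGTACGTACGT\n>contig_2\nACGT\n")
kept = filter_contigs(example, min_length=10)
print([header for header, _ in kept])
```

In practice the handle would be an opened assembly FASTA file, and the cut-off should be chosen to suit your data.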

Postprocessing contig-based bins from VAMB

The suggested binning workflows do not apply a lower size cut-off to bins. This means that we will have many small bins, some of which will be of little use. We want to separate bins into potential MAGs (Metagenome-Assembled Genomes) and extrachromosomal elements. Some bins containing MAGs will also contain viruses and plasmids, but these are apparently so closely associated with one specific organism that it makes biological sense to leave them together with their host. When analyzing extrachromosomal elements, however, we will of course include these as well. As seen in the figure, chromosomes smaller than 200,000 bp are the exception, so I have set the cut-off there. You can copy all bins larger than your chosen cut-off using this script. It also provides a visualization of ALL bins and shows your cut-off as a red line.
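
As a rough illustration of the size-based split, the sketch below separates bins at the 200,000 bp cut-off mentioned above. It is not the linked script: the function name `split_bins_by_size`, the toy bin names and sizes, and the choice to count a bin exactly at the cut-off as "large" are all assumptions.

```python
from typing import Dict, List, Tuple

CUTOFF_BP = 200_000  # cut-off named in the text; adjust to your data


def split_bins_by_size(
    bin_sizes: Dict[str, int], cutoff: int = CUTOFF_BP
) -> Tuple[List[str], List[str]]:
    """Return (bins at or above the cut-off, bins below the cut-off)."""
    large = sorted(name for name, size in bin_sizes.items() if size >= cutoff)
    small = sorted(name for name, size in bin_sizes.items() if size < cutoff)
    return large, small


# Hypothetical bin sizes in base pairs (sum of contig lengths per bin).
bin_sizes = {"bin_A": 3_200_000, "bin_B": 45_000, "bin_C": 200_000}
potential_mags, small_bins = split_bins_by_size(bin_sizes)
print(potential_mags, small_bins)
```

The small bins are not discarded here; they are kept as a separate list because they are the candidates for extrachromosomal elements.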

Analyzing Metagenome-Assembled Genomes

Now that we have separated potential MAGs (bins larger than our cut-off), we would like to do some quality control. CheckM uses lineage-specific marker gene sets to gauge the completeness and level of contamination of each bin. CheckM is also included in the VAMB Snakemake workflow.
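
Once completeness and contamination estimates are available, bins can be sorted into quality tiers. The sketch below is only an illustration: the thresholds loosely follow commonly used MAG quality conventions and are assumptions, not values taken from this workflow, and the function name `classify_bin` and the example numbers are hypothetical. It does not parse CheckM output; it assumes the two percentages have already been extracted.

```python
def classify_bin(completeness: float, contamination: float) -> str:
    """Assign a quality tier from completeness and contamination (percent).

    Thresholds are illustrative assumptions; tune them to your question.
    """
    if completeness > 90 and contamination < 5:
        return "high"
    if completeness >= 50 and contamination < 10:
        return "medium"
    return "low"


# Hypothetical (completeness, contamination) estimates per bin.
estimates = {
    "bin_A": (97.3, 1.2),
    "bin_B": (62.0, 7.5),
    "bin_C": (35.1, 0.4),
}
tiers = {name: classify_bin(*values) for name, values in estimates.items()}
print(tiers)
```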

Assessment of genome quality

Taxonomy of bins

Gene-catalogues of binning clusters

This Snakefile takes the sequences of VAMB clusters and produces a gene count matrix of the nonredundant genes for each cluster (also excluding representative sequences).
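
To make the shape of a gene count matrix concrete, here is a small sketch, assuming per-sample read counts per gene are already available as dictionaries. It does not reproduce the Snakefile; `build_count_matrix` and the gene and sample names are hypothetical.

```python
from typing import Dict, List, Tuple


def build_count_matrix(
    sample_counts: Dict[str, Dict[str, int]]
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Build a genes x samples matrix; missing genes get a count of 0."""
    samples = sorted(sample_counts)
    genes = sorted({g for counts in sample_counts.values() for g in counts})
    matrix = [[sample_counts[s].get(g, 0) for s in samples] for g in genes]
    return genes, samples, matrix


# Hypothetical read counts per gene for two samples.
sample_counts = {
    "sample_1": {"gene_a": 12, "gene_b": 3},
    "sample_2": {"gene_b": 7, "gene_c": 1},
}
genes, samples, matrix = build_count_matrix(sample_counts)
for gene, row in zip(genes, matrix):
    print(gene, row)
```

Rows are genes and columns are samples, so each row shows how one gene's count varies across samples.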

The rulegraph of the suggested pipeline is seen as:

Binning clusters signature genes

Binning clusters phylogeny

Abundance of bins

Analyzing Extrachromosomal elements