Antibiotic Resistance Gene Detection on the Global Microbial Gene Catalog v1.0 (GMGC) dataset

This dataset supports the manuscript "The elusive resistome: a global comparison reveals large discrepancies among detection pipelines".

The preprint is available on bioRxiv. A full description of the datasets, along with access to them, is available on Zenodo.


Unigenes and metagenomes

  • ARG predictions were done on 278,788,551 unigenes
  • The abundance and richness of the ARGs were estimated on 11,519 metagenomic samples
  • The pan- and core-resistomes were estimated across 13 different habitats represented by the metagenomic samples

Detection Pipelines


Ontology harmonization

  • Outputs from DeepARG, AMRFinderPlus, ABRicate, and ResFinder were standardized using argNorm v1.0.0.
Pipeline
Number of ARGs

This section highlights the differences in the number of antimicrobial resistance genes (ARGs) reported by each pipeline.

A total of 178,107 unigenes from GMGCv1 were reported as ARGs by at least one pipeline. The largest difference, 45-fold, was observed between ABRicate-ResFinder and DeepARG.

Jaccard Index

The pairwise Jaccard index shows the mean overlap in the ARGs detected by each pair of pipelines.

Higher values indicate greater agreement between two pipelines in the set of genes they identify as ARGs.
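As a minimal sketch of the underlying measure, the Jaccard index of two pipelines is the size of the intersection of their ARG sets divided by the size of their union. The pipeline names and unigene IDs below are hypothetical toy data, and the sketch computes a single index per pair; how the app averages these values to obtain the mean overlap is not specified here and is not reproduced.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; returns 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical toy data: pipeline name -> set of unigene IDs reported as ARGs.
calls = {
    "pipeline_A": {"g1", "g2", "g3", "g4"},
    "pipeline_B": {"g3", "g4", "g5"},
    "pipeline_C": {"g4"},
}

# One index per unordered pair of pipelines.
pairwise = {
    (p, q): jaccard(calls[p], calls[q])
    for p, q in combinations(sorted(calls), 2)
}

for pair, value in pairwise.items():
    print(pair, round(value, 3))
```

An index of 1 would mean that two pipelines report exactly the same genes, and 0 that they share none.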

ARG Class Proportion

A heatmap illustrating the proportion of ARG classes as reported by each pipeline. Note that in this heatmap, only classes representing at least 5% of a pipeline's total are labeled.

This plot reveals how the proportional makeup of gene classes shifts depending on the chosen pipeline.
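The proportions behind such a heatmap can be sketched as follows. The input layout, pipeline names, and class labels are hypothetical toy data, and the threshold is read here as "at least 5%", which is an assumption about the labeling rule.

```python
from collections import Counter

# Hypothetical toy input: one gene-class label per ARG reported by each pipeline.
classes_by_pipeline = {
    "pipeline_A": ["beta-lactam"] * 25 + ["tetracycline"] * 14 + ["glycopeptide"] * 1,
    "pipeline_B": ["beta-lactam"] * 3 + ["efflux"] * 97,
}

LABEL_THRESHOLD = 0.05  # assumed reading: label classes at or above 5%


def class_proportions(labels):
    """Return the fraction of a pipeline's ARGs that falls in each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: count / total for cls, count in counts.items()}


proportions = {
    pipeline: class_proportions(labels)
    for pipeline, labels in classes_by_pipeline.items()
}

# Classes that would receive a label in the heatmap under the assumed threshold.
labelled = {
    pipeline: sorted(cls for cls, frac in props.items() if frac >= LABEL_THRESHOLD)
    for pipeline, props in proportions.items()
}
print(labelled)
```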

Relative abundance per Sample

We show the distribution of the relative abundance of ARGs detected by each pipeline across habitats. The middle line denotes the median, the box limits represent the interquartile range (IQR), and the whiskers extend to 1.5×IQR beyond the first and third quartiles.
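The box plot elements described above can be computed as in the sketch below. It assumes the common Tukey convention in which each whisker is drawn to the most extreme data point lying within 1.5×IQR of the box, and it uses the "inclusive" quantile method of Python's `statistics` module; the app's plotting library may interpolate quartiles differently. The abundance values are hypothetical toy data.

```python
import statistics


def box_stats(values, whisker=1.5):
    """Median, quartiles, whisker ends, and outliers for one box."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    # Whiskers stop at the furthest observations still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical relative abundances for one pipeline in one habitat.
abundances = [1, 2, 3, 4, 5, 6, 7, 8, 100]
stats = box_stats(abundances)
print(stats)
```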

Relative Abundance per Gene Class

We show the distribution of the relative abundance of ARG classes detected by each pipeline across habitats. The middle line denotes the median and the box limits represent the interquartile range. ARG class abbreviations are listed in the supplementary tables.

Richness

This plot shows the richness of ARGs detected by each pipeline, per sample, across habitats.

Number of genes

This section shows the size of the pan- and core-resistomes across different habitats.

For each pipeline and habitat, the core-resistome was estimated by randomly selecting 500 subsamples of 100 metagenomic samples. For each subsample, we recorded:

  1. the subsample core, i.e., the gene class of the centroids with a detection value ≥1 in at least p=50% of samples, grouped by habitat,
  2. the total number of ARGs with a detection value ≥1 in any sample, as a measure of richness.

The pan-resistome size for each habitat and ARG class was calculated as the mean richness across the 500 subsamples. The core-resistome corresponded to the ARGs present in at least n = 90% of the subsample cores.

Alternative values for p and n can be selected in the menu on the left.
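The subsampling procedure above can be sketched for a single pipeline and habitat as follows. This is a simplified illustration, not the authors' implementation: it works at the level of individual genes rather than gene classes, it represents each metagenomic sample as the set of ARGs with a detection value ≥1, and the function name, parameter names, and toy data are all hypothetical.

```python
import random
from collections import Counter


def pan_core(samples, n_subsamples=500, subsample_size=100, p=0.5, n=0.9, seed=0):
    """Estimate pan-resistome size and core-resistome by repeated subsampling.

    samples: list of sets, each holding the ARGs detected (value >= 1) in one sample.
    p: minimum fraction of samples in a subsample for an ARG to join its core.
    n: minimum fraction of subsample cores an ARG must appear in to be core.
    """
    rng = random.Random(seed)
    richness = []
    core_hits = Counter()
    for _ in range(n_subsamples):
        subsample = rng.sample(samples, subsample_size)  # without replacement
        prevalence = Counter(gene for sample in subsample for gene in sample)
        richness.append(len(prevalence))  # ARGs seen in any sample
        core_hits.update(
            gene for gene, count in prevalence.items()
            if count >= p * subsample_size
        )
    pan_size = sum(richness) / n_subsamples
    core = {gene for gene, hits in core_hits.items() if hits >= n * n_subsamples}
    return pan_size, core


# Hypothetical habitat with 120 samples: one ubiquitous ARG and one rare ARG.
toy_samples = [{"arg_common"} | ({"arg_rare"} if i == 0 else set()) for i in range(120)]
pan_size, core = pan_core(toy_samples)
print(pan_size, core)
```

In this toy habitat the ubiquitous gene always reaches the core, while the rare gene only contributes to the pan-resistome size in the subsamples that happen to include its sample.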

Class-Specific Coverage (CSC) by Gene Class

For a given ARG class, the CSC of pipeline A (reference pipeline) with respect to pipeline B is the proportion of ARGs reported by pipeline B that were also reported by pipeline A.

While analogous to recall, we intentionally use the term 'coverage' to avoid implying that any single pipeline constitutes a ground truth.

Here, we plot the distribution of the CSC of a reference pipeline, for a given gene class, when compared with each of the n other pipelines.

A distribution closer to 100% indicates that the reference pipeline reports most of the ARGs of that class that other pipelines report.
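The CSC definition above translates directly into set operations. The sketch below assumes that each pipeline's calls for one ARG class are available as a set of unigene IDs; the pipeline names and IDs are hypothetical toy data, and returning `None` when the other pipeline reports nothing for the class is a choice made here, not something the source specifies.

```python
from typing import Optional, Set


def csc(reference: Set[str], other: Set[str]) -> Optional[float]:
    """Proportion of ARGs reported by `other` that `reference` also reports."""
    if not other:
        return None  # undefined when the other pipeline reports no ARGs
    return len(reference & other) / len(other)


# Hypothetical calls for a single ARG class: pipeline name -> unigene IDs.
calls = {
    "pipeline_A": {"g1", "g2", "g3"},
    "pipeline_B": {"g2", "g3", "g4", "g5"},
    "pipeline_C": {"g3"},
}

reference = "pipeline_A"
csc_values = {
    other: csc(calls[reference], genes)
    for other, genes in calls.items()
    if other != reference
}
print(csc_values)
```

Note that CSC is not symmetric: here the CSC of pipeline_A with respect to pipeline_C is 1.0, whereas the CSC of pipeline_C with respect to pipeline_A would be 1/3.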

Metagenomic samples per habitat

Overview of all metagenomic samples and their associated habitats.

ARG Classes and Abbreviations

ARO term IDs, gene class names, and abbreviations used throughout the app.

ARGs Identified per Pipeline

Full list of ARGs (unigenes) identified by each pipeline, with their associated gene class.
