Introduction

This Shiny app shows the main and supplementary figures for the manuscript "How ARG Detection Pipelines Shape Our View of the Resistome" and lets you control different parameters.

302,655,267 unigenes

The unigenes are representative sequences after clustering 2.3 billion bacterial genes at 95% identity. The unigenes come from GMGC and are accessible here.

13,174 metagenomic samples from 16 different habitats

The metagenomic samples come from GMGC and are available here. In this app, we did not consider the habitats amplicon, isolate, and built-environment. The abundance of each gene in the metagenomes can be accessed here. The summary of metagenomic samples per habitat can be found in Table S2 in the paper.

ARG classes

Ontology normalization was done with argNorm. Gene classes were then manually curated. The gene classes used in this project can be found in Table S3 in the paper.

Pipelines

We used full-sized genes in all pipelines. For each pipeline, we chose a single parameter setting:
  • Nucleotide sequences through:
    • ResFinder
    • ABRicate with the databases CARD, ResFinder, NCBI, ARG-ANNOT, and MEGARES 2.0
  • Amino acid sequences through:
    • DeepARG
    • fARGene
    • RGI with DIAMOND aligner
    • AMRFinderPlus

Number of ARGs

This section highlights how different computational pipelines influence the detection of Antimicrobial Resistance Genes (ARGs) from the unigene dataset.

A total of 178,107 unigenes from GMGCv1 were reported as ARGs by at least one pipeline. The total ARG count varies across pipelines; for example, the counts for ABRicate-ResFinder and DeepARG differ by about 45-fold.
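The counting described above can be sketched as follows. This is a minimal illustration on made-up data, not the app's code: the unigene IDs, the `calls` dictionary, and the function names are all assumptions introduced for the example.

```python
# Hypothetical toy data: pipeline name -> set of unigene IDs reported as ARGs.
# The IDs and set sizes are invented and do not reflect the real results.
calls = {
    "DeepARG": {"u1", "u2", "u3", "u4", "u5", "u6"},
    "RGI-DIAMOND": {"u2", "u3", "u7"},
    "ABRicate-ResFinder": {"u3", "u8"},
}


def arg_counts(calls):
    """Number of unigenes each pipeline reports as ARGs."""
    return {pipeline: len(genes) for pipeline, genes in calls.items()}


def union_size(calls):
    """Number of unigenes reported as ARGs by at least one pipeline."""
    return len(set().union(*calls.values()))


def fold_difference(counts, a, b):
    """Ratio of the larger to the smaller ARG count of pipelines a and b."""
    larger = max(counts[a], counts[b])
    smaller = min(counts[a], counts[b])
    return larger / smaller


counts = arg_counts(calls)
print(counts)
print(union_size(calls))
print(fold_difference(counts, "DeepARG", "ABRicate-ResFinder"))
```

Taking the union of the per-pipeline sets is what "reported by at least one pipeline" amounts to, so a unigene found by several pipelines is counted once.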

ARG Class Proportion

This heatmap shows the proportion of ARG classes as reported by each pipeline. Only classes representing at least 5% of the total within at least one pipeline are visualised. Despite the massive differences in absolute counts seen on the left, this plot reveals how the proportional makeup of gene classes shifts depending on the pipeline chosen.

From the source file, we merged MFS efflux pumps with all other efflux pumps.
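The class-proportion filter used for the heatmap can be sketched like this. The data, pipeline names, and function names are hypothetical, and reading "5%" as "at least 5%" (the `min_share` default) is an assumption of this sketch.

```python
from collections import Counter

# Hypothetical toy data: pipeline -> one gene-class label per reported ARG.
classes_by_pipeline = {
    "pipeline_A": ["efflux pump"] * 50 + ["class A"] * 47 + ["tet RPG"] * 3,
    "pipeline_B": ["efflux pump"] * 90 + ["class A"] * 8 + ["tet RPG"] * 2,
}


def class_proportions(labels):
    """Share of each gene class among the ARGs reported by one pipeline."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def heatmap_matrix(classes_by_pipeline, min_share=0.05):
    """Keep classes whose share reaches min_share in at least one pipeline."""
    props = {p: class_proportions(labels)
             for p, labels in classes_by_pipeline.items()}
    all_classes = set().union(*(pr.keys() for pr in props.values()))
    kept = sorted(
        cls for cls in all_classes
        if any(pr.get(cls, 0.0) >= min_share for pr in props.values())
    )
    # Classes a pipeline never reports get a proportion of 0.0.
    return {p: {cls: pr.get(cls, 0.0) for cls in kept}
            for p, pr in props.items()}


matrix = heatmap_matrix(classes_by_pipeline)
for pipeline, row in matrix.items():
    print(pipeline, row)
```

In this toy input, 'tet RPG' stays below the threshold in both pipelines and is dropped, while 'class A' is kept for both because it passes the threshold in one of them.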


Relative Abundance per Sample

This section explores the Relative Abundance of Antimicrobial Resistance Genes (ARGs) across different host habitats, highlighting how the choice of pipeline impacts the estimation of the quantity of ARGs.

This plot displays the total relative abundance of ARGs detected by each pipeline per sample across various habitats. In each boxplot, the center line denotes the median, the box limits mark the interquartile range (IQR), and the whiskers extend to 1.5× IQR beyond the first and third quartiles.
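The boxplot elements described above can be computed as in the sketch below. It assumes the common Tukey convention in which each whisker ends at the most extreme data point still within 1.5× IQR of the box, and it uses the "inclusive" quantile method from Python's `statistics` module; the app's plotting library may compute quartiles slightly differently. The input values are invented.

```python
import statistics


def boxplot_stats(values):
    """Median, quartiles, whisker ends, and outliers for one box."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical relative abundances for the samples of one habitat.
stats = boxplot_stats([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0])
print(stats)
```

Points beyond the whiskers, such as the value 100.0 here, are the ones usually drawn as individual outlier dots.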

Relative Abundance per Gene Class

This plot illustrates the relative abundance of each gene class, grouped by resistance mechanism, in different habitats. Among the gene classes, 'class A' and 'tet RPG' represent Class A β-lactamases and tetracycline ribosomal protection genes, respectively.

From the source file, we merged MFS efflux pumps with all other efflux pumps.


Number of genes

This section shows the pan- and core-resistomes across different host habitats for each pipeline.

  • Pan-resistome (Left, Panel a): This represents the total pool of unique ARGs found by a pipeline across all samples within a given habitat, and so reflects the diversity of resistance genes in that habitat. These numbers vary depending on the pipeline used.

  • Core-resistome (Right, Panel b): This represents the ARGs that are persistently found across almost all samples within a habitat. Pipeline-specific detection influences which genes are considered ubiquitous.
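The two definitions above can be sketched as set operations. This is an illustration only: the sample data are invented, and the `min_prevalence` cutoff standing in for "almost all samples" is a hypothetical parameter, since the exact threshold is not stated here.

```python
from collections import Counter

# Hypothetical toy data for one pipeline in one habitat:
# sample ID -> set of ARG unigenes detected in that sample.
samples = {
    "s1": {"g1", "g2", "g3"},
    "s2": {"g1", "g2"},
    "s3": {"g1", "g4"},
    "s4": {"g1", "g2", "g5"},
}


def pan_resistome(samples):
    """All unique ARGs seen in at least one sample of the habitat."""
    return set().union(*samples.values())


def core_resistome(samples, min_prevalence=0.9):
    """ARGs present in at least min_prevalence of the samples.

    The 0.9 default is an assumed stand-in for "almost all samples".
    """
    n_samples = len(samples)
    occurrences = Counter(g for genes in samples.values() for g in genes)
    return {g for g, n in occurrences.items()
            if n / n_samples >= min_prevalence}


print(sorted(pan_resistome(samples)))
print(sorted(core_resistome(samples)))
print(sorted(core_resistome(samples, min_prevalence=0.75)))
```

Lowering the cutoff enlarges the core set, which is one reason the core-resistome is sensitive both to the threshold and to which genes a pipeline detects in the first place.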


Class-Specific Coverage by Pipeline

This tab examines Class-Specific Coverage (CSC), which measures the degree of overlap or agreement between different detection pipelines. For a given ARG class g, the CSC of pipeline A with respect to pipeline B is the proportion of ARGs identified by pipeline B that were also captured by pipeline A.

This plot shows the overall percentage of ARGs detected by a reference pipeline that were also identified by the compared pipeline. A higher percentage indicates stronger agreement between the pipelines.
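The CSC definition can be written as |A_g ∩ B_g| / |B_g|, where A_g and B_g are the sets of ARGs that pipelines A and B assign to class g. The sketch below follows that reading; the data and function name are hypothetical, and how the app treats a gene that the two pipelines assign to different classes is not specified here, so matching on each pipeline's own class label is an assumption.

```python
def csc(calls_a, calls_b, gene_class):
    """CSC of pipeline A with respect to reference pipeline B for one class.

    calls_a and calls_b map unigene ID -> gene class assigned by the pipeline.
    Returns None when B reports no ARG of that class (CSC is undefined).
    """
    a_g = {gene for gene, cls in calls_a.items() if cls == gene_class}
    b_g = {gene for gene, cls in calls_b.items() if cls == gene_class}
    if not b_g:
        return None
    return len(a_g & b_g) / len(b_g)


# Hypothetical toy calls from two pipelines.
calls_a = {"u1": "tet RPG", "u2": "tet RPG", "u3": "class A"}
calls_b = {"u2": "tet RPG", "u4": "tet RPG", "u5": "tet RPG",
           "u6": "tet RPG", "u3": "class A"}

print(csc(calls_a, calls_b, "tet RPG"))  # A covers 1 of B's 4 tet RPG genes
print(csc(calls_b, calls_a, "tet RPG"))  # B covers 1 of A's 2 tet RPG genes
print(csc(calls_a, calls_b, "class A"))
```

The measure is not symmetric: swapping the reference pipeline changes the denominator, so the two directions generally give different values.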

Class-Specific Coverage (CSC) by Gene Class

This plot breaks down this overlap by gene classes based on specific resistance mechanisms. This reveals whether two pipelines might agree perfectly on certain gene classes (like tetracycline resistance) but completely miss each other on others.
