The importance, challenges, and future of analytical tools and statistical methods for microbiome data analysis – Microbiome Data Congress 2023 

Validating analytical tools and statistical methods is necessary to ensure accurate conclusions are drawn from studies. Selecting the most appropriate tool or statistical method is equally important. In a nascent field like microbiome research, both can be challenging.

We sat down with a selection of speakers from the upcoming Microbiome Data Congress – 13th-14th November, Boston, to get their perspectives on the session subject “Validating and Contrasting Analytical Tools and Statistical Methods”.

Abraham Gihawi

Research Fellow

University of East Anglia

What is the importance of benchmarking and validating analytical tools for the microbiome field? 

Benchmarking and validating analytical tools is a critical task in bioinformatics generally, and specifically within the microbiome field. Software manuscripts often include some benchmarking and present their tool as outperforming other tools under a range of circumstances, but they cannot always fully outline the tool's limitations and full scope of use cases. This makes it difficult to ascertain the applicability of a given tool to a unique and potentially niche area of research. An example would be a tool developed and published with a focus on environmental metagenomics being applied to the diagnosis of infectious disease in clinical samples.

Software is increasingly provided with sensible default settings, often based on heuristics, to assist users and potentiate the application of the tool. But this sometimes means that parameter tweaking is underexplored. The only way to explore a tool's functionality is to benchmark various parameter selections against a known truth and critically evaluate the outputs.
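As a loose illustration of that kind of parameter sweep, the sketch below scores a hypothetical profiler against a known mock-community truth across a small parameter grid. The `run_profiler` function, the taxa, and the parameter names are all invented stand-ins, not any real tool's interface:

```python
from itertools import product

# Ground-truth taxa in a hypothetical mock community (names are illustrative).
TRUTH = {"Escherichia coli", "Bacteroides fragilis", "Lactobacillus gasseri"}

def run_profiler(min_identity: float, min_reads: int) -> set[str]:
    """Hypothetical stand-in for invoking a real taxonomic profiler with one
    parameter combination; replace with a subprocess call to your tool."""
    calls = {"Escherichia coli", "Bacteroides fragilis"}
    if min_identity < 0.95:
        # Looser thresholds recover the low-abundance taxon but also
        # introduce a spurious call.
        calls |= {"Lactobacillus gasseri", "Salmonella enterica"}
    if min_reads > 5:
        # Stricter read-count filters drop weakly supported calls.
        calls.discard("Bacteroides fragilis")
    return calls

def score(predicted: set[str], truth: set[str]) -> tuple[float, float]:
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth)
    return precision, recall

# Sweep a small parameter grid and report precision/recall for each setting.
for ident, reads in product([0.90, 0.95, 0.99], [1, 10]):
    p, r = score(run_profiler(ident, reads), TRUTH)
    print(f"min_identity={ident:.2f} min_reads={reads:>2} precision={p:.2f} recall={r:.2f}")
```

In practice the stub would wrap a real tool invocation, and the truth set would come from a mock community or simulated reads; the critical evaluation of the outputs then happens over a table like the one this loop prints.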

Extensive benchmarking by individual researchers, however, is not always possible. It can be prohibitively time-consuming to create, find, or synthesise test datasets with a known ground truth, let alone to construct scripts and pipelines to test combinations of parameters across different tools. Additionally, it requires a deep knowledge of each tool, which is best provided by the software developers. Community-driven benchmarking approaches have shown strength here in that they can be blinded to limit bias, focus on important applications, and allow submissions from software developers as well as other users. Similarly, benchmarking platforms allow users to trial new and emerging tools to evaluate their performance.


Presenting: “The Specifics of Pan-Cancer Microbial Structure” at Microbiome Data Congress 2023


Ricardo B. Valladares

PhD – Chief Scientific Officer

Siolta Therapeutics

What do you think are the key challenges in choosing which statistical methods to use when working with microbiome data?

I think choosing statistical methods that result in valid hypothesis testing on microbiome data is crucial for the reproducibility of microbiome studies and ultimately the success of our field. Reproducibility is a challenge in microbiome studies, in part, due to upstream factors contributing to batch effects, such as diverse extraction methods, sequencing approaches, and normalization techniques.

Downstream statistical analyses must also consider the unique characteristics of microbiome data, and those of us working on understanding how the human microbiota impacts health and disease need to be thoughtful when selecting statistical approaches to test our hypotheses. Microbiome data is unique (compositional, high-dimensional, over-dispersed, etc.) and many conventional statistical approaches may not account for these characteristics. Additionally, many microbiome studies have small sample sizes that reduce the power of hypothesis testing. I think the most interesting microbiome questions have a longitudinal component, adding intra-individual variability and temporal dependencies as factors to consider during analysis.

Luckily there are more and more “out of the box” models and statistical approaches designed specifically to test differential abundance of taxa in the microbiome, associations between microbial taxa and covariates, and microbe-microbe interactions. Clearly documenting all pipeline parameters, applying appropriate statistical methods, and running and reporting standards (mock communities and true biological samples) will lead to more reproducibility in our field and help us move toward a better understanding of causation between the microbiome and human health.
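One concrete example of handling the compositional nature of these data is the centred log-ratio (CLR) transform. The minimal sketch below uses invented counts, and the 0.5 pseudocount is one common convention rather than a universal rule:

```python
import numpy as np

# Invented counts: 3 samples (rows) x 4 taxa (columns).
counts = np.array([
    [120, 30,  0, 50],
    [ 80, 45, 10, 65],
    [200,  5,  2, 93],
], dtype=float)

# Add a small pseudocount so the zeros common in microbiome tables
# survive the log.
comp = counts + 0.5
comp /= comp.sum(axis=1, keepdims=True)  # close each sample to relative abundances

# CLR: subtract each sample's mean log-abundance (the log geometric mean),
# moving the data out of the simplex so standard statistics behave better.
log_comp = np.log(comp)
clr = log_comp - log_comp.mean(axis=1, keepdims=True)
print(clr.round(2))
```

Transforms like this are one ingredient; the dedicated differential-abundance and association methods mentioned above build further assumptions (over-dispersion, zero-inflation, repeated measures) on top of this kind of preprocessing.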


Scott Jackson

Complex Microbial Systems, Group Lead

NIST

Despite the numerous analytical tools and statistical methods published for microbiome analysis, there is yet to be a single adopted gold standard. What do you think is preventing this, and can we ever reach such a status?

With respect to microbiome measurements, there are a multitude of methodological variables that can (and do!) impact the result. Bias is introduced at every step of the measurement process, starting with how you collect and store your sample, through to how you analyze and interpret your data, and everything in between. The reason there is yet to be a single adopted gold-standard protocol is that none of our methods work perfectly (they all introduce bias).

Consider, for example, the DNA extraction step for microbiome metagenomic measurements. There are dozens of DNA extraction methods available (commercial and homemade), and yet none of these methods are 100% efficient at extracting DNA from all cell types (gram-positive, gram-negative, phage, fungi, spores, etc.). Some methods are better than others, but none are perfect.

Or consider the choice of bioinformatic analysis tool (a.k.a. the taxonomic profiling tool). Hundreds of tools have been developed that use different algorithms, statistical approaches, reference databases, etc., and as a result, every tool performs differently. Several benchmarking studies have demonstrated that these tools make trade-offs with respect to sensitivity and specificity. That is, when a tool performs well with respect to sensitivity, it typically performs poorly with respect to specificity, and vice versa. So, the choice of tool should be based on your application (do you want better specificity or sensitivity?).

These same arguments can be made for other microbiome ‘omic measurements. For example, metabolomic measurements utilize different organic solvents to extract metabolites (e.g., polar vs. non-polar), and mass spectrometers and NMRs have inherent limitations in the types of molecules they can detect and identify.

Despite these caveats, microbiome measurements can be highly reproducible (precise) when using a single locked-down protocol. This precision is what allows us to make accurate comparisons (e.g., fold changes) across different sample types and/or cohorts. For now, the validity of microbiome science is based on our ability to make precise (reproducible) measurements, even in the absence of accuracy.
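To make the sensitivity/specificity trade-off concrete, the sketch below scores two caricatured tools against a hypothetical mock community. The database size, taxa names, and tool outputs are all illustrative, not real benchmark results:

```python
# Invented reference database and mock-community truth; numbers are illustrative.
DATABASE = {f"taxon_{i}" for i in range(100)}
TRUTH = {f"taxon_{i}" for i in range(10)}

# Two caricatured tools: one over-calls (sensitive), one under-calls (specific).
tool_calls = {
    "sensitive_tool": TRUTH | {f"taxon_{i}" for i in range(10, 25)},
    "specific_tool": {f"taxon_{i}" for i in range(7)},
}

for name, called in tool_calls.items():
    tp = len(called & TRUTH)                 # true taxa correctly reported
    fp = len(called - TRUTH)                 # spurious calls
    fn = len(TRUTH - called)                 # true taxa missed
    tn = len(DATABASE - TRUTH - called)      # absent taxa correctly not called
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    print(f"{name}: sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

The sensitive tool reports every true taxon but pays with false positives; the specific tool rarely over-calls but misses real organisms, which is the application-dependent choice described above.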

Presenting: “Standards to Support Innovation in Microbiome Science” at Microbiome Data Congress 2023


David Wood

Head of Bioinformatic Operations

Microba

What are some novel analytical tools currently being developed, and how will their use impact the field as a whole?

The field of microbiome research is constantly evolving, and novel analytical tools, especially in bioinformatics, play a pivotal role in advancing our understanding of the microbiome and its impact on human health and the environment. Because this is a relatively new field of study, R&D projects have often been hampered by the challenges of translating microbiome data into novel treatments, but with new analytical and diagnostic tools now emerging, their potential impact on the field, and more importantly on human health, is significant.

Metagenomics is a powerful technology with broad applications. In 2021 Microba published the Microba Community Profiler (MCP) and demonstrated its high precision, recall and outstanding limit of detection. Microba has continued to develop MCP and the Microba Genome Database (MGDB), which now includes over 1.8 million curated microbial and viral genomes. The next iteration of MCP and MGDB achieves the same outstanding accuracy and includes profiling of prokaryote, eukaryote and viral species from a metagenomic sample. These updated tools routinely identify >95% of metagenomic DNA reads at >99% alignment identity. We have observed that between 1% and 4% of reads are of viral origin. There is substantial interest in phage profiling for therapeutic development, and we expect these additions to MCP/MGDB will help advance this field.

At Microba, for instance, we recently launched MetaPanel™, an advanced test for the detection of causal agents of gastrointestinal infectious disease. The test employs our leading metagenomics technology to diagnose over 175 targets of clinical interest, including bacteria, fungi, parasites and viruses, together with assessment of virulence factors and antimicrobial resistance (AMR) genes to support precise treatment decisions. The test helps clinicians avoid multiple, low-coverage, sequential diagnostic tests that can impact treatment outcomes for vulnerable patients. We are excited about this technology and its potential to bring metagenomics into clinical use for infectious disease testing.

Presenting: “Validating Microbiome Secondary Analysis Workflows Beyond Current Standards” at Microbiome Data Congress 2023

Join the discussion in Boston by registering here