[MA 2024 20] Machine learning-based prediction of susceptibility in pathogenic bacteria using whole-genome sequencing data

Amsterdam UMC, Dept. of Medical Microbiology, Dept. of Medical Informatics
Proposed by: Dr. R.P. Schade, Clinical Microbiologist [r.schade@amsterdamumc.nl]

Introduction

Pathogenic Gram-negative bacteria, such as Escherichia coli and Klebsiella pneumoniae, are common causes of healthcare-associated infections, including bloodstream and urinary tract infections. Third-generation cephalosporins are vital antibiotics for treating these infections. Traditional antimicrobial susceptibility testing (AST) methods often take up to 48 hours to produce results, which delays the start of targeted treatment.


Description of the SRP project

Machine learning (ML) models combined with whole-genome sequencing (WGS) offer a faster alternative to phenotypic AST for predicting antimicrobial susceptibility, with the potential for comparable accuracy. WGS-based ML models predict antibiotic resistance directly from the bacterial genome, which can support faster and better-informed treatment decisions.

Previous studies have successfully applied ML models to predict susceptibility to antibiotics such as cefepime in E. coli (1). This student project aims to extend that approach to third-generation cephalosporins in a broader range of pathogenic Gram-negative bacteria, using both public genomic data and our own collection of bacterial isolates. A dedicated machine learning model could markedly reduce the time to a susceptibility result and enable earlier treatment decisions for patients with infections.


Research questions

- Can machine-learning models accurately predict third-generation cephalosporin susceptibility in pathogenic Gram-negative bacteria using WGS data?

- What genomic features are most predictive of third-generation cephalosporin resistance in these bacteria?

- How does the performance of WGS-based machine-learning models compare to traditional phenotypic testing methods in terms of accuracy and speed?



Methods

Dataset

This project will utilize WGS data from two sources: publicly available genomic datasets and our own collection of Escherichia coli, Klebsiella pneumoniae, and other Gram-negative pathogens. The dataset will include minimum inhibitory concentration (MIC) values for third-generation cephalosporins (e.g., cefotaxime, ceftriaxone, and ceftazidime).
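As a minimal sketch of how MIC values could be turned into the binary labels the model needs, the Python snippet below applies EUCAST-style breakpoints. The breakpoint values, the `BREAKPOINTS` table and the helper names are assumptions for illustration only; the study would take its breakpoints from the EUCAST table version in use, and grouping the I category with R is a design choice, not a given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Breakpoint:
    """EUCAST-style clinical breakpoints in mg/L."""
    s_max: float    # MIC <= s_max   -> S (susceptible)
    r_above: float  # MIC >  r_above -> R (resistant); otherwise I


# ASSUMPTION: illustrative Enterobacterales breakpoints; replace with the
# values from the EUCAST breakpoint table version actually used.
BREAKPOINTS = {
    "cefotaxime": Breakpoint(s_max=1.0, r_above=2.0),
    "ceftriaxone": Breakpoint(s_max=1.0, r_above=2.0),
    "ceftazidime": Breakpoint(s_max=1.0, r_above=4.0),
}


def categorize(antibiotic: str, mic: float) -> str:
    """Return the EUCAST category 'S', 'I' or 'R' for a single MIC value."""
    bp = BREAKPOINTS[antibiotic]
    if mic <= bp.s_max:
        return "S"
    if mic > bp.r_above:
        return "R"
    return "I"


def binary_label(antibiotic: str, mic: float) -> int:
    """Model target: 0 = susceptible, 1 = non-susceptible.

    ASSUMPTION: category I (susceptible, increased exposure) is grouped
    with R; the alternative (grouping I with S) must be fixed before training.
    """
    return 0 if categorize(antibiotic, mic) == "S" else 1
```

For example, `categorize("ceftazidime", 4.0)` returns `"I"`, so `binary_label("ceftazidime", 4.0)` returns 1 under this grouping, while an MIC of 0.5 mg/L returns 0.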


Machine Learning Model Development

The machine learning model will be designed to predict third-generation cephalosporin susceptibility from WGS data. Relevant genomic features, such as the presence of resistance genes (blaCTX-M, blaSHV, blaTEM) and regulatory mutations, will be extracted from the genomic data. An XGBoost-based algorithm will be used to create a binary classification model (susceptible vs. resistant) for each antibiotic.
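The extracted features could be encoded as a binary presence/absence matrix before model training. The sketch below assumes that an upstream AMR annotation step (for example a CARD-based tool) has already produced, for each isolate, the set of detected genes or mutations; `build_feature_matrix` and the isolate IDs are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def build_feature_matrix(
    isolate_features: Dict[str, Set[str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Convert per-isolate feature sets into a binary presence/absence matrix.

    isolate_features maps an isolate ID to the set of genomic features
    (resistance genes, mutations) detected in that isolate. Returns
    (isolate_ids, feature_names, matrix), where matrix[i][j] == 1 if
    feature j was detected in isolate i and 0 otherwise.
    """
    isolate_ids = sorted(isolate_features)
    feature_names = sorted(set().union(*isolate_features.values()))
    column = {name: j for j, name in enumerate(feature_names)}

    matrix = []
    for isolate in isolate_ids:
        row = [0] * len(feature_names)
        for feature in isolate_features[isolate]:
            row[column[feature]] = 1
        matrix.append(row)
    return isolate_ids, feature_names, matrix
```

With hypothetical input `{"EC001": {"blaCTX-M-15", "blaTEM-1"}, "EC002": {"blaTEM-1"}, "KP001": {"blaSHV-12"}}`, the feature columns are `blaCTX-M-15`, `blaSHV-12`, `blaTEM-1` and the rows are `[1, 0, 1]`, `[0, 0, 1]` and `[0, 1, 0]`. The matrix, converted to a NumPy array, could then serve as input to an XGBoost classifier (e.g. `xgboost.XGBClassifier`), with one model trained per antibiotic.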


Model Training and Validation

The dataset will be split into a training set (80%) and a test set (20%). The model will be trained on a combination of public genomic data and our own collection of bacterial isolates to ensure a diverse representation of Gram-negative pathogens. Model performance will be optimized on the training set only, keeping the test set unseen until the final evaluation, using metrics such as accuracy, sensitivity, specificity, and error rates (very major and major errors).
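An 80/20 split can be stratified on the susceptibility label so that the proportion of resistant isolates is similar in both sets. Stratification, the fixed seed and the function name `stratified_split` are assumptions of this sketch, not choices stated in the proposal.

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Split sample indices into train/test sets, preserving the class ratio.

    Each class (e.g. 0 = susceptible, 1 = resistant) is shuffled separately
    and test_fraction of its samples is assigned to the test set.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train: List[int] = []
    test: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)
```

For 40 susceptible and 10 resistant isolates, this puts 8 susceptible and 2 resistant isolates in the test set, keeping the 4:1 ratio of the full dataset.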


Performance Evaluation

The model’s predictive performance will be assessed on the test set, with EUCAST clinical breakpoints used to classify the phenotypic MIC values as susceptible or resistant. Performance will be measured using accuracy, sensitivity, specificity, and the area under the receiver operating characteristic (ROC) curve. A detailed error analysis will also be conducted to identify potential biases in the model.
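The metrics named above could be computed from the test-set predictions as follows. This sketch treats resistant as the positive class (1), counts a very major error as a resistant isolate predicted susceptible and a major error as a susceptible isolate predicted resistant, and computes the ROC AUC as the probability that a randomly chosen resistant isolate receives a higher score than a randomly chosen susceptible one. Function names are hypothetical, and both classes must be present in the test set.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and AST error rates (1 = resistant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # major error
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # very major error
    n_res, n_sus = tp + fn, tn + fp
    if n_res == 0 or n_sus == 0:
        raise ValueError("test set must contain both resistant and susceptible isolates")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / n_res,
        "specificity": tn / n_sus,
        "very_major_error_rate": fn / n_res,
        "major_error_rate": fp / n_sus,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison of resistant vs. susceptible scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))
```

With `y_true = [1, 1, 1, 0, 0, 0, 0, 0]`, scores `[0.9, 0.8, 0.4, 0.1, 0.2, 0.3, 0.7, 0.05]` and predictions obtained by thresholding the scores at 0.5, accuracy is 0.75, sensitivity 2/3, specificity 0.8, the very major error rate 1/3, the major error rate 0.2, and the ROC AUC 14/15.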


Feature Importance Analysis

To better understand the genomic features driving third-generation cephalosporin resistance, a feature importance analysis will be conducted. This analysis will highlight the resistance genes, mutations, and other genomic factors that contribute most to the model’s predictions of resistance in Gram-negative bacteria.
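One way to carry out this analysis is permutation importance, shown here as a model-agnostic sketch; it is a swapped-in technique for illustration, and XGBoost’s built-in importance scores (such as gain) are an alternative. Each feature column of the held-out data is shuffled in turn and the average drop in accuracy is recorded. `predict` stands for any trained model’s prediction function and is an assumption of this sketch.

```python
import random
from typing import Callable, List, Sequence

Matrix = List[List[int]]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[Matrix], List[int]],
    X: Matrix,
    y: Sequence[int],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean drop in accuracy when one feature column at a time is shuffled.

    A large drop means the model relies heavily on that feature; a drop
    near zero means the model makes little use of it.
    """
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[j] for row in X]
            rng.shuffle(shuffled)
            # Copy each row, replacing only column j with the shuffled values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, shuffled)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances
```

With a hypothetical toy predictor that calls an isolate resistant whenever column 0 (say, a blaCTX-M gene) is present, shuffling column 0 lowers accuracy while shuffling an unused column leaves it unchanged, so that column’s importance is exactly 0. Note that importance reflects what the model relies on, which is an association rather than proof of a resistance mechanism.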


Expected results

- Develop, train, and validate a machine-learning model to predict third-generation cephalosporin susceptibility in Gram-negative bacteria using WGS data.

- Perform feature importance analysis to identify key resistance genes and regulatory mutations.

- Compare the model’s predictive performance with traditional phenotypic testing methods based on EUCAST breakpoints.

- Prepare a manuscript for publication and present findings to the clinical and microbiology teams.


The project will involve collaboration with various departments at Amsterdam UMC, including Medical Microbiology and Medical Informatics. Interdisciplinary collaboration with bioinformatics experts and clinical teams will provide additional insights into the practical application of machine learning in predicting antimicrobial resistance.


Time period

November – June

May – November


Contact

Interested? Feel free to contact me with any questions!

Dr. R.P. Schade, Clinical Microbiologist, Dept. of Medical Microbiology and Infection Prevention

r.schade@amsterdamumc.nl


References

1. Humphries RM, Bragin E, Parkhill J, Morales G, Schmitz JE, Rhodes PA. Machine-Learning Model for Prediction of Cefepime Susceptibility in Escherichia coli from Whole-Genome Sequencing Data. J Clin Microbiol. 2023;61(3):e01431-22.

2. Spafford K, MacVane S, Humphries RM. Evaluation of empiric beta-lactam susceptibility prediction among Enterobacteriaceae by molecular beta-lactamase gene testing. J Clin Microbiol. 2019;57:e00674-19.

3. Doyle RM, O'Sullivan DM, Aller SD, Bruchmann S, et al. Discordant bioinformatic predictions of antimicrobial resistance from whole-genome sequencing data of bacterial isolates: an inter-laboratory study. Microb Genom. 2020;6:e000335.

4. Jaillard M, Lima L, Tournoud M, Mahé P, et al. A fast and agnostic method for bacterial genome-wide association studies: bridging the gap between k-mers and genetic events. PLoS Genet. 2018;14:e1007758.

5. Nguyen M, Long SW, McDermott PF, Olsen RJ, et al. Using machine learning to predict antimicrobial MICs and associated genomic features for nontyphoidal Salmonella. J Clin Microbiol. 2019;57:e01260-18.

6. Alcock BP, Raphenya AR, Lau TTY, et al. CARD 2020: antibiotic resistome surveillance with the Comprehensive Antibiotic Resistance Database. Nucleic Acids Res. 2020;48:D517-D525.