
ICML 2023 Health & Bio Conference Review
Feeling like you missed out on all the amazing progress at the intersection of Health & Bio presented at ICML 2023 in Honolulu?
Find below a curated, whirlwind review of all the exciting science at the emerging interface of ML/AI, health and biology.
Keynotes
Taking the Pulse Of Ethical ML in Health - Marzyeh Ghassemi
Marzyeh Ghassemi challenges us by demonstrating how easily models derive protected variables from common care data, and makes an increasingly important case for moving beyond human-generated bias and instead using ML to improve the care system.
The Future of ML in Biology - Jennifer Doudna
Jennifer Doudna outlines how ML could impact biology and gene editing. She introduces the unique challenges of using ML in biology and the next frontier where ML & CRISPR could have an impact: understanding gene function, with data curation being central.
Selected papers
TabLeak: Tabular Data Leakage in Federated Learning
Federated Learning has been touted as a privacy-preserving alternative to direct data access in healthcare settings, but it’s not that simple. Vero et al. (ETH Zurich) present TabLeak - an attack that “extracts large subsets of private data at >90% accuracy”.
Underspecification Presents Challenges for Credibility in Modern Machine Learning
Alexander D’Amour et al. (Google DeepMind) identify underspecification - ML pipelines that return multiple distinct predictors with equivalent test performance - as a critical issue for ML systems deployed in the real world, one that should be tested for and addressed proactively.
Change is Hard: A Closer Look at Subpopulation Shift
Yuzhe Yang et al. trained 10,000 models to study subpopulation shift - underperformance on specific subpopulations - which can be harmful in health applications.
They benchmark methods for detecting & correcting subpopulation shift, and highlight open directions for future research.
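To make the idea of subpopulation underperformance concrete, here is a minimal sketch (not the authors' benchmark code) that compares overall accuracy with per-group and worst-group accuracy. The labels, predictions and the group names "A" and "B" are hypothetical toy data.

```python
from collections import defaultdict


def group_accuracies(y_true, y_pred, groups):
    """Return accuracy per subgroup as a dict {group: accuracy}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}


def worst_group(y_true, y_pred, groups):
    """Return (group name, accuracy) for the worst-performing subgroup."""
    accs = group_accuracies(y_true, y_pred, groups)
    name = min(accs, key=accs.get)
    return name, accs[name]


# Hypothetical toy data: group "B" is a small, under-served subpopulation.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "A", "B", "B", "B"]

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
worst_name, worst_acc = worst_group(y_true, y_pred, groups)

print(f"overall accuracy: {overall:.2f}")
print(f"worst group: {worst_name} ({worst_acc:.2f})")
```

In this toy example the overall accuracy looks acceptable while the smaller group is served far worse, which is the kind of gap an aggregate metric hides.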
Self-Interpretable Time Series Prediction with Counterfactual Explanations
In health/bio applications, counterfactual explanations - what could I have done to obtain a different result? - are increasingly sought to understand models.
J Yan (Rutgers University) et al. present a new method for creating counterfactual explanations in time series.
Sequential Underspecified Instrument Selection for Cause-Effect Estimation
Causal effect estimation in high-dimensional data is an open challenge in bio/drug development applications.
Ailer (Helmholtz Munich) et al. study instrument selection for instrumental variable estimation in underspecified settings (number of treatments > number of instruments).
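For readers new to instrumental variables, the sketch below shows only the textbook single-treatment, single-instrument case (the Wald/IV ratio estimator), not the underspecified setting or the method studied in the paper. The data are a hypothetical toy construction in which an unobserved confounder `u` biases ordinary least squares, while the instrument `z` recovers the true effect of 2.

```python
def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Population covariance of two equally long sequences."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)


def ols_slope(x, y):
    """Naive regression slope of y on x (biased under confounding)."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """IV ratio estimator: cov(z, y) / cov(z, x) for a single instrument z."""
    return cov(z, y) / cov(z, x)


# Hypothetical toy data: z is the instrument, u an unobserved confounder
# that is uncorrelated with z by construction.
z = [0, 0, 1, 1]
u = [0, 1, 0, 1]
x = [zi + ui for zi, ui in zip(z, u)]          # treatment depends on z and u
y = [2 * xi + 3 * ui for xi, ui in zip(x, u)]  # true causal effect of x is 2

ols_estimate = ols_slope(x, y)
iv_estimate = iv_slope(z, x, y)

print(f"OLS estimate: {ols_estimate}")
print(f"IV estimate:  {iv_estimate}")
```

The estimator relies on the instrument affecting the outcome only through the treatment and being independent of the confounder; both hold here by construction.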
Retrosynthetic Planning with Dual Value Networks
A central challenge for ML in chemistry is retrosynthesis planning - finding a synthesis route from commercially available building blocks.
Liu (Microsoft Research) et al. introduce a new reinforcement learning planner that raises the success rate beyond the previous SOTA.
Enhancing Activity Prediction Models in Drug Discovery with the Ability to Understand Human Language
Molecule activity predictors are historically trained to predict bioactivity in assays from molecular information.
Philipp Seidl (JKU Linz) et al. expand this paradigm to include a text representation of the assay - demonstrating SOTA zero-shot (no training data) performance.
Multi-Objective GFlowNets
Drug discovery is a multi-objective task - trading off efficacy, safety & others to find optimal drugs.
Moksh Jain (MILA Quebec) et al. introduce Multi-objective GFlowNets for generating diverse & optimal solutions with strong performance on drug design tasks.
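What "optimal" means with several objectives is usually formalised as Pareto optimality. The sketch below is not a GFlowNet and not the authors' method; it only illustrates the trade-off by filtering a set of candidates down to its Pareto front. The molecule names and the (efficacy, safety) scores are hypothetical, with higher being better for both.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep only candidates that no other candidate dominates."""
    return {
        name: scores
        for name, scores in candidates.items()
        if not any(
            dominates(other_scores, scores)
            for other, other_scores in candidates.items()
            if other != name
        )
    }


# Hypothetical candidates scored as (efficacy, safety).
candidates = {
    "mol_a": (0.9, 0.2),
    "mol_b": (0.6, 0.7),
    "mol_c": (0.5, 0.6),  # worse than mol_b on both objectives
    "mol_d": (0.3, 0.9),
}

front = pareto_front(candidates)
print(sorted(front))
```

Every candidate left on the front represents a different trade-off, which is why generating a diverse set of such solutions, rather than a single optimum, is the goal in this setting.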
DiscoBAX: Discovery of optimal intervention sets in genomic experiment design
In biology, a critical task is choosing which experiment to run to maximise the likelihood of discovery.
Clare Lyle (OATML) and Arash Mehrjou (GSK) et al. introduce DiscoBAX - a BAX-style (Bayesian Algorithm Execution) algorithm to maximise experimental yield in genetic CRISPR experiments.
Geometric ML for Molecules
At the excellent Computational Biology workshop, Michael Bronstein (University of Oxford) gave a captivating talk on the enormous potential of geometric ML for molecules, covering applications ranging from molecular inpainting and fragment-based molecular design to protein function prediction.
A Variational Inference Approach to Single-Cell Gene Regulatory Network Inference using Probabilistic Matrix Factorization
Claudia Skok Gibbs (New York University) et al. won the best paper award at the Computational Biology workshop with their stellar study on variational inference for single-cell gene network inference - providing state-of-the-art accuracy together with well-calibrated uncertainty estimates.
Other events of note
Beyond the core program, Recursion Pharmaceuticals announced Valence Labs - an arm of Recursion dedicated to ML research in drug discovery - and they hosted a discussion on industry/academic collaborations.
Conclusion
Overall, exciting to see …
- a continued increase in the presence of health/bio, although it remains only a minor part of the scientific program compared to e.g. general text/image
- causality becoming a stable element of the scientific program
- growing community of ML for bio enthusiasts
DISCLAIMER: The above list is a personal curation that most certainly missed many key contributions (in particular the many excellent posters!) and is only intended to be a starting point for your own exploration.