Feeling like you missed out on all the amazing progress at the intersection of Health & Bio presented at ICML 2023 in Honolulu?

Find below a curated, whirlwind review of all the exciting science at the emerging interface of ML/AI, health and biology.

Keynotes

Taking the Pulse Of Ethical ML in Health - Marzyeh Ghassemi

Marzyeh Ghassemi challenges us by demonstrating how easily models infer protected attributes from routine care data, and makes the increasingly important case for moving beyond reproducing human-generated bias and instead using ML to improve the care system.

The Future of ML in Biology - Jennifer Doudna

Jennifer Doudna outlines how ML could impact biology and gene editing. She introduces the unique challenges of using ML in biology and the next frontier where ML & CRISPR could have an impact: understanding gene function, with data curation playing a central role.

Selected papers

TabLeak: Tabular Data Leakage in Federated Learning

Federated Learning has been touted as a privacy-preserving alternative to direct data access in healthcare settings, but it’s not that simple. Vero et al (ETH Zurich) present TabLeak - an attack that “extracts large subsets of private data at >90% accuracy”.

[Paper link]

Underspecification Presents Challenges for Credibility in Modern Machine Learning

Alexander D’Amour (Google DeepMind) et al identify underspecification - ML pipelines that return multiple distinct predictors with equivalent test performance - as a critical issue for ML systems deployed in the real world, one that should be tested for and addressed proactively.

[Paper link]
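To make the idea of underspecification concrete, here is a minimal toy sketch. It is not taken from the paper: the two predictors, the features and the data are all made up for illustration. Both predictors score identically on the in-distribution test set, yet behave very differently once one feature stops tracking the label.

```python
# Hypothetical toy setup: each example is ((feature_0, feature_1), label).
def predictor_a(x):
    # Relies only on the first feature.
    return int(x[0] > 0)

def predictor_b(x):
    # Relies only on the second feature.
    return int(x[1] > 0)

def accuracy(predictor, data):
    return sum(predictor(x) == y for x, y in data) / len(data)

# In-distribution test set: both features carry the label.
iid_test = [((1, 1), 1), ((-1, -1), 0), ((2, 1), 1), ((-2, -1), 0)]
# Shifted data: the second feature no longer tracks the label.
shifted = [((1, -1), 1), ((-1, 1), 0), ((2, -1), 1), ((-2, 1), 0)]

for name, predictor in [("A", predictor_a), ("B", predictor_b)]:
    print(name, accuracy(predictor, iid_test), accuracy(predictor, shifted))
```

The held-out test score cannot tell the two predictors apart, which is why extra stress tests beyond the standard test set are needed.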

Change is Hard: A Closer Look at Subpopulation Shift

Yuzhe Yang et al trained 10,000 models to study subpopulation shift - models underperforming on specific subpopulations - which can be harmful in health applications.

They benchmark methods to test and correct for subpopulation shift, and identify open problems for future research.

[Paper link]
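A common way to surface subpopulation shift is to report accuracy per subgroup and look at the worst one, rather than only the overall number. The sketch below uses made-up labels, predictions and group names; it illustrates the metric only and is not the benchmark from the paper.

```python
from collections import defaultdict

def group_accuracies(y_true, y_pred, groups):
    """Return accuracy per subgroup as a dict {group: accuracy}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}

# Toy, made-up data: group "B" is a small subpopulation.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 1, 0, 0, 0, 1]
groups = ["A"] * 8 + ["B"] * 2

per_group = group_accuracies(y_true, y_pred, groups)
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
worst_group = min(per_group.values())

print("overall:", overall)
print("per group:", per_group)
print("worst group:", worst_group)
```

In this toy example the overall accuracy looks acceptable while the small group is misclassified entirely, which is exactly the failure that an aggregate metric hides.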

Self-Interpretable Time Series Prediction with Counterfactual Explanations

In health/bio applications, counterfactual explanations - what could I have done to obtain a different result? - are increasingly sought to understand models.

J Yan (Rutgers University) et al present a new method for creating counterfactual explanations in time series.

[Paper link]

Sequential Underspecified Instrument Selection for Cause-Effect Estimation

Causal effect estimation in high-dimensional data is an open challenge in biology and drug development applications.

Ailer (Helmholtz Munich) et al study selecting instruments in underspecified settings (number of treatments > number of instruments) for instrumental variable estimation.

[Paper link]
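For readers new to instrumental variables, the sketch below shows the textbook single-instrument, single-treatment case on simulated data. This is the simple, fully specified setting, not the underspecified one studied in the paper, and every variable name and coefficient is an assumption chosen for illustration. An unobserved confounder biases the naive regression slope, while the instrument-based ratio recovers the effect.

```python
import random

def cov(a, b):
    """Population covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)

random.seed(0)
n = 20000
true_effect = 2.0  # assumed causal effect of treatment x on outcome y

z = [random.gauss(0, 1) for _ in range(n)]  # instrument: affects y only via x
u = [random.gauss(0, 1) for _ in range(n)]  # unobserved confounder
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]  # treatment
y = [true_effect * xi + 3.0 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

# Naive regression slope of y on x: biased because u drives both x and y.
naive_estimate = cov(x, y) / cov(x, x)
# Instrumental variable (Wald) estimator: uses only variation in x caused by z.
iv_estimate = cov(z, y) / cov(z, x)

print("naive:", round(naive_estimate, 2), "IV:", round(iv_estimate, 2))
```

With more treatments than instruments this ratio is no longer uniquely defined, which is the setting the paper addresses.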

Retrosynthetic Planning with Dual Value Networks

A central challenge for ML in chemistry is retrosynthetic planning - finding a synthesis route to a target molecule from commercially available building blocks.

Liu (Microsoft Research) et al introduce a new reinforcement learning planner that raises the success rate beyond the previous state of the art.

[Paper link]

Enhancing Activity Prediction Models in Drug Discovery with the Ability to Understand Human Language

Molecular activity predictors have historically been trained to predict bioactivity in assays from molecular information.

Philipp Seidl (JKU Linz) et al expand this paradigm to include a text representation of the assay - demonstrating state-of-the-art zero-shot (no training data) performance.

[Paper link]

Multi-Objective GFlowNets

Drug discovery is a multi-objective task - trading off efficacy, safety & others to find optimal drugs.

Moksh Jain (MILA Quebec) et al introduce Multi-objective GFlowNets for generating diverse & optimal solutions with strong performance on drug design tasks.

[Paper link]
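The multi-objective trade-off is usually framed in terms of the Pareto front: the candidates that no other candidate beats on every objective at once. The sketch below is a plain brute-force filter over made-up scores, not a GFlowNet and not the method from the paper; the molecule names and the two objectives (efficacy, safety; higher is better) are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep the candidates that are not dominated by any other candidate."""
    return {
        name: scores
        for name, scores in candidates.items()
        if not any(
            dominates(other_scores, scores)
            for other, other_scores in candidates.items()
            if other != name
        )
    }

# Hypothetical (efficacy, safety) scores for five made-up molecules.
candidates = {
    "mol_1": (0.9, 0.2),
    "mol_2": (0.7, 0.7),
    "mol_3": (0.3, 0.9),
    "mol_4": (0.6, 0.6),  # dominated by mol_2
    "mol_5": (0.2, 0.1),  # dominated by every other molecule
}

front = pareto_front(candidates)
print(sorted(front))
```

Generative approaches aim to propose many diverse candidates on or near this front, rather than filtering a fixed list as done here.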

DiscoBAX: Discovery of optimal intervention sets in genomic experiment design

In biology, a critical task is choosing which experiment to run to maximise the likelihood of discovery.

Clare Lyle (OATML) and Arash Mehrjou (GSK) et al introduce DiscoBAX - a BAX-style algorithm to maximise experimental yield in genetic CRISPR experiments.

[Paper link]

Geometric ML for Molecules

At the excellent Computational Biology workshop, Michael Bronstein (University of Oxford) gave a captivating talk on the enormous potential of geometric ML for molecules, covering applications ranging from molecular inpainting and fragment-based molecular design to protein function prediction.

A Variational Inference Approach to Single-Cell Gene Regulatory Network Inference using Probabilistic Matrix Factorization

Claudia Skok Gibbs (New York University) et al won the best paper award at the Computational Biology workshop with their stellar study on variational inference for single-cell gene network inference - providing state-of-the-art accuracy together with well-calibrated uncertainty estimates.

[Paper link]

Other events of note

Beyond the core program, Recursion Pharmaceuticals announced Valence Labs - an arm of Recursion dedicated to ML research in drug discovery - and they hosted a discussion on industry/academic collaborations.

[Video link]

Conclusion

Overall, exciting to see …

  • continued increase in the presence of health/bio, although still only a minor part of the scientific program compared to e.g. general text/image
  • causality becoming a stable element of the scientific program
  • growing community of ML for bio enthusiasts


DISCLAIMER: The above list is a personal curation that most certainly missed many key contributions (in particular the many excellent posters!) and is only intended to be a starting point for your own exploration.