Data-Driven Discovery of Dynamical Systems in Pharmacology using Large Language Models

Authors: Samuel Holt, Zhaozhi Qian, Tennison Liu, Jim Weatherall, Mihaela van der Schaar

NeurIPS 2024

Reproducibility assessment. Each entry below gives the variable, the assessed result, and the supporting LLM response.

Research Type: Experimental
In this paper, we present the Data-Driven Discovery (D3) framework, a novel approach leveraging Large Language Models (LLMs) to iteratively discover and refine interpretable models of dynamical systems, demonstrated here with pharmacological applications. Unlike traditional methods, D3 enables the LLM to propose, acquire, and integrate new features, and to validate and compare dynamical systems models, uncovering new insights into pharmacokinetics. Experiments on a pharmacokinetic Warfarin dataset reveal that D3 identifies a new plausible model that is well-fitting, highlighting its potential for precision dosing in clinical applications.

Researcher Affiliation: Collaboration
Samuel Holt, University of Cambridge, sih31@cam.ac.uk; Zhaozhi Qian, Elm UK, zqian@elm.sa; Tennison Liu, University of Cambridge, tl522@cam.ac.uk; James Weatherall, AstraZeneca; Mihaela van der Schaar, University of Cambridge, mv472@cam.ac.uk

Pseudocode: Yes
Appendix F.1 gives the D3 pseudocode (Algorithm 1: Pseudocode for D3 Framework).

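Algorithm 1 itself is not reproduced on this page; the sketch below illustrates, in minimal Python, the kind of propose-fit-feedback loop the D3 pseudocode describes. Here `propose_model` is a stand-in for the LLM call, and `fit_and_score`, `d3_loop`, and `toy_proposer` are hypothetical names for illustration, not the authors' implementation.

```python
# Minimal sketch of a D3-style discovery loop (cf. Algorithm 1), assuming a
# generic `propose_model` callable standing in for the LLM.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

def fit_and_score(rhs, n_params, t, y_obs, y0):
    """Fit free parameters of a candidate ODE right-hand side by least squares."""
    def loss(theta):
        sol = solve_ivp(lambda t_, y_: rhs(t_, y_, theta),
                        (t[0], t[-1]), y0, t_eval=t)
        if not sol.success:
            return np.inf
        return float(np.mean((sol.y[0] - y_obs) ** 2))
    res = minimize(loss, x0=np.ones(n_params), method="Nelder-Mead")
    return res.x, res.fun

def d3_loop(propose_model, t, y_obs, y0, n_iters=5):
    """Repeatedly ask the proposer (an LLM in the paper) for candidate models,
    fit them, feed the score back, and keep the best-scoring candidate."""
    best = (None, np.inf)
    feedback = ""
    for _ in range(n_iters):
        rhs, n_params = propose_model(feedback)  # LLM proposes model code
        theta, score = fit_and_score(rhs, n_params, t, y_obs, y0)
        if score < best[1]:
            best = ((rhs, theta), score)
        # Score on the same data here for brevity; the paper scores candidates
        # on a held-out validation split.
        feedback = f"last candidate MSE: {score:.4g}"
    return best

# Toy proposer (stand-in for the LLM): a one-compartment PK model dC/dt = -k*C.
def toy_proposer(_feedback):
    return (lambda t, y, th: [-th[0] * y[0]], 1)

t = np.linspace(0, 10, 50)
y_obs = 5.0 * np.exp(-0.3 * t)
(model, theta), score = d3_loop(toy_proposer, t, y_obs, y0=[5.0])
print(theta, score)
```

The toy proposer merely fixes a single candidate so the loop runs end to end; in the paper the proposer is an LLM that emits, critiques, and refines candidate dynamical-systems models across iterations.
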
Open Source Code: Yes
Code is available at https://github.com/samholt/DataDrivenDiscovery, and a broader research-group code base is provided at https://github.com/vanderschaarlab/DataDrivenDiscovery.

Open Datasets: Yes
Finally, we include a real Pharmacokinetic (PK) dataset of Warfarin patients (Warfarin) [Janssen et al., 2022]. Detailed information about all benchmark datasets is provided in Appendix B.

Dataset Splits: Yes
We split the data into training, validation, and test sets with proportions of 70%, 15%, and 15%, respectively, ensuring that the splits maintain the chronological order to preserve temporal causality.

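A chronological split like the one described is straightforward to sketch; the helper below is a hypothetical illustration (not the paper's code) of a 70/15/15 split that avoids shuffling, so validation and test samples always lie after the training samples in time.

```python
# Minimal sketch of a 70/15/15 chronological split, assuming the data are
# already ordered by time; names are illustrative.
import numpy as np

def chronological_split(X, train_frac=0.70, val_frac=0.15):
    """Split time-ordered samples without shuffling to preserve temporal causality."""
    n = len(X)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return X[:n_train], X[n_train:n_train + n_val], X[n_train + n_val:]

data = np.arange(100)                    # stand-in for time-ordered observations
train, val, test = chronological_split(data)
print(len(train), len(val), len(test))   # 70 15 15
```
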
Hardware Specification: Yes
All experiments and training were conducted using a single Intel Core i9-12900K CPU @ 3.20GHz, 64GB RAM, and an Nvidia RTX 3090 GPU with 24GB of memory.

Software Dependencies: No
The paper mentions PyTorch but does not specify its version number; no other software or library versions are given.

Experiment Setup: Yes
Specifically, we utilize the Adam optimizer [Kingma and Ba, 2014] with a learning rate of 0.01, a batch size of 1,000, and early stopping with a patience of 20. The model is trained for 2,000 epochs to ensure convergence.
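
The stated hyperparameters are enough to sketch the training loop. The PyTorch snippet below is a hedged illustration using a placeholder model and random data, since the paper's architecture is not specified in this excerpt; only the optimizer, learning rate, batch size, epoch budget, and patience come from the quoted setup.

```python
# Sketch of the stated setup: Adam (lr=0.01), batch size 1,000, up to 2,000
# epochs, early stopping with patience 20. Model and data are placeholders.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

X, y = torch.randn(5000, 4), torch.randn(5000, 1)        # placeholder data
X_val, y_val = torch.randn(750, 4), torch.randn(750, 1)  # placeholder val split
loader = DataLoader(TensorDataset(X, y), batch_size=1000, shuffle=True)

model = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # [Kingma and Ba, 2014]
loss_fn = nn.MSELoss()

best_val, patience, bad_epochs = float("inf"), 20, 0
for epoch in range(2000):
    model.train()
    for xb, yb in loader:
        optimizer.zero_grad()
        loss_fn(model(xb), yb).backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():
        val_loss = loss_fn(model(X_val), y_val).item()
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:  # early stopping on stalled validation loss
            break
```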