Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Data-Driven Discovery of Dynamical Systems in Pharmacology using Large Language Models

Authors: Samuel Holt, Zhaozhi Qian, Tennison Liu, Jim Weatherall, Mihaela van der Schaar

NeurIPS 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this paper, we present the Data-Driven Discovery (D3) framework, a novel approach leveraging Large Language Models (LLMs) to iteratively discover and refine interpretable models of dynamical systems, demonstrated here with pharmacological applications. Unlike traditional methods, D3 enables the LLM to propose, acquire, and integrate new features, validate, and compare dynamical systems models, uncovering new insights into pharmacokinetics. Experiments on a pharmacokinetic Warfarin dataset reveal that D3 identifies a new plausible model that is well-fitting, highlighting its potential for precision dosing in clinical applications."
Researcher Affiliation | Collaboration | Samuel Holt, University of Cambridge, EMAIL; Zhaozhi Qian, Elm UK, EMAIL; Tennison Liu, University of Cambridge, EMAIL; James Weatherall, AstraZeneca; Mihaela van der Schaar, University of Cambridge, EMAIL
Pseudocode | Yes | "F.1 D3 Pseudocode; Algorithm 1: Pseudocode for D3 Framework"
Open Source Code | Yes | "Code is available at https://github.com/samholt/Data Driven Discovery and we provide a broader research group code base at https://github.com/vanderschaarlab/Data Driven Discovery."
Open Datasets | Yes | "Finally, we include a real Pharmacokinetic (PK) dataset of Warfarin patients (Warfarin) [Janssen et al., 2022]. Detailed information about all benchmark datasets is provided in Appendix B."
Dataset Splits | Yes | "We split the data into training, validation, and test sets with proportions of 70%, 15%, and 15%, respectively, ensuring that the splits maintain the chronological order to preserve temporal causality."
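The chronological 70/15/15 split described above can be sketched as follows. This is an illustrative helper, not the authors' code; the function name and the assumption that samples are already sorted by time are ours.

```python
import numpy as np

def chronological_split(n_samples, train_frac=0.70, val_frac=0.15):
    """Split indices 0..n-1 into train/val/test contiguous blocks,
    preserving temporal order (no shuffling)."""
    idx = np.arange(n_samples)  # assumed already sorted by time
    n_train = int(n_samples * train_frac)
    n_val = int(n_samples * val_frac)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test
```

Keeping the blocks contiguous ensures every training sample precedes every validation sample, which in turn precedes every test sample, so no future information leaks backward.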
Hardware Specification | Yes | "All experiments and training were conducted using a single Intel Core i9-12900K CPU @ 3.20GHz, 64GB RAM, and an Nvidia RTX3090 GPU with 24GB of memory."
Software Dependencies | No | The paper mentions 'PyTorch' but does not specify its version number. Other software or library versions are not mentioned.
Experiment Setup | Yes | "Specifically, we utilize the Adam optimizer [Kingma and Ba, 2014] with a learning rate of 0.01, a batch size of 1,000, and early stopping with a patience of 20. The model is trained for 2,000 epochs to ensure convergence."
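The early-stopping rule quoted above (patience of 20 on the validation loss) can be sketched as a standalone helper. This is a generic illustration of the standard technique, not the authors' implementation; the class name and interface are ours.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved
    for `patience` consecutive epochs."""

    def __init__(self, patience=20):
        self.patience = patience
        self.best = float("inf")
        self.counter = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.counter = 0
        else:
            self.counter += 1
        return self.counter >= self.patience
```

In a training loop matching the reported setup, one would call `stopper.step(val_loss)` once per epoch (up to the stated 2,000 epochs) alongside an Adam optimizer with learning rate 0.01 and batch size 1,000, breaking out of the loop when it returns True.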