Expert-level protocol translation for self-driving labs

Authors: Yu-Zhe Shi, Fanxu Meng, Haofei Hou, Zhangqian Bi, Qiao Xu, Lecheng Ruan, Qining Wang

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Quantitative and qualitative evaluations have demonstrated its performance at par with that of human experts, underscoring its potential to significantly expedite and democratize the process of scientific discovery by elevating the automation capabilities within self-driving laboratories.
Researcher Affiliation | Academia | Yu-Zhe Shi, Fanxu Meng, Haofei Hou, Zhangqian Bi, Qiao Xu, Lecheng Ruan, Qining Wang; Department of Advanced Manufacturing and Robotics, College of Engineering, Peking University; equal contribution; contact: ruanlecheng@ucla.edu, qiningwang@pku.edu.cn
Pseudocode | Yes | Algorithm 1 (Reagent flow analysis): procedure TRANSITION(M, o): ▷ Context Transition: ERASE(M(Γ), KILLS(M(Γ), o)); APPEND(M(Γ), DEFINES(o)); ▷ State Transition: M(q) ← (M(q) \ o) ∪ NEXTOPS(o). procedure FLOW(p(c), M): R ← {} ▷ Set of reagent dependences; M(q) ← {o1} ▷ Initial State; M(Γ) ← ⟨⟩ ▷ Initial Memory; while M(q) ≠ ∅ do TRANSITION(M, M(q)). (An illustrative Python sketch of this procedure follows the table.)
Open Source Code | No | The project page with supplementary files for reproducing the results of this paper will be available at https://autodsl.org/procedure/papers/neurips24shi.html. We will also release our codes upon acceptance.
Open Datasets | Yes | The real experiments for the testing set are retrieved from open-sourced websites run by top-tier publishers, including Nature's Protocol Exchange, Cell's STAR Protocols, Bio-protocol, Wiley's Current Protocols, and JoVE. (Footnotes 2-6 of the paper provide the URLs: https://protocolexchange.researchsquare.com/, https://star-protocols.cell.com/, https://bio-protocol.org/en, https://currentprotocols.onlinelibrary.wiley.com/, https://www.jove.com/)
Dataset Splits | No | The paper states: "We select 75 complicated experiments with 1,166 steps in total as the testing set". While it implicitly suggests a training phase (the training dataset is embedded into a vector database), it does not explicitly provide details of validation splits (percentages, counts, or methodology).
Hardware Specification | No | The paper mentions using OpenAI models (gpt-3.5-turbo-0125 and text-embedding-ada-002) for experiments but does not specify the underlying hardware (e.g., CPU, GPU type, memory) on which these models or the rest of the framework were run.
Software Dependencies | Yes | We employ the spaCy Dependency Parser to analyze the syntactic structure of protocol c... We selected OpenAI's gpt-3.5-turbo-0125 model for our experiments. Additionally, we utilized OpenAI's text-embedding-ada-002 model to embed the training dataset and build a vector database. (A dependency-parsing and embedding sketch follows the table.)
Experiment Setup | Yes | We employ state-of-the-art LLMs to extract reagent entities from NL-based protocol descriptions for the two utilities KILLS and DEFINES through instruction-following in-context learning (Wei et al., 2021; Brown et al., 2020) (refer to Appx. C.2 for details). The prompt for NER is given there, as are the prompt used to analyze whether the input of the current operation is the output of a previous operation and the prompt used to determine if reagents in the current memory M(Γ) are killed by the current operation. The cost model charged US$0.50 per million tokens for inputs and US$1.50 per million tokens for outputs. (An illustrative in-context NER sketch follows the table.)
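
As referenced in the Pseudocode row, below is a minimal Python sketch of the quoted reagent flow analysis (Algorithm 1). The function names transition and flow mirror the paper's procedures, but the concrete data structures and the kills/defines/next_ops callables are stand-ins: the paper implements KILLS and DEFINES with LLM-based extraction, and the excerpt does not show how the dependence set R is populated, so recording one (reagent, operation) pair per killed reagent is an assumption for illustration.

```python
def transition(memory, state, op, kills, defines, next_ops):
    """One step of the quoted Algorithm 1: context transition on the reagent
    memory M(Gamma), then state transition on the pending-operation set M(q)."""
    killed = set(kills(memory, op))
    # Context transition: ERASE the killed reagents, APPEND the reagents op DEFINES.
    memory[:] = [r for r in memory if r not in killed]
    memory.extend(defines(op))
    # State transition: M(q) <- (M(q) \ {op}) ∪ NEXTOPS(op).
    state.discard(op)
    state.update(next_ops(op))
    return killed


def flow(protocol, kills, defines, next_ops):
    """Walk a protocol from its first operation, collecting reagent dependences."""
    dependences = set()        # R: (reagent, operation) pairs -- assumed encoding
    state = {protocol[0]}      # M(q): initial state holds the first operation o1
    memory = []                # M(Gamma): initial memory is empty
    while state:               # while M(q) != empty
        op = next(iter(state))
        killed = transition(memory, state, op, kills, defines, next_ops)
        dependences.update((reagent, op) for reagent in killed)
    return dependences
```

With kills, defines, and next_ops backed by the LLM-based extraction described in the Experiment Setup row, flow would return the reagent-dependence relation over the protocol's operations; termination relies on next_ops eventually returning no successors, the same assumption the pseudocode makes.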
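The Software Dependencies row names spaCy and two OpenAI models. The sketch below shows, under stated assumptions, how those pieces typically fit together: spaCy's en_core_web_sm pipeline (an assumed model choice) for dependency parsing, text-embedding-ada-002 for embedding protocol steps, and a plain in-memory cosine-similarity lookup standing in for the vector database, which the paper does not name.

```python
import numpy as np
import spacy
from openai import OpenAI

nlp = spacy.load("en_core_web_sm")   # assumed English pipeline
client = OpenAI()                    # reads OPENAI_API_KEY from the environment


def parse_dependencies(sentence: str):
    """Return (token, dependency label, head) triples for one protocol step."""
    return [(t.text, t.dep_, t.head.text) for t in nlp(sentence)]


def embed(texts: list[str]) -> np.ndarray:
    """Embed protocol steps with text-embedding-ada-002."""
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data])


def top_k(query: str, corpus: list[str], corpus_vecs: np.ndarray, k: int = 3):
    """Retrieve the k most similar training steps, e.g. as in-context examples."""
    q = embed([query])[0]
    sims = corpus_vecs @ q / (np.linalg.norm(corpus_vecs, axis=1) * np.linalg.norm(q))
    return [corpus[i] for i in np.argsort(sims)[::-1][:k]]
```

The cosine-similarity retriever is only a minimal replacement for a real vector store; any off-the-shelf vector database would serve the same role of surfacing similar training-set steps.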
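Finally, the Experiment Setup row describes instruction-following in-context learning with gpt-3.5-turbo-0125 to extract reagent entities for the KILLS and DEFINES utilities. The sketch below illustrates one way such a call could be issued; the system prompt, the few-shot message layout, and the JSON output schema are assumptions, since the paper's exact prompts are given in its Appendix C.2 rather than here.

```python
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical instruction; the paper's actual prompt wording is in Appx. C.2.
SYSTEM_PROMPT = (
    "You extract reagent entities from experimental protocol steps. "
    'Return JSON: {"defines": [...], "kills": [...]}.'
)


def extract_reagents(step: str, examples: list[tuple[str, dict]]) -> dict:
    """In-context NER: few-shot examples (e.g. retrieved from the embedded
    training set) followed by the current step; returns the reagents the step
    defines and kills."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for text, labels in examples:            # few-shot demonstrations
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": json.dumps(labels)})
    messages.append({"role": "user", "content": step})
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo-0125",
        messages=messages,
        temperature=0,   # assumed setting to keep extraction near-deterministic
    )
    return json.loads(resp.choices[0].message.content)
```

Whether the paper fixes the temperature is not stated; it is set to 0 here only to make the illustrative extraction repeatable.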