Conditional Language Learning with Context

Authors: Xiao Zhang, Miao Li, Ji Wu

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate knowledge learning in finetuned language models with question answering tasks, a common approach in previous work (Hendrycks et al., 2021; Singhal et al., 2023)." "Figure 5. Performance-forgetting tradeoff curve of standard finetuning and conditional finetuning on Anatomy and SQuAD (closed-book)."
Researcher Affiliation | Academia | "Department of Electronic Engineering, Tsinghua University; College of AI, Tsinghua University. Correspondence to: Ji Wu <wuji_ee@mail.tsinghua.edu.cn>."
Pseudocode | No | The paper describes the method verbally and with a diagram, but does not include structured pseudocode or algorithm blocks (see the hypothetical sketch after the table).
Open Source Code | Yes | "We release our code implementation along with the original part of the data used in the paper." https://github.com/xiaozeroone/conditional_finetune
Open Datasets | Yes | "We use the medical textbooks provided with the MedQA dataset (Jin et al., 2021) as a domain corpus to finetune LLaMA-2 (Touvron et al., 2023b)"; "C4 (Raffel et al., 2020), a corpus of general web text."
Dataset Splits | No | The paper mentions evaluating perplexity on a validation split of C4, but does not explicitly define reproducible train/validation/test splits for its own finetuning data (medical textbooks, Wikipedia excerpts); see the loading snippet after the table.
Hardware Specification | Yes | "We use the Transformers library (Wolf et al., 2020) and an NVIDIA A100 GPU for the experiments."
Software Dependencies | No | The paper mentions the Transformers library (Wolf et al., 2020), the AdamW optimizer (Loshchilov & Hutter, 2019), the PEFT library (Mangrulkar et al., 2022), and EleutherAI's Language Model Evaluation Harness (Gao et al., 2021), but provides no version numbers for these software dependencies (see the version-recording snippet after the table).
Experiment Setup | Yes | "We finetune the model with the AdamW optimizer (Loshchilov & Hutter, 2019), a learning rate of 3e-5, and a batch size of 16. The maximum sequence length is set to 2048. A linear learning rate decay is used with a warm-up of 10% of the total number of steps. We use gradient clipping at 1.0." (See the TrainingArguments sketch after the table.)
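
Since the paper provides no algorithm block, the sketch below illustrates one plausible reading of "conditional finetuning": prepend a context string to each training text and mask the context tokens out of the language-modeling loss. This is a hypothetical reconstruction under that assumption, not the authors' released implementation; the checkpoint name and the helper function are assumptions.

```python
# Hypothetical reconstruction of a context-conditioned LM loss (not the authors' code).
# Assumption: the context is prepended to each training text and excluded from the
# loss by setting its label positions to -100 (ignored by Transformers' cross-entropy).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # model family used in the paper; exact checkpoint assumed
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def conditional_lm_loss(context: str, text: str) -> torch.Tensor:
    """Language-modeling loss on `text` conditioned on (but not trained on) `context`."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    txt_ids = tokenizer(text, add_special_tokens=False, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, txt_ids], dim=1)
    labels = input_ids.clone()
    labels[:, : ctx_ids.size(1)] = -100  # mask the context tokens from the loss
    return model(input_ids=input_ids, labels=labels).loss
```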
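
On dataset splits, the C4 perplexity evaluation could in principle be approximated with the public C4 validation split. The snippet below only illustrates one way to obtain such a split with the Hugging Face `datasets` library; the exact subset and split used by the authors are not specified.

```python
# Illustration only: loading a C4 validation split for perplexity evaluation.
# The paper does not state which C4 subset/split was used.
from datasets import load_dataset

c4_val = load_dataset("allenai/c4", "en", split="validation", streaming=True)
print(next(iter(c4_val))["text"][:200])
```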
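
Because no dependency versions are given, a reader reproducing the setup may at least want to record the versions installed locally; the snippet below does this with `importlib.metadata`. The package names listed are assumptions about the corresponding PyPI distributions.

```python
# The paper pins no versions, so record the ones installed locally when reproducing.
# Package names are assumptions about the corresponding PyPI distributions.
from importlib.metadata import PackageNotFoundError, version

for pkg in ["transformers", "peft", "torch", "lm-eval"]:
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```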
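
The quoted hyperparameters map naturally onto Hugging Face `TrainingArguments`; the sketch below shows one possible configuration under that assumption. Only the values marked "quoted" come from the paper; the output directory, epoch count, and tokenization step are placeholders.

```python
# Sketch: the reported hyperparameters expressed as Hugging Face TrainingArguments.
# Values marked "quoted" come from the paper; everything else is a placeholder assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./conditional_finetune_out",  # placeholder
    optim="adamw_torch",                      # AdamW optimizer (quoted)
    learning_rate=3e-5,                       # quoted
    per_device_train_batch_size=16,           # quoted batch size
    lr_scheduler_type="linear",               # linear decay (quoted)
    warmup_ratio=0.1,                         # warm-up of 10% of total steps (quoted)
    max_grad_norm=1.0,                        # gradient clipping at 1.0 (quoted)
    num_train_epochs=1,                       # not quoted; placeholder
)
# The 2048-token maximum sequence length is applied at tokenization time, e.g.
# tokenizer(..., truncation=True, max_length=2048).
```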