Conditional Language Learning with Context
Authors: Xiao Zhang, Miao Li, Ji Wu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate knowledge learning in finetuned language models with question answering tasks, a common approach in previous work (Hendrycks et al., 2021; Singhal et al., 2023). Figure 5. Performance-forgetting tradeoff curve of standard finetuning and conditional finetuning on Anatomy and SQuAD (closed-book). |
| Researcher Affiliation | Academia | ¹Department of Electronics Engineering, Tsinghua University, ²College of AI, Tsinghua University. Correspondence to: Ji Wu <wuji_ee@mail.tsinghua.edu.cn>. |
| Pseudocode | No | The paper describes the method verbally and with a diagram, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release our code implementation along with the original part of the data used in the paper (https://github.com/xiaozeroone/conditional_finetune). |
| Open Datasets | Yes | We use the medical textbooks provided with the MedQA dataset (Jin et al., 2021) as a domain corpus to finetune LLaMA-2 (Touvron et al., 2023b) ... C4 (Raffel et al., 2020), a corpus of general web text. |
| Dataset Splits | No | The paper mentions evaluating perplexity on a validation split of C4, but does not explicitly specify train/validation/test splits for its own finetuning data (medical textbooks, Wikipedia excerpts) in a reproducible way (see the C4 perplexity sketch after the table). |
| Hardware Specification | Yes | We use the Transformers library (Wolf et al., 2020) and an NVIDIA A100 GPU for the experiments. |
| Software Dependencies | No | The paper mentions using the 'Transformers library (Wolf et al., 2020)', the 'AdamW optimizer (Loshchilov & Hutter, 2019)', the 'PEFT (Mangrulkar et al., 2022) library', and 'EleutherAI's Language Model Evaluation Harness framework (Gao et al., 2021)', but does not provide version numbers for these software dependencies (see the version-recording sketch after the table). |
| Experiment Setup | Yes | We finetune the model with the AdamW optimizer (Loshchilov & Hutter, 2019), a learning rate of 3e-5, and a batch size of 16. The maximum sequence length is set to 2048. A linear learning rate decay is used with a warm-up of 10% of the total number of steps. We use gradient clipping at 1.0. (See the finetuning-configuration sketch after the table.) |
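As a rough illustration of the C4 perplexity evaluation mentioned in the Dataset Splits row, the sketch below streams the public C4 validation split and scores a causal language model on a small sample. The model identifier, sample size, and scoring loop are assumptions chosen for illustration, not the paper's protocol.

```python
# Sketch: perplexity on the C4 validation split (assumed setup, not the paper's script).
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed base model identifier
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

# Stream the C4 validation split instead of downloading the full corpus.
c4_val = load_dataset("allenai/c4", "en", split="validation", streaming=True)

nll, n_tokens = 0.0, 0
for i, example in enumerate(c4_val):
    if i >= 100:  # small sample for illustration only
        break
    enc = tokenizer(example["text"], return_tensors="pt",
                    truncation=True, max_length=2048)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    n = enc["input_ids"].numel()
    nll += out.loss.item() * n  # approximate token-weighted average of per-example losses
    n_tokens += n

print("perplexity:", math.exp(nll / n_tokens))
```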
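Because no version numbers are given for the dependencies listed in the Software Dependencies row, a small script like the following can at least record the versions actually installed in a reproduction environment. The package list uses the PyPI distribution names of the cited libraries (`lm-eval` is the Language Model Evaluation Harness); which exact versions to pin remains an open assumption.

```python
# Record installed versions of the libraries cited in the Software Dependencies row.
from importlib.metadata import version, PackageNotFoundError

# PyPI distribution names for the cited libraries; exact pins are not given in the paper.
packages = ["transformers", "peft", "torch", "lm-eval"]

for pkg in packages:
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```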
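The hyperparameters quoted in the Experiment Setup row translate into a Hugging Face `TrainingArguments` configuration roughly as follows. This is a minimal sketch assuming the Transformers `Trainer` API and a LLaMA-2 base checkpoint; the model identifier, epoch count, and dataset construction are placeholders, and the authors' actual implementation is in their released repository.

```python
# Minimal sketch of the finetuning configuration quoted above.
# Model name, dataset handling, and Trainer usage are assumptions; the released
# code is at https://github.com/xiaozeroone/conditional_finetune.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "meta-llama/Llama-2-7b-hf"  # assumed base model identifier
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

args = TrainingArguments(
    output_dir="finetune_out",
    optim="adamw_torch",             # AdamW optimizer
    learning_rate=3e-5,              # learning rate from the paper
    per_device_train_batch_size=16,  # batch size 16 (per-device vs. effective is an assumption)
    lr_scheduler_type="linear",      # linear learning-rate decay
    warmup_ratio=0.1,                # warm-up over 10% of total steps
    max_grad_norm=1.0,               # gradient clipping at 1.0
    num_train_epochs=1,              # epoch count not quoted here; placeholder
)

# train_dataset would hold the domain corpus tokenized to a maximum
# sequence length of 2048 (construction omitted in this sketch).
# trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
# trainer.train()
```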