Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Train-Attention: Meta-Learning Where to Focus in Continual Knowledge Learning

Authors: Yeongbin Seo, Dongha Lee, Jinyoung Yeo

NeurIPS 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through experiments conducted on both newly introduced and established CKL benchmarks, TAALM proves the state-of-the-art performance upon the baselines... Section 4 (Experiment): We conduct experiments on two benchmarks. One is our newly designed LAMA-CKL, and the other is the established benchmark, TEMPORALWIKI [Jang et al., 2022].
Researcher Affiliation | Academia | Yeongbin Seo, Dongha Lee, Jinyoung Yeo, Department of Artificial Intelligence, Yonsei University
Pseudocode | Yes | Algorithm 1: Optimization of Train-Attention
Open Source Code | Yes | The code and the dataset will be available online [https://github.com/ybseo-ac/TAALM]
Open Datasets | Yes | We also introduce a new CKL benchmark, LAMA-CKL... We experiment on LAMA-CKL and previous CKL benchmark (TemporalWiki [Jang et al., 2022])... The code and the dataset will be available online [https://github.com/ybseo-ac/TAALM]
Dataset Splits | Yes | Of the 4166 train data, 100 are used for validation.
Hardware Specification | Yes | 8 RTX 3090 GPUs (24GB) are used, with a global batch size of 64. A single A100 (80GB) GPU is used, and the effect of batch size 16 is achieved through gradient accumulation.
Software Dependencies | No | The paper mentions models and frameworks (e.g., Llama2-7B, TinyLlama-1.1B, QLoRA, AdamW optimizer) and programming languages (Python, implied by the code base), but does not specify version numbers for general software dependencies such as PyTorch, CUDA, or specific Python libraries.
Experiment Setup | Yes | Learning rate 1e-4, AdamW optimizer, and max length of 512 tokens are applied. A total of 30 epochs took 25 minutes of GPU time. We utilize Llama2-7B integrated with QLoRA [Dettmers et al., 2024] as a base model... We employ LoRA r = 64, α = 16, NF4 with BF16 computation datatype. (A hedged configuration sketch based on this row appears below the table.)
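
For orientation, the following is a minimal sketch of how the setup reported in the Experiment Setup and Hardware Specification rows (Llama2-7B with QLoRA, LoRA r = 64, α = 16, NF4 quantization with BF16 compute, AdamW at learning rate 1e-4, max length 512, effective batch size 16 via gradient accumulation) could be assembled with Hugging Face transformers, peft, and bitsandbytes. The library choices, model identifier, target modules, and accumulation steps are illustrative assumptions, not the authors' code; their actual implementation is in the linked repository and, notably, this sketch omits the paper's Train-Attention meta-learning objective entirely.

```python
# Hypothetical reconstruction of the reported QLoRA fine-tuning configuration.
# Library choices (transformers, peft, bitsandbytes) and the model ID are
# assumptions; the paper's own code is at https://github.com/ybseo-ac/TAALM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Per the dataset-splits row: of 4166 training items, 100 are held out for validation.
# train_data, val_data = data[:-100], data[-100:]   # illustrative split

# NF4 4-bit quantization with BF16 compute datatype, as reported.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

MODEL_ID = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint for "Llama2-7B"
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token

# LoRA adapter with r = 64, alpha = 16 (other LoRA settings are assumptions).
lora_config = LoraConfig(r=64, lora_alpha=16, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)

# AdamW at lr 1e-4; inputs truncated to 512 tokens.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
MAX_LENGTH = 512

# Effective batch size 16 via gradient accumulation (per the hardware row);
# a micro-batch with 16 accumulation steps is one possible way to realize it.
ACCUM_STEPS = 16

def training_step(batch_texts, step):
    enc = tokenizer(batch_texts, truncation=True, max_length=MAX_LENGTH,
                    padding=True, return_tensors="pt").to(model.device)
    loss = model(**enc, labels=enc["input_ids"]).loss
    (loss / ACCUM_STEPS).backward()          # accumulate scaled gradients
    if (step + 1) % ACCUM_STEPS == 0:        # update only every ACCUM_STEPS micro-batches
        optimizer.step()
        optimizer.zero_grad()
    return loss.item()
```

The sketch is meant only to make the reported hyperparameters concrete; the global batch size of 64 on the 8× RTX 3090 setting would instead be reached through data parallelism across the GPUs.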