Train-Attention: Meta-Learning Where to Focus in Continual Knowledge Learning

Authors: Yeongbin Seo, Dongha Lee, Jinyoung Yeo

NeurIPS 2024

Reproducibility assessment. Each variable below lists its result, followed by the supporting LLM response (excerpts quoted from the paper):

Research Type: Experimental
  "Through experiments conducted on both newly introduced and established CKL benchmarks, TAALM proves the state-of-the-art performance upon the baselines..." (Section 4, Experiment: "We conduct experiments on two benchmarks. One is our newly designed LAMA-CKL, and the other is the established benchmark, TEMPORALWIKI [Jang et al., 2022].")

Researcher Affiliation: Academia
  "Yeongbin Seo, Dongha Lee, Jinyoung Yeo. Department of Artificial Intelligence, Yonsei University. {suhcrates,donalee,jinyeo}@yonsei.ac.kr"

Pseudocode: Yes
  "Algorithm 1: Optimization of Train-Attention" (a hedged sketch of such a meta-learned token-weighting loop is given after this list)

Open Source Code: Yes
  "The code and the dataset will be available online [https://github.com/ybseo-ac/TAALM]"

Open Datasets: Yes
  "We also introduce a new CKL benchmark, LAMA-CKL... We experiment on LAMA-CKL and previous CKL benchmark (TemporalWiki [Jang et al., 2022])... The code and the dataset will be available online [https://github.com/ybseo-ac/TAALM]"

Dataset Splits: Yes
  "Of the 4166 train data, 100 are used for validation." (i.e., 4,066 examples remain for training; see the split sketch after this list)

Hardware Specification: Yes
  "8 RTX 3090 GPU (24GB) are used, with a global batch size of 64. A single A100 (82GB) GPU is used, and the effect of batch size 16 is achieved through gradient accumulation." (a generic gradient-accumulation pattern is sketched after this list)

Software Dependencies: No
  The paper mentions models and frameworks (e.g., Llama2-7B, TinyLlama-1.1B, QLoRA, the AdamW optimizer) and implies Python through its code base, but it does not specify version numbers for general software dependencies such as PyTorch, CUDA, or specific Python libraries.

Experiment Setup: Yes
  "Learning rate 1e-4, AdamW optimizer, and max length of 512 tokens are applied. A total of 30 epochs took 25 minutes of GPU time. We utilize Llama2-7B integrated with QLoRA [Dettmers et al., 2024] as a base model... We employ LoRA r = 64, α = 16, NF4 with BF16 computation datatype." (a plausible QLoRA configuration matching these hyperparameters is sketched after this list)
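
On the Pseudocode row: the excerpt only confirms that Algorithm 1 ("Optimization of Train-Attention") exists. As a rough illustration of what a meta-learned token-weighting loop of this kind can look like, here is a minimal PyTorch sketch. The module names (base_lm, weighter), the toy data, the single differentiable inner step, and the uniform meta-objective weights are all assumptions for illustration, not the authors' implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torch.func import functional_call

    vocab, dim, inner_lr = 100, 32, 1e-2

    base_lm = nn.Linear(dim, vocab)   # stand-in for the learner LM
    weighter = nn.Linear(dim, 1)      # "Train-Attention": scores token importance
    meta_opt = torch.optim.AdamW(weighter.parameters(), lr=1e-4)

    def weighted_lm_loss(params, x, y, w):
        # Per-token cross-entropy combined with the (learned) token weights;
        # each row of x stands in for one token position in this toy setup.
        logits = functional_call(base_lm, params, (x,))
        per_token = F.cross_entropy(logits, y, reduction="none")
        return (w * per_token).sum() / w.sum().clamp_min(1e-8)

    x_train = torch.randn(8, dim); y_train = torch.randint(vocab, (8,))
    x_meta  = torch.randn(8, dim); y_meta  = torch.randint(vocab, (8,))

    for step in range(30):
        params = dict(base_lm.named_parameters())
        w = torch.sigmoid(weighter(x_train)).squeeze(-1)  # token weights in (0, 1)

        # Inner step: one weighted update of the learner, kept differentiable
        # w.r.t. the weighter via create_graph=True.
        grads = torch.autograd.grad(
            weighted_lm_loss(params, x_train, y_train, w),
            list(params.values()), create_graph=True)
        params = {k: p - inner_lr * g for (k, p), g in zip(params.items(), grads)}

        # Outer step: evaluate the updated learner on the meta objective and
        # backpropagate through the inner update into the weighter.
        meta_loss = weighted_lm_loss(params, x_meta, y_meta, torch.ones(len(y_meta)))
        meta_opt.zero_grad()
        meta_loss.backward()
        meta_opt.step()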
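
On the Dataset Splits row: holding 100 of the 4,166 examples out for validation leaves 4,066 for training. A reproduction might carve out the split as follows; the seed and variable names are assumptions.

    import random

    examples = list(range(4166))   # placeholder for the actual training records
    rng = random.Random(0)         # assumed seed for a deterministic split
    rng.shuffle(examples)
    val_set, train_set = examples[:100], examples[100:]
    assert len(val_set) == 100 and len(train_set) == 4066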
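
On the Hardware Specification row: reaching an effective batch size of 16 on a single GPU via gradient accumulation follows a standard PyTorch pattern. The micro-batch size of 4 and the stand-in model below are assumptions; only the effective batch of 16 comes from the paper.

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 2)                                # stand-in model
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss()
    loader = [(torch.randn(4, 10), torch.randint(2, (4,)))  # micro-batches of 4
              for _ in range(8)]

    accum_steps = 4   # 4 micro-batches x 4 examples = effective batch of 16
    optimizer.zero_grad()
    for i, (x, y) in enumerate(loader):
        loss = criterion(model(x), y) / accum_steps  # scale so the sum is a mean
        loss.backward()                              # gradients accumulate in .grad
        if (i + 1) % accum_steps == 0:
            optimizer.step()                         # one update per effective batch
            optimizer.zero_grad()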
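
On the Experiment Setup row: the quoted hyperparameters (LoRA r = 64, α = 16, NF4 quantization with BF16 compute, AdamW at learning rate 1e-4) map onto a standard Hugging Face transformers + peft + bitsandbytes configuration, sketched below. The checkpoint name, target modules, and any unstated arguments are assumptions not given in the excerpt.

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",               # NF4 quantization
        bnb_4bit_compute_dtype=torch.bfloat16,   # BF16 computation datatype
    )
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-hf",              # assumed checkpoint for "Llama2-7B"
        quantization_config=bnb_config,
    )

    lora_config = LoraConfig(
        r=64,
        lora_alpha=16,
        target_modules=["q_proj", "v_proj"],     # assumed; not given in the excerpt
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

The reported max length of 512 tokens would be enforced at tokenization time (e.g., truncation to 512), which this sketch omits.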