Train-Attention: Meta-Learning Where to Focus in Continual Knowledge Learning
Authors: Yeongbin Seo, Dongha Lee, Jinyoung Yeo
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments conducted on both newly introduced and established CKL benchmarks, TAALM proves the state-of-the-art performance upon the baselines... 4 Experiment: We conduct experiments on two benchmarks. One is our newly designed LAMA-CKL, and the other is the established benchmark, TemporalWiki [Jang et al., 2022]. |
| Researcher Affiliation | Academia | Yeongbin Seo Dongha Lee Jinyoung Yeo Department of Artificial Intelligence Yonsei University {suhcrates,donalee,jinyeo}@yonsei.ac.kr |
| Pseudocode | Yes | Algorithm 1: Optimization of Train-Attention |
| Open Source Code | Yes | The code and the dataset will be available online [https://github.com/ybseo-ac/TAALM] |
| Open Datasets | Yes | We also introduce a new CKL benchmark, LAMA-CKL... We experiment on LAMA-CKL and previous CKL benchmark (TemporalWiki [Jang et al., 2022])... The code and the dataset will be available online [https://github.com/ybseo-ac/TAALM] |
| Dataset Splits | Yes | Of the 4166 train data, 100 are used for validation. |
| Hardware Specification | Yes | 8 RTX 3090 GPUs (24GB) are used, with a global batch size of 64. A single A100 (82GB) GPU is used, and the effect of batch size 16 is achieved through gradient accumulation. |
| Software Dependencies | No | The paper mentions models and frameworks (e.g., Llama2-7B, TinyLlama-1.1B, QLoRA, the AdamW optimizer), and Python is implied by the code base, but it does not specify version numbers for general software dependencies such as PyTorch, CUDA, or specific Python libraries. |
| Experiment Setup | Yes | Learning rate 1e-4, AdamW optimizer, and max length of 512 tokens are applied. A total of 30 epochs took 25 minutes of GPU time. We utilize Llama2-7B integrated with QLoRA [Dettmers et al., 2024] as a base model... We employ LoRA r = 64, α = 16, NF4 with BF16 computation datatype. (A configuration sketch follows the table.) |
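
The experiment-setup and hardware rows describe a QLoRA fine-tuning recipe: NF4 quantization with BF16 compute, LoRA rank 64 with α = 16, AdamW at learning rate 1e-4, a 512-token max length, and gradient accumulation to reach an effective batch of 16. The sketch below shows how such a configuration might be assembled with the Hugging Face `transformers`, `peft`, and `bitsandbytes` libraries; the checkpoint id, `GRAD_ACCUM_STEPS`, and the `make_batch`/`train_step` helpers are illustrative assumptions, not the authors' released code (see the linked repository for that).

```python
# Minimal sketch of the reported QLoRA configuration; hyperparameters come from
# the table above, everything else (model id, helpers) is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

MODEL_ID = "meta-llama/Llama-2-7b-hf"   # assumed checkpoint id
GRAD_ACCUM_STEPS = 16                   # per-device batch 1 -> effective batch 16

# NF4 4-bit quantization with BF16 computation datatype, as reported
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, quantization_config=bnb_config)

# LoRA adapters with the reported rank and scaling
model = get_peft_model(model, LoraConfig(r=64, lora_alpha=16, task_type="CAUSAL_LM"))

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def make_batch(texts):
    """Tokenize a list of texts with the reported max length of 512."""
    return tokenizer(texts, max_length=512, truncation=True, padding=True, return_tensors="pt")

def train_step(micro_batches):
    """One optimizer update accumulated over GRAD_ACCUM_STEPS micro-batches."""
    optimizer.zero_grad()
    for texts in micro_batches:            # expects GRAD_ACCUM_STEPS small batches
        batch = make_batch(texts)
        loss = model(**batch, labels=batch["input_ids"]).loss
        (loss / GRAD_ACCUM_STEPS).backward()
    optimizer.step()
```

Dividing each micro-batch loss by `GRAD_ACCUM_STEPS` before `backward()` keeps the accumulated gradient equivalent to a single batch of 16, which is how the paper reports matching the larger batch on a single GPU.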