Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Train-Attention: Meta-Learning Where to Focus in Continual Knowledge Learning
Authors: Seo Yeongbin, Dongha Lee, Jinyoung Yeo
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments conducted on both newly introduced and established CKL benchmarks, TAALM proves the state-of-the-art performance upon the baselines... 4 Experiment: We conduct experiments on two benchmarks. One is our newly designed LAMA-CKL, and the other is the established benchmark, TEMPORALWIKI [Jang et al., 2022]. |
| Researcher Affiliation | Academia | Yeongbin Seo Dongha Lee Jinyoung Yeo Department of Artificial Intelligence Yonsei University EMAIL |
| Pseudocode | Yes | Algorithm 1: Optimization of Train-Attention |
| Open Source Code | Yes | The code and the dataset will be available online [https://github.com/ybseo-ac/TAALM] |
| Open Datasets | Yes | We also introduce a new CKL benchmark, LAMA-CKL... We experiment on LAMA-CKL and previous CKL benchmark (TemporalWiki [Jang et al., 2022])... The code and the dataset will be available online [https://github.com/ybseo-ac/TAALM] |
| Dataset Splits | Yes | Of the 4166 train data, 100 are used for validation. |
| Hardware Specification | Yes | 8 RTX 3090 GPU (24GB) are used, with a global batch size of 64. A single A100 (82GB) GPU is used, and the effect of batch size 16 is achieved through gradient accumulation. |
| Software Dependencies | No | The paper mentions models and frameworks (e.g., Llama2-7B, TinyLlama-1.1B, QLoRA, AdamW optimizer) and programming languages (e.g., Python implied by code base) but does not specify version numbers for general software dependencies like PyTorch, CUDA, or specific Python libraries. |
| Experiment Setup | Yes | Learning rate 1e-4, AdamW optimizer, and max length of 512 tokens are applied. A total of 30 epochs took 25 minutes of GPU time. We utilize Llama2-7B integrated with QLoRA [Dettmers et al., 2024] as a base model... We employ LoRA r = 64, α = 16, NF4 with BF16 computation datatype. |
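
The figures quoted in the table imply a few simple quantities that may help when attempting a reproduction. The Python sketch below is illustrative only and is not taken from the paper or its repository. The function names are hypothetical, the micro-batch size of 4 is an assumed value (the excerpts report only the effective batch size of 16), the split arithmetic assumes the 100 validation examples are held out from the 4166 training examples, and the scaling factor assumes the standard LoRA convention of α / r.

```python
def accumulation_steps(effective_batch_size: int, micro_batch_size: int) -> int:
    """Number of micro-batches to accumulate before one optimizer step."""
    if micro_batch_size <= 0:
        raise ValueError("micro_batch_size must be positive")
    if effective_batch_size % micro_batch_size != 0:
        raise ValueError("effective batch size must be a multiple of the micro-batch size")
    return effective_batch_size // micro_batch_size


def split_sizes(total_train: int, n_validation: int) -> tuple[int, int]:
    """Sizes of (train, validation) when validation is held out from the training pool."""
    if not 0 <= n_validation <= total_train:
        raise ValueError("n_validation must lie between 0 and total_train")
    return total_train - n_validation, n_validation


def lora_scaling(alpha: float, r: int) -> float:
    """Scaling applied to the low-rank update under the standard alpha / r convention."""
    if r <= 0:
        raise ValueError("r must be positive")
    return alpha / r


if __name__ == "__main__":
    # Effective batch size 16 is reported; the micro-batch size of 4 is an assumption.
    print("accumulation steps:", accumulation_steps(16, 4))
    # 4166 training examples with 100 used for validation, as quoted in the table.
    print("train/validation sizes:", split_sizes(4166, 100))
    # LoRA r = 64 and alpha = 16, as quoted in the table.
    print("LoRA scaling:", lora_scaling(16, 64))
```

The excerpts do not say whether every quoted setting belongs to the same experiment, so these values should be checked against the paper and its repository before being combined into a single configuration.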