Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Temporal-Difference Variational Continual Learning
Authors: Luckeciano Carvalho Melo, Alessandro Abate, Yarin Gal
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on challenging CL benchmarks show that our approach effectively mitigates Catastrophic Forgetting, outperforming strong Variational CL methods. |
| Researcher Affiliation | Academia | Luckeciano C. Melo (1, 2), Alessandro Abate (2), Yarin Gal (1). Affiliations: (1) OATML, University of Oxford; (2) OXCAV, University of Oxford |
| Pseudocode | No | The paper includes mathematical equations, definitions, and propositions but does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block, nor structured steps formatted like code. |
| Open Source Code | Yes | Our code is available at https://github.com/luckeciano/TD-VCL |
| Open Datasets | Yes | We evaluate five benchmarks for Continual Learning (CL). First, we introduce three new benchmarks: Permuted MNIST-Hard, Split MNIST-Hard, and Split NotMNIST-Hard. These are more challenging versions of traditional CL benchmarks with similar names. [...] Next, we also evaluate on two other popular CL benchmarks: CIFAR100-10 and TinyImageNet-10. |
| Dataset Splits | Yes | For CIFAR100-10: The dataset contains 50,000 images (5,000 per task) for training/validation and 10,000 images (1,000 per task) for evaluation. For TinyImageNet-10: The dataset contains 100,000 images (10,000 per task) for training/validation and 10,000 images (1,000 per task) for evaluation. |
| Hardware Specification | Yes | We execute all experiments using a single GPU RTX 4090. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer [57]' and 'AlexNet architecture [51]' but does not provide specific version numbers for software dependencies like Python, PyTorch, or TensorFlow, or other libraries. |
| Experiment Setup | Yes | We adopt fully connected neural networks for Permuted MNIST-Hard, Split MNIST-Hard and Split NotMNIST-Hard. We choose different depths and sizes depending on the benchmark, and we provide a full list of hyperparameters in Appendix H. [...] Table 4 provides the shared hyperparameters used in each benchmark. Tables 5 and 6 provided the specific hyperparameters for the proposed methods and baselines, respectively. |
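
The per-task figures quoted in the Dataset Splits row can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the numbers in the table, not code from the paper or its repository; the names `SPLITS`, `implied_task_count`, and `task_counts` are hypothetical, and it assumes the images are divided evenly across tasks, as the "per task" wording suggests.

```python
# Totals and per-task counts exactly as quoted in the Dataset Splits row.
# Each entry is (total images, images per task).
SPLITS = {
    "CIFAR100-10": {"train_val": (50_000, 5_000), "eval": (10_000, 1_000)},
    "TinyImageNet-10": {"train_val": (100_000, 10_000), "eval": (10_000, 1_000)},
}


def implied_task_count(total: int, per_task: int) -> int:
    """Return the number of tasks implied by an even split of `total`."""
    if per_task <= 0 or total % per_task != 0:
        raise ValueError(f"{total} images do not split evenly into tasks of {per_task}")
    return total // per_task


def task_counts(splits: dict) -> dict:
    """Compute the implied task count for every benchmark and partition."""
    return {
        name: {part: implied_task_count(*pair) for part, pair in parts.items()}
        for name, parts in splits.items()
    }


if __name__ == "__main__":
    for name, counts in task_counts(SPLITS).items():
        print(name, counts)
```

Under that even-split assumption, both partitions of both benchmarks imply the same number of tasks, which matches the "-10" suffix in the benchmark names.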