Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Retrospective Adversarial Replay for Continual Learning
Authors: Lilly Kumari, Shengjie Wang, Tianyi Zhou, Jeff A Bilmes
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that this excels on broadly-used benchmarks and outperforms other continual learning baselines especially when only a small buffer is available. We conduct a thorough ablation study over each key component as well as a hyperparameter sensitivity analysis to demonstrate the effectiveness and robustness of RAR. |
| Researcher Affiliation | Collaboration | Lilly Kumari, University of Washington, EMAIL; Shengjie Wang, ByteDance, EMAIL; Tianyi Zhou, University of Maryland, EMAIL; Jeff Bilmes, University of Washington, EMAIL |
| Pseudocode | Yes | Algorithm 1 RAR Retrospective Adversarial Replay for Continual Learning |
| Open Source Code | Yes | Our implementation code is available at https://github.com/lillykumari8/RAR-CL. |
| Open Datasets | Yes | Datasets: We evaluate RAR on four supervised image classification benchmarks for task-free CL. (1) Split-MNIST [31]... (2) Split-CIFAR10 [28]... (3) Split-CIFAR100... (4) Split-miniImageNet [14]... |
| Dataset Splits | Yes | For hyperparameters tuning on each dataset, we hold-out 5% of the training samples for each task and use it as a validation set. |
| Hardware Specification | Yes | All experiments are conducted on NVIDIA A6000 GPUs. |
| Software Dependencies | No | Our code is implemented in Python using PyTorch and CUDA for GPU acceleration. The paper mentions software names (Python, PyTorch, CUDA) but does not provide specific version numbers for them. |
| Experiment Setup | Yes | Settings and Hyperparameters: We follow the same setup as [2] for deciding model architectures for all four datasets. For Split-MNIST, we use an MLP classifier with two hidden layers... For Split CIFAR10, Split CIFAR-100, and Split mini-ImageNet, we use a reduced ResNet-18 classifier [22]. The replay budget k is the same as the mini-batch size (fixed to 10) irrespective of the buffer size m. For hyperparameters tuning on each dataset, we hold-out 5% of the training samples for each task and use it as a validation set. We provide additional details about the implementation settings in Appendix F. |
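
The Dataset Splits row describes holding out 5% of each task's training samples as a validation set. The following is a minimal sketch of one way such a per-task hold-out could be computed. It is not taken from the authors' repository. The function name `holdout_per_task`, the uniform random selection, the fixed seed, and the toy task layout are all assumptions made for illustration; the quoted text does not say how the held-out samples are chosen.

```python
import random
from collections import defaultdict


def holdout_per_task(task_ids, val_fraction=0.05, seed=0):
    """Split sample indices into train/validation sets, task by task.

    Hypothetical helper: the selection strategy (uniform random per task)
    is an assumption, not something stated in the quoted paper text.
    """
    rng = random.Random(seed)

    # Group sample indices by the task they belong to.
    by_task = defaultdict(list)
    for idx, task in enumerate(task_ids):
        by_task[task].append(idx)

    train_idx, val_idx = [], []
    for task in sorted(by_task):
        indices = by_task[task]
        rng.shuffle(indices)
        # Hold out the requested fraction of this task's samples.
        n_val = round(len(indices) * val_fraction)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Toy layout (assumed): 5 tasks with 200 training samples each.
task_ids = [task for task in range(5) for _ in range(200)]
train_idx, val_idx = holdout_per_task(task_ids)
```

Splitting within each task, rather than over the pooled data, keeps every task represented in the validation set, which matters when tasks arrive sequentially as in continual learning.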