Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Robustifying Learning-Augmented Caching Efficiently without Compromising 1-Consistency

Authors: Peng Chen, Hailiang Zhao, Jiaji Zhang, Xueyan Tang, Yixuan Wang, Shuiguang Deng

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments across multiple real-world datasets and prediction models validate the effectiveness of GUARD in practice.
Researcher Affiliation	Academia	1Zhejiang University 2Nanyang Technological University 3Nanjing University of Aeronautics and Astronautics
Pseudocode	Yes	Algorithm 1 describes GUARD&A.
Open Source Code	Yes	We implement a new benchmark, Cache-Coliseum, for comprehensive comparison of learning-augmented algorithms (including ours), which is publicly available at https://github.com/OptiSys-ZJU/cache-coliseum.
Open Datasets	Yes	We use BrightKite [16] and Citi [17], with cache sizes set to 10 and 100, respectively, following Lykouris and Vassilvitskii [5]. We further use SPEC CPU2006 memory traces [18] to evaluate real-world performance.
Dataset Splits	No	The paper mentions using the BrightKite, Citi, and SPEC CPU2006 datasets but does not explicitly provide details on how these datasets are split into training, test, or validation sets for the caching algorithm evaluation. While it mentions training predictor models (such as LightGBM and Parrot), it refers to methodologies from prior work ([9], [10]) without detailing the splits within this paper.
Hardware Specification	No	Most of our experiments rely solely on a standard computer equipped with a CPU and RAM. Evaluating algorithms that use a neural network-based predictor, such as PARROT, requires a GPU.
Software Dependencies	No	The paper mentions using LightGBM for the LRB predictor but does not provide a specific version number. It also refers to other models such as Parrot, which uses 'LSTM and attention mechanisms', but gives no specific library versions.
Experiment Setup	Yes	For switching-based algorithms, we follow Chłędowski et al. [15], setting a deterministic switching bound of 1 and a randomized weight β = 0.99. Following the methodology of Song et al. [9], we extract features from the SPEC CPU2006 datasets, setting |Delta_i| = 10 and |EDC_i| = 10. In addition to the original PC and address features, a total of 22 features were used for training. The GBM model is configured with a learning rate of 0.01, a maximum depth of 6, and 31 leaves. Both the sub-sample rate and column sample rate are set to 0.8. The model employs L2-norm loss, and early stopping is applied at 8000 rounds to determine the optimal parameters. Parrot is trained for 20,000 steps with a batch size of 32, without applying the Dagger algorithm [46].
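
The predictor settings quoted in the Experiment Setup row can be collected into a small configuration sketch. This is an illustration only, not the authors' code: the parameter names follow LightGBM's documented training parameters, while the helper names (gbm_params, parrot_settings, tree_shape_is_consistent), the bagging_freq value, and the reading of "8000 rounds" as a cap on boosting rounds are assumptions that the source does not confirm.

```python
# Sketch of the reported predictor settings as plain Python dictionaries.
# All function and constant names here are hypothetical.

def gbm_params():
    """Return the reported GBM settings using LightGBM parameter names."""
    return {
        "objective": "regression",   # LightGBM's L2 (squared error) objective
        "learning_rate": 0.01,       # reported learning rate
        "max_depth": 6,              # reported maximum depth
        "num_leaves": 31,            # reported number of leaves
        "bagging_fraction": 0.8,     # reported sub-sample rate
        "bagging_freq": 1,           # assumption: bagging needs a non-zero frequency
        "feature_fraction": 0.8,     # reported column sample rate
    }

# Assumption: "8000 rounds" is read as the maximum number of boosting rounds.
MAX_ROUNDS = 8000

def parrot_settings():
    """Return the reported Parrot training settings."""
    return {"train_steps": 20_000, "batch_size": 32, "use_dagger": False}

def tree_shape_is_consistent(params):
    """A binary tree of depth d has at most 2**d leaves."""
    return params["num_leaves"] <= 2 ** params["max_depth"]

if __name__ == "__main__":
    params = gbm_params()
    print(params)
    print("leaf count fits depth limit:", tree_shape_is_consistent(params))
    print(parrot_settings())
```

With LightGBM installed, a dictionary like this could be passed as the first argument to lightgbm.train together with a training Dataset and num_boost_round=MAX_ROUNDS. The page does not state the library version or the exact early-stopping criterion, so both would need to be confirmed against the paper or its repository.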