Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Robustifying Learning-Augmented Caching Efficiently without Compromising 1-Consistency
Authors: Peng Chen, Hailiang Zhao, Jiaji Zhang, Xueyan Tang, Yixuan Wang, Shuiguang Deng
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments across multiple real-world datasets and prediction models validate the effectiveness of GUARD in practice. |
| Researcher Affiliation | Academia | Zhejiang University; Nanyang Technological University; Nanjing University of Aeronautics and Astronautics |
| Pseudocode | Yes | Algorithm 1 describes GUARD&A. |
| Open Source Code | Yes | We implement a new benchmark, Cache-Coliseum, for comprehensive comparison of learning-augmented algorithms (including ours), which is publicly available at https://github.com/OptiSys-ZJU/cache-coliseum. |
| Open Datasets | Yes | We use BrightKite [16] and Citi [17], with cache sizes set to 10 and 100, respectively, following Lykouris and Vassilvitskii [5]. We further use SPEC CPU2006 memory traces [18] to evaluate real-world performance. |
| Dataset Splits | No | The paper uses the BrightKite, Citi, and SPEC CPU2006 datasets but does not state how they are split into training, validation, or test sets for the caching evaluation. Although it describes training predictor models (LightGBM and Parrot), it defers to the methodology of prior work ([9], [10]) rather than reporting the splits used in this paper. |
| Hardware Specification | No | Most of our experiments rely solely on a standard computer equipped with a CPU and RAM. Evaluating algorithms that use a neural network-based predictor, such as PARROT, requires a GPU. |
| Software Dependencies | No | The paper uses LightGBM for the LRB predictor but does not give a version number. It also refers to other models, such as Parrot, which uses 'LSTM and attention mechanisms', without specifying library versions. |
| Experiment Setup | Yes | For switching-based algorithms, we follow Chłędowski et al. [15], setting a deterministic switching bound of 1 and a randomized weight β = 0.99. Following the methodology of Song et al. [9], we extract features from the SPEC CPU2006 datasets, setting \|Δ_i\| = 10 and \|EDC_i\| = 10. In addition to the original PC and address features, a total of 22 features were used for training. The GBM model is configured with a learning rate of 0.01, a maximum depth of 6, and 31 leaves. Both the sub-sample rate and column sample rate are set to 0.8. The model employs L2-norm loss, and early stopping is applied at 8000 rounds to determine the optimal parameters. Parrot is trained for 20,000 steps with a batch size of 32, without applying the DAgger algorithm [46]. |
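The Experiment Setup row refers to switching-based algorithms with a deterministic switching bound. The sketch below is for illustration only. It shows a generic cost-based switching combiner for caching, which runs two policies in simulation side by side:

- a policy that follows predicted next-arrival times;
- LRU.

The real cache lazily mimics whichever policy is currently active. Several parts are hypothetical:

- the switching rule (switch when the active policy's simulated misses exceed `(1 + bound)` times the other's);
- the class names;
- the `perfect_predictor` helper.

This is not the GUARD algorithm, and it is not the exact combiner of Chłędowski et al. [15].

```python
import math
from collections import OrderedDict
from typing import Callable, Dict, List, Set


class LRU:
    """Plain LRU cache simulator, used here as the robust fallback policy."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.order: "OrderedDict[int, None]" = OrderedDict()

    def request(self, page: int, t: int) -> int:
        if page in self.order:
            self.order.move_to_end(page)  # mark as most recently used
            return 0  # hit
        if len(self.order) >= self.k:
            self.order.popitem(last=False)  # evict least recently used
        self.order[page] = None
        return 1  # miss

    def contents(self) -> Set[int]:
        return set(self.order)


class FollowPredictions:
    """Evicts the cached page whose predicted next arrival is furthest away."""

    def __init__(self, k: int, predict: Callable[[int], float]) -> None:
        self.k = k
        self.predict = predict  # predict(t): predicted next arrival of request t
        self.next_arrival: Dict[int, float] = {}

    def request(self, page: int, t: int) -> int:
        hit = page in self.next_arrival
        if not hit and len(self.next_arrival) >= self.k:
            victim = max(self.next_arrival, key=self.next_arrival.__getitem__)
            del self.next_arrival[victim]
        self.next_arrival[page] = self.predict(t)
        return 0 if hit else 1

    def contents(self) -> Set[int]:
        return set(self.next_arrival)


def switching_combiner(requests: List[int], k: int,
                       predict: Callable[[int], float], bound: float = 1.0) -> int:
    algs = [FollowPredictions(k, predict), LRU(k)]
    costs = [0, 0]
    active = 0  # start by trusting the predictions
    cache: Set[int] = set()
    misses = 0
    for t, page in enumerate(requests):
        for i, alg in enumerate(algs):
            costs[i] += alg.request(page, t)  # simulate both policies
        other = 1 - active
        # Hypothetical rule: switch once the active policy costs more than
        # (1 + bound) times the other policy.
        if costs[active] > (1 + bound) * costs[other]:
            active = other
        if page not in cache:
            misses += 1
            if len(cache) >= k:
                target = algs[active].contents()
                # The active policy holds `page` plus at most k - 1 others, so at
                # least one of our k cached pages is absent from it: evict that one.
                victim = min(p for p in cache if p not in target)
                cache.remove(victim)
            cache.add(page)
    return misses


def perfect_predictor(requests: List[int]) -> Callable[[int], float]:
    """Hypothetical helper: exact next-arrival times (math.inf if never requested again)."""
    nxt: List[float] = [math.inf] * len(requests)
    seen: Dict[int, int] = {}
    for t in range(len(requests) - 1, -1, -1):
        nxt[t] = seen.get(requests[t], math.inf)
        seen[requests[t]] = t
    return lambda t: nxt[t]


reqs = [1, 2, 3] * 3
print(switching_combiner(reqs, 2, perfect_predictor(reqs)))  # 6
```

In this example, three pages cycle through a cache of size 2. LRU misses all 9 requests. With exact predictions, the combiner never leaves the prediction-following policy and incurs 6 misses. Lazily mimicking the active policy keeps the combiner's cost close to that policy's, plus the extra misses caused by switching.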
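The Experiment Setup row sets |Δ_i| = 10 and |EDC_i| = 10 following Song et al. [9]. Adding the PC and address features gives 10 + 10 + 2 = 22 features.

A minimal sketch of such a per-access feature extractor follows. Several parts are assumptions rather than details taken from the paper:

- The exponentially decayed counter update `C_i ← 1 + C_i · 2^(−Δ_1 / 2^(9+i))` for i = 1..10 is modeled on our reading of the LRB design, so treat its constants as an assumption.
- Padding missing deltas with `math.inf` is hypothetical.
- The class name is hypothetical.

```python
import math
from collections import defaultdict, deque
from typing import Deque, Dict, List

N_DELTAS = 10  # |Δ_i| = 10, as reported
N_EDCS = 10    # |EDC_i| = 10, as reported
MISSING = math.inf  # hypothetical padding for deltas not observed yet


class FeatureExtractor:
    """Sketch of per-address delta and EDC features (PC + address + 10 + 10 = 22)."""

    def __init__(self) -> None:
        # Last N_DELTAS access times per address, oldest first.
        self.history: Dict[int, Deque[int]] = defaultdict(lambda: deque(maxlen=N_DELTAS))
        self.edcs: Dict[int, List[float]] = {}

    def extract(self, t: int, pc: int, addr: int) -> List[float]:
        past = self.history[addr]
        # Access times from newest to oldest, starting with the current access;
        # Δ_1 = t - previous access, Δ_2 = gap between the two before that, etc.
        stamps = [t] + list(reversed(past))
        deltas: List[float] = [stamps[j] - stamps[j + 1] for j in range(len(stamps) - 1)]
        deltas += [MISSING] * (N_DELTAS - len(deltas))
        if addr in self.edcs:
            d1 = deltas[0]
            # Assumed EDC update; j is 0-based, so 2 ** (10 + j) equals 2^(9+i) for i = j + 1.
            self.edcs[addr] = [1.0 + c * 2.0 ** (-d1 / 2.0 ** (10 + j))
                               for j, c in enumerate(self.edcs[addr])]
        else:
            self.edcs[addr] = [1.0] * N_EDCS  # first access counts once
        past.append(t)
        return [float(pc), float(addr)] + deltas + self.edcs[addr]


fx = FeatureExtractor()
fx.extract(0, 0x400, 0x1000)
print(len(fx.extract(5, 0x400, 0x1000)))  # 22
```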
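For readers reproducing the LRB-style predictor, the GBM settings in the Experiment Setup row can be written as a LightGBM parameter dictionary. The paper names the settings but not LightGBM parameter keys, so this mapping is our reading:

- "sub-sample rate" maps to `bagging_fraction`;
- "column sample rate" maps to `feature_fraction`;
- "8000 rounds" is treated as the boosting-round budget.

`bagging_freq = 1` is an added assumption. LightGBM only performs bagging when `bagging_freq` is positive.

```python
# Hypothetical mapping of the reported GBM settings onto LightGBM parameter names.
params = {
    "objective": "regression",  # L2 loss
    "learning_rate": 0.01,
    "max_depth": 6,
    "num_leaves": 31,
    "bagging_fraction": 0.8,    # reported "sub-sample rate"
    "bagging_freq": 1,          # assumption: needed for bagging to take effect
    "feature_fraction": 0.8,    # reported "column sample rate"
}
# Assumption: "early stopping is applied at 8000 rounds" read as the round budget.
NUM_BOOST_ROUND = 8000
```

The dictionary could then be passed as `lightgbm.train(params, train_set, num_boost_round=NUM_BOOST_ROUND, ...)`. The paper does not report the early-stopping patience, so it is left unspecified here.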