Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Why Are Learned Indexes So Effective?
Authors: Paolo Ferragina, Fabrizio Lillo, Giorgio Vinciguerra
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our general result is then specialised to five well-known distributions: Uniform, Lognormal, Pareto, Exponential, and Gamma; and it is corroborated in precision and robustness by a large set of experiments. |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of Pisa, Italy 2Department of Mathematics, University of Bologna, Italy. |
| Pseudocode | No | The paper describes algorithms but does not include any structured pseudocode blocks or clearly labeled algorithm figures. |
| Open Source Code | Yes | The code to reproduce the experiments is available at https://github.com/gvinciguerra/Learnedindexes-effectiveness. |
| Open Datasets | Yes | Figure 6 shows the results of our final experiment, which measured the average segment length of OPT on real-world datasets of 200 million elements from Kipf et al. (2019). The books dataset represents book sale popularity from Amazon, while fb contains Facebook user IDs. |
| Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits (e.g., percentages, sample counts, or explicit splitting methodology). |
| Hardware Specification | Yes | The experiments were run on an Intel Xeon Gold 6132 CPU. |
| Software Dependencies | No | The paper mentions that code is available but does not specify any software dependencies (e.g., programming languages, libraries, or solvers) with version numbers. |
| Experiment Setup | No | The paper does not provide specific details about the experimental setup, such as hyperparameter values (e.g., learning rate, batch size) or system-level training settings for the algorithms. |