Why Are Learned Indexes So Effective?

Authors: Paolo Ferragina, Fabrizio Lillo, Giorgio Vinciguerra

ICML 2020

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Our general result is then specialised to five well-known distributions: Uniform, Lognormal, Pareto, Exponential, and Gamma; and it is corroborated in precision and robustness by a large set of experiments." |
| Researcher Affiliation | Academia | "Department of Computer Science, University of Pisa, Italy; Department of Mathematics, University of Bologna, Italy." |
| Pseudocode | No | The paper describes algorithms but does not include any structured pseudocode blocks or clearly labeled algorithm figures. |
| Open Source Code | Yes | "The code to reproduce the experiments is available at https://github.com/gvinciguerra/Learnedindexes-effectiveness." |
| Open Datasets | Yes | "Figure 6 shows the results of our final experiment, which measured the average segment length of OPT on real-world datasets of 200 million elements from Kipf et al. (2019)." The books dataset represents book sale popularity from Amazon, while fb contains Facebook user IDs. |
| Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits (e.g., percentages, sample counts, or an explicit splitting methodology). |
| Hardware Specification | Yes | "The experiments were run on an Intel Xeon Gold 6132 CPU." |
| Software Dependencies | No | The paper states that code is available but does not list software dependencies (e.g., programming languages, libraries, or solvers) with version numbers. |
| Experiment Setup | No | The paper does not provide specific experimental-setup details, such as hyperparameter values (e.g., learning rate, batch size) or system-level training settings for the algorithms. |
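The "average segment length of OPT" quantity mentioned in the Open Datasets row refers to the optimal error-bounded piecewise linear approximation of the key-to-rank function that the paper studies. As a rough, self-contained illustration of that idea (a simplified greedy slope-range heuristic, not the paper's exact OPT algorithm, and with a made-up uniform dataset rather than the books/fb datasets), one can count how many ε-error segments a sorted key set needs:

```python
import random

def greedy_segments(keys, eps):
    """Count segments in a greedy piecewise-linear approximation of the
    sorted key -> rank function, keeping each rank within +/- eps of the
    segment's line. A simplified heuristic, not the exact OPT algorithm."""
    segments = 1
    x0, y0 = keys[0], 0                       # segment start (key, rank)
    lo, hi = float("-inf"), float("inf")      # feasible slope interval
    for rank in range(1, len(keys)):
        x = keys[rank]
        if x == x0:
            continue                          # duplicate key: any slope fits
        # Narrow the slope interval so this point stays within +/- eps.
        lo = max(lo, (rank - y0 - eps) / (x - x0))
        hi = min(hi, (rank - y0 + eps) / (x - x0))
        if lo > hi:                           # no single line fits: new segment
            segments += 1
            x0, y0 = x, rank
            lo, hi = float("-inf"), float("inf")
    return segments

random.seed(42)
keys = sorted(random.uniform(0, 1) for _ in range(200_000))
for eps in (8, 64, 512):
    n_seg = greedy_segments(keys, eps)
    print(f"eps={eps:4d}  segments={n_seg:6d}  avg length={len(keys)/n_seg:.0f}")
```

Larger error budgets ε yield fewer, longer segments; the paper's result is that for many distributions the average segment length grows quadratically in ε, which is what makes learned indexes so space-efficient.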