Learned Index with Dynamic $\epsilon$

Authors: Daoyuan Chen, Wuchao Li, Yaliang Li, Bolin Ding, Kai Zeng, Defu Lian, Jingren Zhou

ICLR 2023

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Experiments with real-world datasets and several state-of-the-art methods demonstrate the efficiency, effectiveness and usability of the proposed framework. We apply the proposed framework to several SOTA learned index methods, and conduct a series of experiments on three widely adopted real-world datasets.
Researcher Affiliation | Collaboration | Daoyuan Chen1, Wuchao Li2, Yaliang Li1, Bolin Ding1, Kai Zeng3, Defu Lian2, Jingren Zhou1 — 1 Alibaba Group, 2 University of Science and Technology of China, 3 Huawei
Pseudocode | Yes | We summarize the proposed algorithm below. In Section 3.4, we provide detailed descriptions of the initialization and adjustment sub-procedures. The lookahead() and optimize() are in the Look-ahead Data and Seg_Err and Optimization paragraphs respectively.

Algorithm: Dynamic ϵ Adjustment with Pluggable ϵ Learner
Input: D: data to be indexed; A: learned index algorithm; ϵ: expected ϵ; ρ: length percentage for look-ahead data
Output: S: learned segments with varied ϵs
1: initialize parameters w1,2,3 of the learned function f(ϵ, µ, σ) = w1(µ/σ)^w2 · ϵ^w3
2: initialize mean length of learned segments so far: Len(DS)
3: S ← ∅, (µ̂/σ̂) ← 0
4: repeat
5:   Get data statistic:
6:     (µ/σ) ← lookahead(D, Len(DS) · ρ)
7:   Adjust ϵ based on the learner:
8:     ϵ′ ← (Seg_Err / (w1(µ/σ)^w2))^(1/w3)
9:   Learn new segment Si using adjusted ϵ′:
10:    [Si, Di] ← A(D, ϵ′)
11:    S ← S ∪ Si
12:    D ← D \ Di, DS ← DS ∪ Di
13:  Online update Len(DS):
14:    Len(DS) ← running-mean(Len(DS), Len(Di))
15:    (µ̂/σ̂) ← running-mean((µ̂/σ̂), (µ/σ))
16:  Train the learner with ground truth:
17:    w1,2,3 ← optimize(f, Si, Seg_Erri)
18:    Seg_Err ← w1(µ̂/σ̂)^w2 · ϵ^w3
19: until D = ∅
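The loop in the pseudocode can be sketched in Python. This is a hedged, self-contained sketch, not the authors' implementation: `fit_segment` is a simplified greedy ϵ-PLA standing in for the base learned-index algorithm A, the running-mean decay 0.9 and the initial learner weights w1..w3 are invented for illustration, and the per-segment optimize() step that retrains w1..w3 is omitted.

```python
import math

def lookahead(keys, n):
    # Look-ahead statistic: mean/std ratio of key gaps over the next n keys.
    window = keys[: max(2, int(n))]
    if len(window) < 2:
        return 1.0
    gaps = [b - a for a, b in zip(window, window[1:])]
    mu = sum(gaps) / len(gaps)
    var = sum((g - mu) ** 2 for g in gaps) / len(gaps)
    return mu / (math.sqrt(var) + 1e-9)

def fit_segment(keys, eps):
    # Stand-in for algorithm A: greedy eps-PLA that keeps the longest prefix
    # whose positions one linear model predicts within +/- eps.
    k0, lo, hi, n = keys[0], 0.0, float("inf"), 1
    for j in range(1, len(keys)):
        dx = keys[j] - k0
        if dx <= 0:
            break
        lo, hi = max(lo, (j - eps) / dx), min(hi, (j + eps) / dx)
        if lo > hi:
            break
        n = j + 1
    return keys[:n]

def build_index(keys, eps_expected, rho=0.4):
    # Dynamic-eps loop following the pseudocode; w1..w3 are fixed here,
    # whereas the paper re-optimizes them after every segment (line 17).
    w1, w2, w3 = 1.0, 0.5, 1.0
    seg_err = w1 * eps_expected ** w3            # initial target segment error
    mean_len, ratio_hat = 64.0, 1.0              # running statistics
    segments, done = [], 0
    while done < len(keys):
        rest = keys[done:]
        ratio = lookahead(rest, mean_len * rho)              # line 6
        eps = max(1.0, (seg_err / (w1 * max(ratio, 1e-9) ** w2)) ** (1 / w3))  # line 8
        seg = fit_segment(rest, eps)                         # line 10
        segments.append((rest[0], len(seg), eps))            # lines 11-12
        done += len(seg)
        mean_len = 0.9 * mean_len + 0.1 * len(seg)           # line 14
        ratio_hat = 0.9 * ratio_hat + 0.1 * ratio            # line 15
        seg_err = w1 * ratio_hat ** w2 * eps_expected ** w3  # line 18
    return segments
```

Line 8 inverts the learner f(ϵ, µ, σ) = w1(µ/σ)^w2 · ϵ^w3 for ϵ given a target Seg_Err, which is why the exponent 1/w3 appears.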
Open Source Code | Yes | To facilitate further studies, we make our codes and datasets public at https://github.com/yxdyc/Learned-Index-Dynamic-Epsilon.
Open Datasets | Yes | To facilitate further studies, we make our codes and datasets public at https://github.com/yxdyc/Learned-Index-Dynamic-Epsilon.
Dataset Splits | No | The paper mentions an ablation study but does not specify the train/validation/test splits.
Hardware Specification | Yes | All the experiments are conducted on a Linux server with an Intel Xeon Platinum 8163 2.50GHz CPU.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., Python, PyTorch); it only mentions general implementations such as stx::btree.
Experiment Setup | Yes | As for the hyper-parameter ρ (described in Section 3.4), we conduct a grid search over ρ ∈ {0.1, 0.4, 0.7, 1.0} on the Map and IoT datasets. We found that all the ρs achieve a better N-MAE trade-off (i.e., smaller AUNEC results) than the fixed-ϵ versions. Since the setting ρ = 0.4 achieves the best results on average over the two datasets, we set ρ to 0.4 for the other datasets.
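The ρ selection described above is a one-dimensional grid search minimizing AUNEC. A minimal sketch, where `evaluate_aunec` is a hypothetical callback (not from the paper's code) that would build the index for a given ρ and return its AUNEC:

```python
def grid_search_rho(evaluate_aunec, grid=(0.1, 0.4, 0.7, 1.0)):
    # Pick the rho with the smallest AUNEC (lower = better N-MAE trade-off).
    scores = {rho: evaluate_aunec(rho) for rho in grid}
    return min(scores, key=scores.get)

# Toy usage with a fabricated cost curve, just to exercise the helper:
best_rho = grid_search_rho(lambda rho: abs(rho - 0.4))
```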