Learned Index with Dynamic $\epsilon$
Authors: Daoyuan Chen, Wuchao Li, Yaliang Li, Bolin Ding, Kai Zeng, Defu Lian, Jingren Zhou
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments with real-world datasets and several state-of-the-art methods demonstrate the efficiency, effectiveness and usability of the proposed framework. We apply the proposed framework to several SOTA learned index methods, and conduct a series of experiments on three widely adopted real-world datasets. |
| Researcher Affiliation | Collaboration | Daoyuan Chen¹, Wuchao Li², Yaliang Li¹, Bolin Ding¹, Kai Zeng³, Defu Lian², Jingren Zhou¹ (¹Alibaba Group, ²University of Science and Technology of China, ³Huawei) |
| Pseudocode | Yes | We summarize the proposed algorithm below (a Python sketch of this loop follows the table). In Section 3.4, we provide detailed descriptions of the initialization and adjustment sub-procedures. The lookahead() and optimize() sub-procedures are described in the "Look-ahead Data and Seg_Err" and "Optimization" paragraphs, respectively. **Algorithm: Dynamic ϵ Adjustment with Pluggable ϵ Learner.** Input: D: data to be indexed; A: learned index algorithm; ϵ: expected ϵ; ρ: length percentage for look-ahead data. Output: S: learned segments with varied ϵs.<br>1: initialize parameters w1,2,3 of the learned function f(ϵ, µ, σ) = w1·(µ/σ)^w2 · ϵ^w3<br>2: initialize the mean length of learned segments so far: Len(D_S) ← 404<br>3: S ← ∅, (µ̂/σ̂) ← 0<br>4: repeat<br>5: &nbsp;&nbsp;Get data statistics:<br>6: &nbsp;&nbsp;(µ/σ) ← lookahead(D, Len(D_S) · ρ)<br>7: &nbsp;&nbsp;Adjust ϵ based on the learner:<br>8: &nbsp;&nbsp;ϵ′ ← (Seg_Err / (w1·(µ/σ)^w2))^{1/w3}<br>9: &nbsp;&nbsp;Learn a new segment Sᵢ using the adjusted ϵ′:<br>10: &nbsp;[Sᵢ, Dᵢ] ← A(D, ϵ′)<br>11: &nbsp;S ← S ∪ Sᵢ<br>12: &nbsp;D ← D \ Dᵢ, D_S ← D_S ∪ Dᵢ<br>13: &nbsp;Online update of Len(D_S):<br>14: &nbsp;Len(D_S) ← running-mean(Len(D_S), Len(Dᵢ))<br>15: &nbsp;(µ̂/σ̂) ← running-mean((µ̂/σ̂), (µ/σ))<br>16: &nbsp;Train the learner with ground truth:<br>17: &nbsp;w1,2,3 ← optimize(f, Sᵢ, Seg_Errᵢ)<br>18: &nbsp;Seg_Err ← w1·(µ̂/σ̂)^w2 · ϵ^w3<br>19: until D = ∅ |
| Open Source Code | Yes | To facilitate further studies, we make our codes and datasets public at https://github.com/yxdyc/Learned-Index-Dynamic-Epsilon. |
| Open Datasets | Yes | To facilitate further studies, we make our codes and datasets public at https://github.com/yxdyc/Learned-Index-Dynamic-Epsilon. |
| Dataset Splits | No | The paper mentions an ablation study but does not specify the train/validation/test splits. |
| Hardware Specification | Yes | All the experiments are conducted on a Linux server with an Intel Xeon Platinum 8163 2.50GHz CPU. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as Python or PyTorch; it only mentions general implementation components like stx::btree. |
| Experiment Setup | Yes | As for the hyper-parameter ρ (described in Section 3.4), we conduct a grid search over ρ ∈ {0.1, 0.4, 0.7, 1.0} on the Map and IoT datasets. We found that all the ρs achieve a better N-MAE trade-off (i.e., smaller AUNEC results) than the fixed-ϵ versions. Since the setting ρ = 0.4 achieves the best results on average over the two datasets, we set ρ to 0.4 for the other datasets (a grid-search sketch appears after the table). |
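
To make the reconstructed pseudocode concrete, here is a minimal Python sketch of the adjustment loop. It is not the authors' implementation: `learn_segment` is a hypothetical stand-in for the base learned-index algorithm A, and the `running_mean` smoothing factor, the initial `Seg_Err` target, and the omission of the `optimize()` re-fitting step (lines 16-17) are all assumptions made for illustration.

```python
import math

def running_mean(old, new, alpha=0.1):
    # Exponential running mean; alpha is an assumed smoothing factor --
    # the paper does not pin down the exact running-mean variant here.
    return (1.0 - alpha) * old + alpha * new

def dynamic_epsilon_index(data, learn_segment, expected_eps, rho,
                          init_len=404, w=(1.0, 1.0, 1.0)):
    """Sketch of the dynamic-epsilon loop reconstructed above.

    learn_segment(suffix, eps) stands in for the base learned index A:
    it fits one segment under error bound eps over a prefix of `suffix`
    and returns (segment, number_of_keys_covered).
    """
    w1, w2, w3 = w
    segments = []
    mean_len = float(init_len)          # Len(D_S)
    mean_ratio = 0.0                    # running mean of mu/sigma
    seg_err = w1 * expected_eps ** w3   # assumed initial Seg_Err target

    pos = 0
    while pos < len(data):
        # Lines 5-6: look-ahead statistics over the next Len(D_S)*rho keys.
        window = data[pos : pos + max(1, int(mean_len * rho))]
        mu = sum(window) / len(window)
        var = sum((x - mu) ** 2 for x in window) / len(window)
        ratio = max(mu / max(math.sqrt(var), 1e-9), 1e-9)

        # Lines 7-8: invert f(eps, mu, sigma) = w1*(mu/sigma)^w2 * eps^w3
        # to obtain the adjusted error bound eps'.
        eps_adj = (seg_err / (w1 * ratio ** w2)) ** (1.0 / w3)

        # Lines 9-12: learn one segment under eps' and consume its keys.
        seg, n = learn_segment(data[pos:], eps_adj)
        segments.append(seg)
        pos += n

        # Lines 13-15: online updates of the running statistics.
        mean_len = running_mean(mean_len, n)
        mean_ratio = max(running_mean(mean_ratio, ratio), 1e-9)

        # Lines 16-18: the paper re-fits w1..w3 via optimize() on observed
        # (segment, Seg_Err_i) ground truth; omitted here. Refresh target:
        seg_err = w1 * mean_ratio ** w2 * expected_eps ** w3

    return segments

# Smoke test with a dummy "learner" that covers up to 500 keys per segment.
if __name__ == "__main__":
    segs = dynamic_epsilon_index(
        data=list(range(1, 5001)),
        learn_segment=lambda d, eps: ((d[0], d[-1], eps), min(len(d), 500)),
        expected_eps=64,
        rho=0.4)
    print(len(segs), "segments")
```

With the fixed weights assumed here, the inversion on line 8 reduces to a simple rescaling of the target error by the local µ/σ statistic; in the paper the learner instead fits w1..w3 online from the errors of already-learned segments.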
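The ρ selection reported in the last row is a plain grid search. A hedged sketch of how it could be scripted, where `evaluate_aunec(dataset, rho)` is a hypothetical stand-in for building the index with a given ρ and measuring the paper's AUNEC metric (area under the N-MAE curve; smaller is better):

```python
def select_rho(datasets, evaluate_aunec, grid=(0.1, 0.4, 0.7, 1.0)):
    # Average each candidate's AUNEC across the datasets; keep the smallest.
    scores = {rho: sum(evaluate_aunec(d, rho) for d in datasets) / len(datasets)
              for rho in grid}
    return min(scores, key=scores.get)
```

Applied to the Map and IoT datasets, this selection procedure corresponds to the paper's choice of ρ = 0.4 for the remaining datasets.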