Learning Augmented Binary Search Trees

Authors: Honghao Lin, Tian Luo, David Woodruff

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Further, we experimentally evaluate our learned treap on synthetic datasets and demonstrate a performance advantage over other search tree data structures. We also present experiments on real world datasets with known frequency estimation oracles and show improvements as well.
Researcher Affiliation Academia 1Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, USA. Correspondence to: Honghao Lin <honghaol@andrew.cmu.edu>, Tian Luo <tianl1@andrew.cmu.edu>, David P. Woodruff <dwoodruf@andrew.cmu.edu>.
Pseudocode No The paper describes algorithms and operations in paragraph form, but does not include structured pseudocode or clearly labeled algorithm blocks.
Open Source Code No The paper discusses the use of machine learning models and oracles from other published works (e.g., Hsu et al. (2019) and Du et al. (2021)) but does not provide a statement or link for the open-source code of their own described methodology.
Open Datasets Yes Data: The internet traffic data was collected by CAIDA using a commercial backbone link (Tier 1 ISP) (CAIDA, 2016). URL https://www.caida.org/catalog/datasets/passive_dataset. Data: This dataset contains approximately 21 million queries on AOL collected over 90 days in 2006. The distribution follows Zipf s Law (see Hsu et al. (2019)).
Dataset Splits Yes The first 7 minutes of the dataset was used as training sets with the next 2 minutes used as the validation sets. The rest of the dataset was used for testing.
Hardware Specification No The paper describes the use of deep learning models (RNN, LSTM) for the frequency estimation oracle but does not provide any specific details about the hardware used for training or running experiments (e.g., CPU, GPU models, or cloud computing specifications).
Software Dependencies No The paper mentions that an RNN was used for modeling by Hsu et al. (2019) but does not list specific software dependencies with version numbers for its own experimental implementation (e.g., programming languages, libraries, or frameworks with versions).
Experiment Setup No The paper describes the data splits for training, validation, and testing of the machine learning oracle and the type of neural network used (RNN, LSTM cells) from Hsu et al. (2019), but it does not specify hyperparameters (e.g., learning rate, batch size, epochs) or other system-level configuration details for its own experimental setup.