Learning-based Support Estimation in Sublinear Time

Authors: Talya Eden, Piotr Indyk, Shyam Narayanan, Ronitt Rubinfeld, Sandeep Silwal, Tal Wagner

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed algorithms on a collection of data sets, using the neural-network based estimators from Hsu et al. (ICLR'19) as predictors. Our experiments demonstrate substantial (up to 3x) improvements in estimation accuracy compared to the state-of-the-art algorithm.
Researcher Affiliation | Collaboration | Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Pseudocode | Yes | Our algorithm is presented in Algorithm 1.
Open Source Code | No | The paper does not explicitly state that the source code for the described methodology is being released, nor does it provide a link to a repository.
Open Datasets | Yes | From CAIDA internet traces 2016, https://www.caida.org/data/monitors/passive-equinix-chicago.xml.
Dataset Splits | No | The paper describes how the *predictor* was trained and validated (e.g., 'trained on the first 5 days and the 6th day is used for validation' for the AOL data), but it does not specify train/validation/test splits for the data used by the main support estimation algorithm, which operates on entire daily or per-minute distributions.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions using 'neural-network based estimators' and a 'recurrent neural network (RNN)' but does not name any software packages with version numbers (e.g., 'TensorFlow 2.x' or 'PyTorch 1.x').
Experiment Setup | Yes | Algorithm 1 uses two parameters that need to be set: the polynomial degree L, and the base parameter b. The performance of the algorithm is not very sensitive to L, and for simplicity we use the same setting as Wu & Yang (2019), L = 0.45 log n. The setting of b requires more care. ... In our implementation, we start by running Algorithm 1 with b = 2. If any interval fails the sanity check, we increment b and repeat the algorithm (with the same set of samples). The final base we use is twice the minimal one such that all intervals pass the sanity check, where the final doubling is to ensure we are indeed past the point where all checks succeed.
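
To make the quoted base-selection procedure concrete, the following is a minimal Python sketch under stated assumptions: run_algorithm1 is a hypothetical stand-in for Algorithm 1 that, given the samples, the degree L, and a base b, returns one pass/fail sanity-check outcome per interval, and choose_base is an illustrative wrapper; neither name comes from the paper.

import math

def choose_base(samples, n, run_algorithm1):
    # Degree setting as quoted (L = 0.45 log n); the log base is an assumption
    # here, since the paper's convention is not restated in the excerpt above.
    L = int(0.45 * math.log(n))
    b = 2
    while True:
        # Hypothetical call: Algorithm 1 returns a sanity-check result per interval.
        interval_passed = run_algorithm1(samples, L, b)
        if all(interval_passed):
            break  # b is now the minimal base at which every interval passes
        b += 1     # increment the base and rerun on the same set of samples
    # Final doubling, as described: return twice the minimal passing base.
    return 2 * b

Note that the loop reruns Algorithm 1 on the same drawn samples, so increasing b costs only recomputation, not additional samples.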