Learning-Based Frequency Estimation Algorithms

Authors: Chen-Yu Hsu, Piotr Indyk, Dina Katabi, Ali Vakilian

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also evaluate our algorithms on two real-world datasets and demonstrate empirically their performance gains.
Researcher Affiliation | Academia | Chen-Yu Hsu, Piotr Indyk, Dina Katabi & Ali Vakilian, Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. {cyhsu,indyk,dk,vakilian}@mit.edu
Pseudocode | Yes | Algorithm 1 provides pseudo code for our design. The design assumes an oracle HH(i) that attempts to determine whether an item i is a heavy hitter or not. (A hedged sketch of this design appears after the table.)
Open Source Code | No | The paper does not provide a direct link or explicit statement about the availability of its own source code.
Open Datasets | Yes | CAIDA. Caida internet traces 2016 chicago. http://www.caida.org/data/monitors/passive-equinix-chicago.xml. Dataset: The traffic data is collected at a backbone link of a Tier 1 ISP between Chicago and Seattle in 2016 (CAIDA). Dataset: We use the AOL query log dataset, which consists of 21 million search queries collected from 650 thousand users over 90 days. The users are anonymized in the dataset.
Dataset Splits | Yes | For a recording session, we use the first 7 minutes for training, the following minute for validation, and estimate the packet counts in subsequent minutes. We use the first 5 days for training, the following day for validation, and estimate the number of times different search queries appear in subsequent days. (See the split helper sketched below.)
Hardware Specification | No | The inference time takes 2.8 microseconds per item on a single GPU without any optimizations. (Footnote 4: Note that new specialized hardware such as Google TPU, hardware accelerators and network compression (Han et al., 2017; Sze et al., 2017; Chen et al., 2017; Han et al., 2016; 2015) can drastically improve the inference time. Further, Nvidia has predicted that GPUs will get 1000x faster by 2025.) The specific model of the GPU is not mentioned.
Software Dependencies | No | The paper mentions using "neural networks", "RNNs", "fully-connected networks", and "LSTM cells", but it does not specify any software names with version numbers (e.g., TensorFlow 2.x, PyTorch 1.x).
Experiment Setup | Yes | Model: We trained a neural network to predict the log of the packet counts for each flow. ... We use two RNNs to encode the source and destination IP addresses separately. ... Additionally, we use two-layer fully-connected networks to encode the source and destination ports. ... (Footnote 3) We use RNNs with 64 hidden units. The two-layer fully-connected networks for the ports have 16 and 8 hidden units. The final layer before the prediction has 32 hidden units. To construct the heavy hitter oracle, we trained a neural network to predict the number of times a search phrase appears. To process the search phrase, we train an RNN with LSTM cells that takes characters of a search phrase as input. ... (Footnote 6) We use an embedding size of 64 dimensions, an RNN with 256 hidden units, and a fully-connected layer with 32 hidden units. (A rough model sketch follows the table.)
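The Pseudocode row refers to a design built around a heavy-hitter oracle HH(i). The snippet below is a minimal Python sketch of how such an oracle can be combined with a standard Count-Min sketch: items the oracle flags as heavy get dedicated exact counters, everything else is hashed into the sketch. The bucket sizes, hash functions, and the stand-in oracle in the usage example are illustrative assumptions, not the paper's Algorithm 1 verbatim.

```python
import random
from collections import defaultdict

class LearnedCountMin:
    """Minimal sketch: a heavy-hitter oracle routes predicted heavy items to
    dedicated exact counters; all other items go into a standard Count-Min sketch."""

    def __init__(self, hh_oracle, width=1024, depth=4, seed=0):
        self.hh_oracle = hh_oracle                     # callable: item -> bool (predicted heavy hitter?)
        self.width, self.depth = width, depth          # illustrative sketch dimensions
        rng = random.Random(seed)
        self.salts = [rng.getrandbits(32) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]
        self.exact = defaultdict(int)                  # unique buckets for predicted heavy hitters

    def _bucket(self, item, row):
        return hash((self.salts[row], item)) % self.width

    def update(self, item, count=1):
        if self.hh_oracle(item):
            self.exact[item] += count
        else:
            for r in range(self.depth):
                self.table[r][self._bucket(item, r)] += count

    def estimate(self, item):
        if self.hh_oracle(item):
            return self.exact[item]
        return min(self.table[r][self._bucket(item, r)] for r in range(self.depth))


# Toy usage with a stand-in oracle; the paper's oracle is a trained neural network.
sketch = LearnedCountMin(hh_oracle=lambda flow: flow.startswith("10.0."), width=256, depth=3)
for flow in ["10.0.0.1"] * 50 + ["172.16.0.9"] * 3:
    sketch.update(flow)
print(sketch.estimate("10.0.0.1"), sketch.estimate("172.16.0.9"))
```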
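The splits quoted in the Dataset Splits row are purely temporal: the earliest minutes (or days) train the oracle, the next slice is used for validation, and counts are estimated on the remainder. A hypothetical helper illustrating that kind of split, assuming the stream has already been bucketed by minute or day:

```python
def temporal_split(buckets, train_slices=7, val_slices=1):
    """Hypothetical helper: 'buckets' is a list of per-minute (or per-day) item lists.
    The earliest slices train the oracle, the next slice is validation, and the
    remainder is held out for estimating counts."""
    train = buckets[:train_slices]
    val = buckets[train_slices:train_slices + val_slices]
    test = buckets[train_slices + val_slices:]
    return train, val, test

# Internet traffic: 7 training minutes, 1 validation minute, remaining minutes evaluated.
# AOL queries:      5 training days,    1 validation day,   remaining days evaluated.
# train, val, test = temporal_split(minute_buckets, train_slices=7, val_slices=1)
```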
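For the Experiment Setup row, the internet-traffic model description can be sketched roughly as below. PyTorch is an arbitrary choice here (the paper names no framework, per the Software Dependencies row), and the byte-level IP encoding, GRU cells, and raw-port features are assumptions; only the layer sizes (64-unit RNNs, 16- and 8-unit port networks, 32-unit final layer, scalar log-count output) come from the quoted text.

```python
import torch
import torch.nn as nn

class FlowCountModel(nn.Module):
    """Rough sketch of the packet-count model quoted above: two RNNs encode the
    source and destination IPs, two small fully-connected nets encode the ports,
    and a 32-unit layer precedes the scalar log-count prediction. Byte-level IP
    input, GRU cells, and raw-port features are assumptions, not from the paper."""

    def __init__(self, rnn_hidden=64, embed_dim=32):
        super().__init__()
        self.byte_embed = nn.Embedding(256, embed_dim)          # assumed: IPs fed byte by byte
        self.src_rnn = nn.GRU(embed_dim, rnn_hidden, batch_first=True)
        self.dst_rnn = nn.GRU(embed_dim, rnn_hidden, batch_first=True)
        self.src_port = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 8), nn.ReLU())
        self.dst_port = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 8), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(2 * rnn_hidden + 2 * 8, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, src_ip, dst_ip, src_port, dst_port):
        # src_ip, dst_ip: LongTensor (batch, 4); src_port, dst_port: FloatTensor (batch, 1)
        _, h_src = self.src_rnn(self.byte_embed(src_ip))
        _, h_dst = self.dst_rnn(self.byte_embed(dst_ip))
        feats = torch.cat([h_src[-1], h_dst[-1],
                           self.src_port(src_port), self.dst_port(dst_port)], dim=-1)
        return self.head(feats).squeeze(-1)                     # predicted log(packet count)
```

The AOL heavy-hitter oracle described in the same row follows the same pattern with a character-level LSTM: a 64-dimensional character embedding, 256 hidden units, and a 32-unit fully-connected layer before the count prediction.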