Learning-Based Frequency Estimation Algorithms
Authors: Chen-Yu Hsu, Piotr Indyk, Dina Katabi, Ali Vakilian
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also evaluate our algorithms on two real-world datasets and demonstrate empirically their performance gains. |
| Researcher Affiliation | Academia | Chen-Yu Hsu, Piotr Indyk, Dina Katabi & Ali Vakilian Computer Science and Artificial Intelligence Lab Massachusetts Institute of Technology Cambridge, MA 02139, USA {cyhsu,indyk,dk,vakilian}@mit.edu |
| Pseudocode | Yes | Algorithm 1 provides pseudo code for our design. The design assumes an oracle HH(i) that attempts to determine whether an item i is a heavy hitter or not. |
| Open Source Code | No | The paper does not provide a direct link or explicit statement about the availability of its own source code. |
| Open Datasets | Yes | CAIDA: CAIDA Internet Traces 2016, Chicago. http://www.caida.org/data/monitors/passive-equinix-chicago.xml. Dataset: The traffic data is collected at a backbone link of a Tier1 ISP between Chicago and Seattle in 2016 (CAIDA). Dataset: We use the AOL query log dataset, which consists of 21 million search queries collected from 650 thousand users over 90 days. The users are anonymized in the dataset. |
| Dataset Splits | Yes | For a recording session, we use the first 7 minutes for training, the following minute for validation, and estimate the packet counts in subsequent minutes. We use the first 5 days for training, the following day for validation, and estimate the number of times different search queries appear in subsequent days. |
| Hardware Specification | No | The inference time takes 2.8 microseconds per item on a single GPU without any optimizations. (Footnote 4: Note that new specialized hardware such as Google TPU, hardware accelerators and network compression (Han et al., 2017; Sze et al., 2017; Chen et al., 2017; Han et al., 2016; 2015) can drastically improve the inference time. Further, Nvidia has predicted that GPU will get 1000x faster by 2025.) The specific model of the GPU is not mentioned. |
| Software Dependencies | No | The paper mentions using "neural networks", "RNNs", "fully-connected networks", and "LSTM cells", but it does not specify any software names with version numbers (e.g., TensorFlow 2.x, PyTorch 1.x). |
| Experiment Setup | Yes | Model: We trained a neural network to predict the log of the packet counts for each flow. ...We use two RNNs to encode the source and destination IP addresses separately. ...Additionally, we use two-layer fully-connected networks to encode the source and destination ports. ...We use RNNs with 64 hidden units. The two-layer fully-connected networks for the ports have 16 and 8 hidden units. The final layer before the prediction has 32 hidden units. To construct the heavy hitter oracle, we trained a neural network to predict the number of times a search phrase appears. To process the search phrase, we train an RNN with LSTM cells that takes characters of a search phrase as input. ...We use an embedding size of 64 dimensions, an RNN with 256 hidden units, and a fully-connected layer with 32 hidden units. |
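The Pseudocode row quotes the paper's core design: an oracle HH(i) flags likely heavy hitters, which receive dedicated exact counters, while all other items share a standard Count-Min sketch. The following is a minimal Python sketch of that idea, not the paper's implementation; the class name, hash choice, and parameters are illustrative, and the oracle here is any callable (the paper trains a neural network for this role).

```python
import hashlib
from collections import defaultdict


class LearnedCountMin:
    """Illustrative sketch of the oracle-augmented design: items the
    oracle flags as heavy get unique buckets with exact counts; the
    rest are hashed into a conventional Count-Min table."""

    def __init__(self, oracle, width=1024, depth=3):
        self.oracle = oracle                      # callable: item -> bool
        self.width = width                        # buckets per hash row
        self.depth = depth                        # number of hash rows
        self.heavy = defaultdict(int)             # exact counters for heavy items
        self.table = [[0] * width for _ in range(depth)]

    def _hash(self, item, row):
        # Deterministic per-row hash into [0, width)
        h = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8)
        return int.from_bytes(h.digest(), "big") % self.width

    def update(self, item, count=1):
        if self.oracle(item):
            self.heavy[item] += count             # unique bucket, exact
        else:
            for row in range(self.depth):
                self.table[row][self._hash(item, row)] += count

    def estimate(self, item):
        if self.oracle(item):
            return self.heavy[item]               # exact for heavy items
        # Standard Count-Min estimate: minimum over the hash rows
        return min(self.table[row][self._hash(item, row)]
                   for row in range(self.depth))
```

Example use: with `oracle = lambda i: i == "flow-A"`, five updates of `"flow-A"` are counted exactly, while other flows get the usual Count-Min overestimate-only guarantee. The benefit the paper argues for is that removing heavy hitters from the shared table reduces collision error for the remaining items.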
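The Dataset Splits row describes purely temporal splits (e.g. minutes 0-7 for training, minute 8 for validation, later minutes for testing). A small helper like the one below, written for this report as an illustration rather than taken from the paper, captures that split logic; `records` as (timestamp, item) pairs and the cutoff arguments are assumptions.

```python
def time_split(records, train_end, val_end):
    """Split time-stamped (timestamp, item) records into train /
    validation / test sets by two cutoffs in the same time unit,
    mirroring the quoted splits (train before train_end, validation
    in [train_end, val_end), test at or after val_end)."""
    train = [r for r in records if r[0] < train_end]
    val = [r for r in records if train_end <= r[0] < val_end]
    test = [r for r in records if r[0] >= val_end]
    return train, val, test
```

For the CAIDA setting this would be called with `train_end=7, val_end=8` over per-minute timestamps; for AOL, with `train_end=5, val_end=6` over day indices.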