Clustering the Sketch: Dynamic Compression for Embedding Tables
Authors: Henry Tsang, Thomas Ahle
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally CCE achieves the best of both worlds: the high compression rate of codebook-based quantization, but dynamically, like hashing-based methods, so it can be used during training. Our primary experimental finding, illustrated in Table 1 and Figure 4a, indicates that CCE enables training a model whose Binary Cross Entropy matches a full-table baseline, using only half the parameters required by the next best compression method. |
| Researcher Affiliation | Industry | Henry Ling-Hei Tsang, Meta, henrylhtsang@meta.com; Thomas Dybdahl Ahle, Meta / Normal Computing, thomas@ahle.dk |
| Pseudocode | Yes | Algorithm 1: Dense CCE for Least Squares; Algorithm 2: Sparse CCE for Least Squares; Algorithm 3: Clustered Compositional Embeddings with c columns and 2k rows. An illustrative lookup sketch appears below the table. |
| Open Source Code | Yes | An implementation of our methods and related work is available at github.com/thomasahle/cce. |
| Open Datasets | Yes | We used two public click log datasets from Criteo: the Kaggle and Terabyte datasets. |
| Dataset Splits | Yes | For both the Kaggle and Terabyte datasets, we partitioned the data from the final day into validation and test sets. We measure the performance of the model in BCE every 50,000 batches (around one-sixth of one epoch) using the validation set. A sketch of this evaluation cadence appears below the table. |
| Hardware Specification | Yes | We ran the Kaggle dataset experiments on a single A100 GPU. For the Terabyte dataset experiments, we ran them on two A100 GPUs using model parallelism. |
| Software Dependencies | No | The paper mentions software like PyTorch, FAISS K-means, and Scikit-learn but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | In our experiments, we adhered to the setup from the open-source Deep Learning Recommendation Model (DLRM) by Naumov et al. [2019], including the choice of optimizer (SGD) and learning rate. For the K-means from FAISS, we use max_points_per_centroid=256 and niter=50. A sketch of this K-means configuration appears below the table. |
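
The pseudocode row references Algorithm 3 (Clustered Compositional Embeddings). As a rough illustration of the hashing-based lookup that CCE builds on, the sketch below sums rows drawn from several small embedding tables indexed by independent hash functions. The class name, table sizes, and hash scheme are illustrative assumptions, not the authors' implementation; CCE additionally re-clusters the tables during training (not shown). The real code is at github.com/thomasahle/cce.

```python
import torch
import torch.nn as nn

class HashedCompositionalEmbedding(nn.Module):
    """Illustrative sketch: each id is hashed into several small tables and
    the selected rows are summed, replacing one huge embedding table."""

    def __init__(self, num_tables: int = 4, rows: int = 2 ** 14, dim: int = 16, seed: int = 0):
        super().__init__()
        self.tables = nn.ModuleList(nn.Embedding(rows, dim) for _ in range(num_tables))
        g = torch.Generator().manual_seed(seed)
        # Random odd-ish multipliers act as cheap, independent hash functions.
        self.register_buffer(
            "multipliers", torch.randint(1, 2 ** 31 - 1, (num_tables,), generator=g)
        )
        self.rows = rows

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        # ids: (batch,) int64 categorical ids
        out = 0
        for table, m in zip(self.tables, self.multipliers):
            idx = (ids * m) % self.rows  # hash each id into this table's rows
            out = out + table(idx)
        return out

# Usage: 4 tables of 16,384 rows stand in for one table with millions of rows.
emb = HashedCompositionalEmbedding()
vectors = emb(torch.tensor([3, 1_000_003, 42]))  # shape (3, 16)
```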
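
The dataset-splits row states that validation BCE is measured every 50,000 batches. A minimal sketch of that cadence is shown below; the model, data loaders, and `evaluate_bce` helper are hypothetical names introduced only for illustration.

```python
import torch
import torch.nn.functional as F

EVAL_EVERY = 50_000  # batches, as stated in the paper (about one-sixth of an epoch)

def evaluate_bce(model, val_loader):
    """Average binary cross-entropy over the validation set (hypothetical helper)."""
    model.eval()
    total, n = 0.0, 0
    with torch.no_grad():
        for features, labels in val_loader:
            logits = model(features)
            total += F.binary_cross_entropy_with_logits(logits, labels, reduction="sum").item()
            n += labels.numel()
    model.train()
    return total / n

# Inside a training loop (model, optimizer, train_loader, val_loader assumed to exist):
# for step, (features, labels) in enumerate(train_loader, 1):
#     loss = F.binary_cross_entropy_with_logits(model(features), labels)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
#     if step % EVAL_EVERY == 0:
#         print(f"step {step}: val BCE = {evaluate_bce(model, val_loader):.4f}")
```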
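
The experiment-setup row quotes the two FAISS K-means parameters used in the paper, max_points_per_centroid=256 and niter=50. The snippet below shows one way those parameters can be passed to `faiss.Kmeans`; the data shape and number of centroids are assumptions for illustration.

```python
import numpy as np
import faiss

# Illustrative data: 100k embedding vectors of dimension 16 (shapes are assumptions).
x = np.random.rand(100_000, 16).astype("float32")
k = 1024  # number of centroids (illustrative)

# Parameters quoted in the paper: niter=50, max_points_per_centroid=256.
kmeans = faiss.Kmeans(d=x.shape[1], k=k, niter=50, max_points_per_centroid=256)
kmeans.train(x)

centroids = kmeans.centroids                 # (k, 16) cluster centers
_, assignments = kmeans.index.search(x, 1)   # nearest centroid per vector
```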