Differentiable Product Quantization for End-to-End Embedding Compression
Authors: Ting Chen, Lala Li, Yizhou Sun
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on ten different datasets across three language tasks, with additional experiments on BERT (Devlin et al., 2018) pre-training, by simply replacing the original embedding layer with DPQ. The results show that DPQ can learn compact discrete embeddings with higher compression ratios than existing methods, at the same time achieving the same performance as the original full embeddings. |
| Researcher Affiliation | Collaboration | Google Research; University of California, Los Angeles. Correspondence to: Ting Chen <iamtingchen@google.com>. |
| Pseudocode | Yes | Algorithm 1 (inference of the embedding for the $i$-th token). Require: $V \in \mathbb{R}^{K \times D \times (d/D)}$, $C \in \{1, \dots, K\}^{n \times D}$. For $j \in \{1, \dots, D\}$: $h_i^{(j)} = V^{(j)}_{C_i^{(j)}}$. Return $\text{concatenate}(h_i^{(1)}, h_i^{(2)}, \dots, h_i^{(D)})$. (A runnable sketch of this lookup is given after the table.) |
| Open Source Code | Yes | Code at: github.com/chentingpc/dpq_embedding_compression. |
| Open Datasets | Yes | We conduct experiments on ten datasets across three tasks: language modeling (LM), neural machine translation (NMT) and text classification (TextC) (Zhang et al., 2015). We adopt existing architectures for these tasks as base models and only replace the input embedding layer with DPQ. The details of datasets and base models are summarized in Table 2; e.g., for LM: PTB (10,000-word vocabulary) and Wikitext-2 (33,278-word vocabulary), using LSTM-based models from (Zaremba et al., 2014) in three model sizes. |
| Dataset Splits | No | The paper mentions using various datasets and models but does not explicitly provide specific train/validation/test dataset splits (e.g., percentages or sample counts) within the main text. |
| Hardware Specification | Yes | Figure 4. Extra training cost incurred by DPQ, measured on a medium sized LSTM for LM trained on Tesla-V100 GPUs. |
| Software Dependencies | No | The paper mentions general software frameworks and models (e.g., BERT-base, LSTM) but does not provide specific version numbers for any software components or libraries (e.g., Python, TensorFlow, PyTorch versions) needed for reproduction. |
| Experiment Setup | Yes | To further test DPQ, we replace the embedding layer in BERT with our DPQ for both pre-training and fine-tuning. We do not perform a hyper-parameter search for DPQ, but simply use the best configuration from our experiments on WMT'19 En-De using the Transformer, i.e. DPQ-SX with no subspace-sharing, K = 32, D = 128. For both pre-training and fine-tuning, we use exactly the same configurations and hyper-parameters as the original BERT-base in (Devlin et al., 2018). (A rough storage-accounting sketch using this K, D configuration is given after the table.) |
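Algorithm 1 quoted above is a per-token codebook lookup. Below is a minimal NumPy sketch of that inference step, using the K = 32, D = 128 configuration quoted in the Experiment Setup row; the vocabulary size `n` and embedding dimension `d` are illustrative assumptions (roughly BERT-base-like), not values taken from the paper, and the codebook/code tensors are random stand-ins for learned parameters.

```python
import numpy as np

# Illustrative sizes: n and d are assumptions for this sketch (roughly
# BERT-base-like), not values reported in the paper.
n, d = 30_000, 768        # vocabulary size, embedding dimension
K, D = 32, 128            # codebook size and number of groups/subspaces
assert d % D == 0
sub_dim = d // D          # per-subspace dimension d/D

# Stand-ins for learned DPQ parameters:
#   V: D codebooks, each holding K sub-vectors of size d/D
#      (the paper's V in R^{K x D x (d/D)}, axes reordered here for indexing).
#   C: discrete codes, one D-way code per token (0-indexed here,
#      whereas the paper writes C in {1, ..., K}^{n x D}).
V = np.random.randn(D, K, sub_dim).astype(np.float32)
C = np.random.randint(0, K, size=(n, D))

def dpq_embedding(i: int) -> np.ndarray:
    """Algorithm 1: reconstruct the embedding of token i from its codes."""
    parts = [V[j, C[i, j]] for j in range(D)]   # h_i^(j) = V^(j)_{C_i^(j)}
    return np.concatenate(parts)                # shape (d,)

print(dpq_embedding(42).shape)  # (768,)
```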
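The tensor shapes in Algorithm 1 also make the compression claim quoted in the Research Type row easy to sanity-check. The sketch below compares the storage of a dense embedding table with DPQ's codes plus codebooks, assuming 32-bit floats and ceil(log2 K)-bit codes; the paper's own accounting may differ in detail, so treat the printed ratio as illustrative rather than a reported result.

```python
import math

def storage_bits(n: int, d: int, K: int, D: int):
    """Rough storage accounting: dense table vs. DPQ codes + codebooks.
    Assumes float32 weights and ceil(log2(K))-bit codes (assumptions of
    this sketch, not the paper's exact accounting)."""
    full = n * d * 32                            # dense n x d float32 table
    codes = n * D * math.ceil(math.log2(K))      # C: n tokens x D discrete codes
    codebooks = K * D * (d // D) * 32            # V: K x D x (d/D) float32 entries
    return full, codes + codebooks

full, dpq = storage_bits(n=30_000, d=768, K=32, D=128)
print(f"dense: {full / 8 / 1e6:.1f} MB, DPQ: {dpq / 8 / 1e6:.1f} MB, "
      f"ratio ~{full / dpq:.0f}x")
```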