Differentiable Product Quantization for End-to-End Embedding Compression
Authors: Ting Chen, Lala Li, Yizhou Sun
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on ten different datasets across three language tasks, with additional experiments on BERT (Devlin et al., 2018) pre-training, by simply replacing the original embedding layer with DPQ. The results show that DPQ can learn compact discrete embeddings with higher compression ratios than existing methods, at the same time achieving the same performance as the original full embeddings. |
| Researcher Affiliation | Collaboration | Google Research; University of California, Los Angeles. Correspondence to: Ting Chen <iamtingchen@google.com>. |
| Pseudocode | Yes | Algorithm 1 (inference of the embedding for the $i$-th token). Require: $V \in \mathbb{R}^{K \times D \times (d/D)}$, $C \in \{1, \dots, K\}^{n \times D}$. For $j \in \{1, \dots, D\}$: $h_i^{(j)} = V^{(j)}_{C_i^{(j)}}$. Return $\text{concatenate}(h_i^{(1)}, h_i^{(2)}, \dots, h_i^{(D)})$. (A runnable sketch of this lookup is given after the table.) |
| Open Source Code | Yes | Code at: github.com/chentingpc/dpq_embedding_compression. |
| Open Datasets | Yes | We conduct experiments on ten datasets across three tasks: language modeling (LM), neural machine translation (NMT) and text classification (TextC) (Zhang et al., 2015). We adopt existing architectures for these tasks as base models and only replace the input embedding layer with DPQ. The details of datasets and base models are summarized in Table 2; e.g., for LM: PTB (10,000-word vocabulary) and Wikitext-2 (33,278-word vocabulary), using LSTM-based models from (Zaremba et al., 2014) in three model sizes. |
| Dataset Splits | No | The paper mentions using various datasets and models but does not explicitly provide specific train/validation/test dataset splits (e.g., percentages or sample counts) within the main text. |
| Hardware Specification | Yes | Figure 4. Extra training cost incurred by DPQ, measured on a medium sized LSTM for LM trained on Tesla-V100 GPUs. |
| Software Dependencies | No | The paper mentions general software frameworks and models (e.g., BERT-base, LSTM) but does not provide specific version numbers for any software components or libraries (e.g., Python, TensorFlow, PyTorch versions) needed for reproduction. |
| Experiment Setup | Yes | To further test DPQ, we replace the embedding layer in BERT with our DPQ for both pre-training and fine-tuning. We do not perform a hyper-parameter search for DPQ, but simply use the best configuration from our experiments on WMT'19 En-De using the Transformer, i.e. DPQ-SX with no subspace-sharing, K = 32, D = 128. For both pre-training and fine-tuning, we use exactly the same configurations and hyper-parameters as the original BERT-base in (Devlin et al., 2018). (A rough storage-accounting sketch using this K, D configuration is given after the table.) |
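Algorithm 1 quoted above is a per-token codebook lookup. Below is a minimal NumPy sketch of that inference step, using the K = 32, D = 128 configuration quoted in the Experiment Setup row; the vocabulary size `n` and embedding dimension `d` are illustrative assumptions (roughly BERT-base-like), not values taken from the paper, and the codebook/code tensors are random stand-ins for learned parameters.

```python
import numpy as np

# Illustrative sizes: n and d are assumptions for this sketch (roughly
# BERT-base-like), not values reported in the paper.
n, d = 30_000, 768        # vocabulary size, embedding dimension
K, D = 32, 128            # codebook size and number of groups/subspaces
assert d % D == 0
sub_dim = d // D          # per-subspace dimension d/D

# Stand-ins for learned DPQ parameters:
#   V: D codebooks, each holding K sub-vectors of size d/D
#      (the paper's V in R^{K x D x (d/D)}, axes reordered here for indexing).
#   C: discrete codes, one D-way code per token (0-indexed here,
#      whereas the paper writes C in {1, ..., K}^{n x D}).
V = np.random.randn(D, K, sub_dim).astype(np.float32)
C = np.random.randint(0, K, size=(n, D))

def dpq_embedding(i: int) -> np.ndarray:
    """Algorithm 1: reconstruct the embedding of token i from its codes."""
    parts = [V[j, C[i, j]] for j in range(D)]   # h_i^(j) = V^(j)_{C_i^(j)}
    return np.concatenate(parts)                # shape (d,)

print(dpq_embedding(42).shape)  # (768,)
```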
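The tensor shapes in Algorithm 1 also make the compression claim quoted in the Research Type row easy to sanity-check. The sketch below compares the storage of a dense embedding table with DPQ's codes plus codebooks, assuming 32-bit floats and ceil(log2 K)-bit codes; the paper's own accounting may differ in detail, so treat the printed ratio as illustrative rather than a reported result.

```python
import math

def storage_bits(n: int, d: int, K: int, D: int):
    """Rough storage accounting: dense table vs. DPQ codes + codebooks.
    Assumes float32 weights and ceil(log2(K))-bit codes (assumptions of
    this sketch, not the paper's exact accounting)."""
    full = n * d * 32                            # dense n x d float32 table
    codes = n * D * math.ceil(math.log2(K))      # C: n tokens x D discrete codes
    codebooks = K * D * (d // D) * 32            # V: K x D x (d/D) float32 entries
    return full, codes + codebooks

full, dpq = storage_bits(n=30_000, d=768, K=32, D=128)
print(f"dense: {full / 8 / 1e6:.1f} MB, DPQ: {dpq / 8 / 1e6:.1f} MB, "
      f"ratio ~{full / dpq:.0f}x")
```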