Compressing Word Embeddings via Deep Compositional Code Learning
Authors: Raphael Shu, Hideki Nakayama
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show the compression rate achieves 98% in a sentiment analysis task and 94% to 99% in machine translation tasks without performance loss. In both tasks, the proposed method can improve the model performance by slightly lowering the compression rate. In our experiments, we focus on evaluating the maximum loss-free compression rate of word embeddings on two typical NLP tasks: sentiment analysis and machine translation. (A back-of-the-envelope check of these compression rates follows the table.) |
| Researcher Affiliation | Academia | Raphael Shu, The University of Tokyo (shu@nlab.ci.i.u-tokyo.ac.jp); Hideki Nakayama, The University of Tokyo (nakayama@ci.i.u-tokyo.ac.jp) |
| Pseudocode | No | The paper includes a network architecture diagram (Figure 2) but does not provide any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code can be found at https://github.com/zomux/neuralcompressor |
| Open Datasets | Yes | Dataset: For sentiment analysis, we use a standard separation of IMDB movie review dataset (Maas et al., 2011)... We choose the 300-dimensional uncased GloVe word vectors... Dataset: For machine translation tasks, we experiment on IWSLT 2014 German-to-English translation task (Cettolo et al., 2014) and ASPEC English-to-Japanese translation task (Nakazawa et al., 2016). |
| Dataset Splits | No | The paper mentions evaluating loss on a 'small validation set' for sentiment analysis and a 'fixed validation set' for code learning, and on a 'validation set composed of 50 batches' for machine translation. However, it does not provide specific percentages or exact sample counts for these validation splits, which is required for explicit split information. |
| Hardware Specification | No | The paper states that training is 'distributed to 4 GPUs' for both code learning and machine translation. However, it does not specify the model or type of GPUs used, which is required for specific hardware details. |
| Software Dependencies | No | The paper mentions software tools like 'nltk package', 'moses toolkit', 'kytea', and 'nccl package', but does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | In our experiments, the batch size is set to 128. We use Adam optimizer (Kingma & Ba, 2014) with a fixed learning rate of 0.0001. The training is run for 200K iterations... The models are trained with Adam optimizer for 15 epochs with a fixed learning rate of 0.0001... All models are trained by Nesterov's accelerated gradient (Nesterov, 1983) with an initial learning rate of 0.25... All LSTMs and embeddings have 256 hidden units in the IWSLT14 task and 1000 hidden units in the ASPEC task... Dropout with a rate of 0.2 is applied everywhere except the recurrent computation... All translations are decoded by the beam search with a beam size of 5. (A minimal sketch of the quoted optimizer settings follows the table.) |
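
The compression rates quoted in the Research Type row can be sanity-checked with simple arithmetic on the embedding sizes. The sketch below is a hedged, back-of-the-envelope calculation: the vocabulary size and the M x K code configuration are assumed illustrative values, not figures taken from this table; only the 300-dimensional GloVe setting comes from the dataset quote above.

```python
# Illustrative check of the reported ~98% compression rate on word embeddings.
# V, M, and K below are assumed example values for a compositional-code scheme
# (M codebooks, K basis vectors each), not numbers stated in the table above.
import math

V = 75_000        # assumed vocabulary size
d = 300           # GloVe embedding dimension, as quoted in the dataset row
M, K = 32, 16     # assumed code configuration

original_bytes = V * d * 4                    # dense fp32 embedding matrix
code_bytes = V * M * math.log2(K) / 8         # discrete codes stored per word
codebook_bytes = M * K * d * 4                # shared fp32 basis vectors
compressed_bytes = code_bytes + codebook_bytes

rate = 1 - compressed_bytes / original_bytes
print(f"compression rate ~= {rate:.1%}")      # roughly 98% under these assumptions
```

Under these assumed settings the embedding storage drops from about 90 MB to under 2 MB, which is consistent with the 98% figure quoted for the sentiment analysis task.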
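
The optimizer settings quoted in the Experiment Setup row (batch size 128, Adam with a fixed learning rate of 0.0001, 200K iterations for code learning) can be summarized as a minimal training-loop sketch. This is not the authors' implementation (their code is at https://github.com/zomux/neuralcompressor); the linear reconstruction model and random batches are stand-ins, assumed only to keep the snippet self-contained and runnable.

```python
# Minimal PyTorch-style sketch of the quoted code-learning optimization settings.
# The nn.Linear "model" and random batches are placeholders, not the paper's
# encoder-decoder; only the hyperparameters mirror the Experiment Setup quote.
import torch
import torch.nn as nn

BATCH_SIZE = 128          # "the batch size is set to 128"
EMBED_DIM = 300           # matches the 300-dimensional GloVe vectors
NUM_ITERATIONS = 200_000  # "The training is run for 200K iterations"

model = nn.Linear(EMBED_DIM, EMBED_DIM)                   # placeholder reconstruction model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4) # fixed learning rate of 0.0001

for step in range(NUM_ITERATIONS):
    batch = torch.randn(BATCH_SIZE, EMBED_DIM)            # stand-in for embedding batches
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(batch), batch)    # reconstruction-style objective
    loss.backward()
    optimizer.step()
```

The translation models described in the same row use a different recipe (Nesterov's accelerated gradient with an initial learning rate of 0.25, dropout 0.2, beam size 5), which this sketch does not cover.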