Compressing Word Embeddings via Deep Compositional Code Learning

Authors: Raphael Shu, Hideki Nakayama

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show the compression rate achieves 98% in a sentiment analysis task and 94%~99% in machine translation tasks without performance loss. In both tasks, the proposed method can improve the model performance by slightly lowering the compression rate. In our experiments, we focus on evaluating the maximum loss-free compression rate of word embeddings on two typical NLP tasks: sentiment analysis and machine translation.
Researcher Affiliation | Academia | Raphael Shu (The University of Tokyo, shu@nlab.ci.i.u-tokyo.ac.jp); Hideki Nakayama (The University of Tokyo, nakayama@ci.i.u-tokyo.ac.jp)
Pseudocode | No | The paper includes a network architecture diagram (Figure 2) but does not provide any structured pseudocode or algorithm blocks; a hedged sketch of the coding scheme is given after this table.
Open Source Code | Yes | The code can be found in https://github.com/zomux/neuralcompressor
Open Datasets | Yes | Dataset: For sentiment analysis, we use a standard separation of IMDB movie review dataset (Maas et al., 2011)... We choose the 300-dimensional uncased GloVe word vectors... Dataset: For machine translation tasks, we experiment on IWSLT 2014 German-to-English translation task (Cettolo et al., 2014) and ASPEC English-to-Japanese translation task (Nakazawa et al., 2016).
Dataset Splits | No | The paper mentions evaluating loss on a 'small validation set' for sentiment analysis and a 'fixed validation set' for code learning, and on a 'validation set composed of 50 batches' for machine translation. However, it does not provide specific percentages or exact sample counts for these validation splits, which is required for explicit split information.
Hardware Specification | No | The paper states that training is 'distributed to 4 GPUs' for both code learning and machine translation. However, it does not specify the model or type of GPUs used, which is required for specific hardware details.
Software Dependencies | No | The paper mentions software tools like 'nltk package', 'moses toolkit', 'kytea', and 'nccl package', but does not provide specific version numbers for any of these software dependencies.
Experiment Setup | Yes | In our experiments, the batch size is set to 128. We use Adam optimizer (Kingma & Ba, 2014) with a fixed learning rate of 0.0001. The training is run for 200K iterations... The models are trained with Adam optimizer for 15 epochs with a fixed learning rate of 0.0001... All models are trained by Nesterov's accelerated gradient (Nesterov, 1983) with an initial learning rate of 0.25... All LSTMs and embeddings have 256 hidden units in the IWSLT14 task and 1000 hidden units in ASPEC task... Dropout with a rate of 0.2 is applied everywhere except the recurrent computation... All translations are decoded by the beam search with a beam size of 5. (A configuration sketch using these reported hyperparameters follows the table.)
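Because the paper supplies only an architecture diagram rather than pseudocode, the following is a minimal sketch, assuming the deep compositional coding scheme the paper describes: each pretrained embedding is approximated as the sum of M basis vectors, one chosen from each of M codebooks of K entries, with the discrete choices relaxed by Gumbel-softmax during training. Names such as `CodeCompressor`, `num_codebooks`, and `codebook_size` are illustrative and not taken from the released code.

```python
# Minimal sketch (not the authors' released implementation) of deep
# compositional code learning: reconstruct a pretrained embedding as the
# sum of M codebook vectors, one per codebook, chosen via Gumbel-softmax.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CodeCompressor(nn.Module):  # hypothetical class name
    def __init__(self, emb_dim, num_codebooks=32, codebook_size=16, hidden=256):
        super().__init__()
        self.M, self.K = num_codebooks, codebook_size
        # Encoder that maps an embedding to M sets of K code logits.
        self.encoder = nn.Sequential(
            nn.Linear(emb_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, self.M * self.K),
        )
        # M codebooks, each holding K basis vectors of dimension emb_dim.
        self.codebooks = nn.Parameter(torch.randn(self.M, self.K, emb_dim) * 0.1)

    def forward(self, emb, tau=1.0):
        logits = self.encoder(emb).view(-1, self.M, self.K)
        # Differentiable (relaxed) one-hot code selection.
        codes = F.gumbel_softmax(logits, tau=tau, hard=False, dim=-1)
        # Reconstruction: sum over codebooks of the selected basis vectors.
        recon = torch.einsum('bmk,mkd->bmd', codes, self.codebooks).sum(dim=1)
        return recon, codes.argmax(dim=-1)  # reconstruction, discrete codes
```

After training, only the M discrete codes per word and the M x K codebook vectors need to be stored, which is where the reported compression comes from.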
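The code-learning hyperparameters quoted in the Experiment Setup row (batch size 128, Adam with learning rate 0.0001, 200K iterations) can be collected into a short configuration sketch. The mean-squared reconstruction loss follows the paper's stated objective of reconstructing the original embeddings; the function name and sampling loop are assumptions for illustration only.

```python
# Illustrative training loop using the hyperparameters reported above.
import torch
import torch.nn.functional as F

def train_code_learning(model, embeddings, iters=200_000, batch_size=128, lr=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    n = embeddings.size(0)
    for step in range(iters):
        idx = torch.randint(0, n, (batch_size,))          # random minibatch of words
        batch = embeddings[idx]
        recon, _ = model(batch)
        loss = F.mse_loss(recon, batch)                    # reconstruction objective
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```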