An efficient framework for learning sentence representations

Authors: Lajanugen Logeswaran, Honglak Lee

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our sentence representations by using them as feature representations for downstream NLP tasks. Alternative fine-grained evaluation tasks such as identifying word appearance and word order were proposed in Adi et al. (2017).
Researcher Affiliation | Collaboration | Lajanugen Logeswaran & Honglak Lee; University of Michigan, Ann Arbor, MI, USA; Google Brain, Mountain View, CA, USA; {llajan,honglak}@umich.edu, honglak@google.com
Pseudocode | No | The paper provides formal descriptions of its objective function and model components but does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | "The pre-trained encoders will be made publicly available." (This is a promise of future availability, not a current release.) Footnote 1 links to a TensorFlow implementation of 'skip_thoughts', which is a baseline model, not the authors' own code.
Open Datasets | Yes | Models were trained on the 7000 novels of the BookCorpus dataset (Kiros et al., 2015). The dataset consists of about 45M ordered sentences. We also consider a larger corpus for training: the UMBC corpus (Han et al., 2013), a dataset of 100M web pages crawled from the internet, preprocessed and tokenized into paragraphs. The MSCOCO dataset (Lin et al., 2014) has been traditionally used for this task.
Dataset Splits | Yes | Hyperparameters, including batch size, learning rate, and prediction context size, were chosen using prediction accuracies (accuracy of predicting context sentences) on the validation set. The other tasks come with train/dev/test splits, and the dev set is used for choosing the regularization parameter. We use the train/val/test split proposed in Karpathy & Fei-Fei (2015).
Hardware Specification | Yes | Our models are implemented in TensorFlow. Experiments were performed using CUDA 8.0 and cuDNN 6.0 libraries on a GTX Titan X GPU. Our best BookCorpus model (MC-QT) trains in just under 11 hours (on both the Titan X and GTX 1080).
Software Dependencies | Yes | Our models are implemented in TensorFlow. Experiments were performed using CUDA 8.0 and cuDNN 6.0 libraries on a GTX Titan X GPU.
Experiment Setup | Yes | A context size of 3 was used, i.e., predicting the previous and next sentences given the current sentence. We used a batch size of 400 and a learning rate of 5e-4 with the Adam optimizer for all experiments. All our RNN-based models are single-layered and use GRU cells. Weights of the GRU are initialized using uniform Xavier initialization and gate biases are initialized to 1. Word embeddings are initialized from U[−0.1, 0.1].
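
To make the quoted training configuration concrete, below is a minimal sketch (not the authors' released code) of how that setup could be expressed in TensorFlow 1.x, matching the CUDA 8.0 / cuDNN 6.0 environment reported above. The vocabulary size, embedding dimension, and hidden dimension are placeholders, since the quoted setup does not state them; only the batch size, learning rate, optimizer, context size, and initialization schemes come from the paper.

# Minimal sketch of the reported training setup (assumed API usage, TF 1.x style).
import tensorflow as tf

VOCAB_SIZE = 50000      # placeholder; not specified in the quoted setup
EMBED_DIM = 300         # placeholder
HIDDEN_DIM = 1000       # placeholder
BATCH_SIZE = 400        # "batch size of 400"
LEARNING_RATE = 5e-4    # "learning rate of 5e-4 with the Adam optimizer"
CONTEXT_SIZE = 3        # predict previous and next sentences given the current one

# Word embeddings initialized from U[-0.1, 0.1].
embeddings = tf.get_variable(
    "word_embeddings", [VOCAB_SIZE, EMBED_DIM],
    initializer=tf.random_uniform_initializer(-0.1, 0.1))

# Single-layer GRU cell: uniform Xavier (Glorot) weights, gate biases set to 1.
gru_cell = tf.nn.rnn_cell.GRUCell(
    HIDDEN_DIM,
    kernel_initializer=tf.contrib.layers.xavier_initializer(),
    bias_initializer=tf.ones_initializer())

def encode(token_ids, lengths):
    """Encode a batch of sentences; the final GRU state is the sentence representation."""
    inputs = tf.nn.embedding_lookup(embeddings, token_ids)
    _, final_state = tf.nn.dynamic_rnn(
        gru_cell, inputs, sequence_length=lengths, dtype=tf.float32)
    return final_state

optimizer = tf.train.AdamOptimizer(learning_rate=LEARNING_RATE)

Initializing the GRU gate biases to 1, as the paper describes, tends to keep the gates open early in training; the encoder and optimizer above would then be plugged into the paper's context-prediction objective, which this sketch does not reproduce.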