An efficient framework for learning sentence representations
Authors: Lajanugen Logeswaran, Honglak Lee
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our sentence representations by using them as feature representations for downstream NLP tasks. Alternative fine-grained evaluation tasks such as identifying word appearance and word order were proposed in Adi et al. (2017). |
| Researcher Affiliation | Collaboration | Lajanugen Logeswaran & Honglak Lee; University of Michigan, Ann Arbor, MI, USA; Google Brain, Mountain View, CA, USA; {llajan,honglak}@umich.edu, honglak@google.com |
| Pseudocode | No | The paper provides formal descriptions of its objective function and model components but does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The pre-trained encoders will be made publicly available. (This is a promise of future availability, not a current release.) Footnote 1 links to a TensorFlow implementation of 'skip_thoughts', which is a baseline model, not the authors' own code. |
| Open Datasets | Yes | Models were trained on the 7000 novels of the Book Corpus dataset (Kiros et al., 2015). The dataset consists of about 45M ordered sentences. We also consider a larger corpus for training: the UMBC corpus (Han et al., 2013), a dataset of 100M web pages crawled from the internet, preprocessed and tokenized into paragraphs. The MSCOCO dataset (Lin et al., 2014) has been traditionally used for this task. |
| Dataset Splits | Yes | Hyperparameters including batch size, learning rate, prediction context size were obtained using prediction accuracies (accuracy of predicting context sentences) on the validation set. The other tasks come with train/dev/test splits and the dev set is used for choosing the regularization parameter. We use the train/val/test split proposed in Karpathy & Fei-Fei (2015). |
| Hardware Specification | Yes | Our models are implemented in Tensorflow. Experiments were performed using CUDA 8.0 and cuDNN 6.0 libraries on a GTX Titan X GPU. Our best Book Corpus model (MC-QT) trains in just under 11 hrs (on both the Titan X and GTX 1080). |
| Software Dependencies | Yes | Our models are implemented in Tensorflow. Experiments were performed using CUDA 8.0 and cuDNN 6.0 libraries on a GTX Titan X GPU. |
| Experiment Setup | Yes | A context size of 3 was used, i.e., predicting the previous and next sentences given the current sentence. We used a batch size of 400 and learning rate of 5e-4 with the Adam optimizer for all experiments. All our RNN-based models are single-layered and use GRU cells. Weights of the GRU are initialized using uniform Xavier initialization and gate biases are initialized to 1. Word embeddings are initialized from U[−0.1, 0.1]. |
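The reported setup (batch size 400, learning rate 5e-4 with Adam, single-layer GRU with uniform Xavier weights, gate biases set to 1, word embeddings drawn from U[−0.1, 0.1]) can be sketched as initialization code. This is a minimal NumPy illustration of the stated hyperparameters, not the authors' TensorFlow implementation; the vocabulary size, embedding dimension, and hidden dimension below are placeholder values.

```python
import numpy as np

# Hyperparameters reported in the paper (values quoted in the table above).
CONFIG = {
    "context_size": 3,      # predict previous and next sentence from the current one
    "batch_size": 400,
    "learning_rate": 5e-4,  # used with the Adam optimizer
}

def xavier_uniform(fan_in, fan_out, rng):
    """Uniform Xavier (Glorot) initialization for a weight matrix."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def init_gru_params(input_dim, hidden_dim, rng):
    """Single-layer GRU parameters as described: Xavier-uniform weights,
    gate (update/reset) biases initialized to 1."""
    params = {}
    for gate in ("update", "reset", "candidate"):
        params[f"W_{gate}"] = xavier_uniform(input_dim, hidden_dim, rng)
        params[f"U_{gate}"] = xavier_uniform(hidden_dim, hidden_dim, rng)
        # Gate biases start at 1; the candidate bias is left at 0 (an assumption,
        # since the paper only says "gate biases are initialized to 1").
        params[f"b_{gate}"] = (np.ones(hidden_dim) if gate != "candidate"
                               else np.zeros(hidden_dim))
    return params

rng = np.random.default_rng(0)
# Word embeddings initialized from U[-0.1, 0.1]; 10000 x 300 is a placeholder shape.
embeddings = rng.uniform(-0.1, 0.1, size=(10000, 300))
gru = init_gru_params(input_dim=300, hidden_dim=1200, rng=rng)
```

The dict layout is only for exposition; in practice the three gates' weights are usually stored as one concatenated matrix.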