A Non-monotonic Self-terminating Language Model
Authors: Eugene Choi, Kyunghyun Cho, Cheolhyoung Lee
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments validating the effectiveness of our NMST language models on sequence completion tasks, as was done in earlier studies. We test NMST parametrization with various architectures. |
| Researcher Affiliation | Collaboration | Eugene Choi (eugene.choi@nyu.edu), Kyunghyun Cho (kyunghyun.cho@nyu.edu), Cheolhyoung Lee (cheolhyoung.lee@nyu.edu); New York University; Prescient Design, Genentech; CIFAR Fellow. |
| Pseudocode | No | The paper includes mathematical definitions and equations but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | To ensure the reproducibility of our paper, we provide our code available at https://github.com/nyu-dl/non-monotonic-self-terminating-lm. |
| Open Datasets | Yes | We train RNN (Elman, 1990) and LSTM (Hochreiter & Schmidhuber, 1997) on WikiText-2 (Merity et al., 2016). We additionally finetune GPT-2 (Radford et al., 2019) on WikiText-103 (Merity et al., 2016). |
| Dataset Splits | No | The paper mentions using a validation set for perplexity evaluation (e.g., 'validation perplexity'), but it does not provide specific details on the train/validation/test splits, such as percentages, sample counts, or a clear methodology for partitioning the data. |
| Hardware Specification | No | The paper states, 'This work was supported in part through the NYU IT High Performance Computing resources, services, and staff expertise,' but it does not specify any particular hardware components like GPU or CPU models, or memory amounts. |
| Software Dependencies | No | The paper mentions using 'AdamW (Loshchilov & Hutter, 2017)', 'BPE tokenization (Sennrich et al., 2015)', and 'pretrained GPT-2... provided by Hugging Face', but it does not specify version numbers for any software libraries or frameworks (e.g., Python, PyTorch, TensorFlow, HuggingFace Transformers). |
| Experiment Setup | Yes | We use AdamW (Loshchilov & Hutter, 2017) with an initial learning rate of 0.001, β1 = 0.9, β2 = 0.99, weight decay of 0.01, learning rate decay, and early stopping. We perform 10 random runs with a batch size of 32 for 70 epochs. We apply dropout (Srivastava et al., 2014) with drop probabilities of 0.3 and 0.5. |
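
The Open Datasets and Software Dependencies rows report training RNN/LSTM models on WikiText-2, finetuning GPT-2 on WikiText-103, and obtaining pretrained GPT-2 from Hugging Face, but no library versions are given. The snippet below is a minimal sketch of how those assets could be loaded; the use of the `datasets` and `transformers` packages and the specific dataset configuration names are assumptions, not details confirmed by the paper.

```python
# Minimal sketch; the Hugging Face `datasets`/`transformers` packages and the
# dataset configuration names below are assumptions (the paper gives no versions).
from datasets import load_dataset
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# WikiText-2 for the RNN/LSTM experiments, WikiText-103 for GPT-2 finetuning.
wikitext2 = load_dataset("wikitext", "wikitext-2-raw-v1")
wikitext103 = load_dataset("wikitext", "wikitext-103-raw-v1")

# Pretrained GPT-2 checkpoint as provided by Hugging Face.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
```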
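
The Experiment Setup row quotes AdamW with an initial learning rate of 0.001, β1 = 0.9, β2 = 0.99, weight decay of 0.01, a batch size of 32, 70 epochs, and dropout of 0.3 or 0.5, plus unspecified learning-rate decay and early stopping. Below is a hedged PyTorch sketch of an optimizer configuration matching those numbers; the LSTM stand-in model, its layer sizes, and the plateau-based decay schedule are illustrative assumptions, since the quoted text does not pin them down.

```python
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import ReduceLROnPlateau

# Hypothetical LSTM standing in for the paper's RNN/LSTM language models;
# dropout of 0.3 (or 0.5) matches the quoted drop probabilities.
model = nn.LSTM(input_size=512, hidden_size=512, num_layers=2,
                dropout=0.3, batch_first=True)

# Optimizer settings quoted in the Experiment Setup row:
# AdamW, lr = 0.001, betas = (0.9, 0.99), weight decay = 0.01.
optimizer = AdamW(model.parameters(), lr=1e-3, betas=(0.9, 0.99), weight_decay=0.01)

# The paper mentions learning-rate decay and early stopping without the exact scheme;
# decaying when validation perplexity plateaus is one plausible choice (assumption).
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=1)

NUM_EPOCHS, BATCH_SIZE, NUM_RUNS = 70, 32, 10  # as reported: 70 epochs, batch size 32, 10 random runs
```

Because the exact decay schedule and stopping rule are unreported, a reimplementation should treat them as hyperparameters to tune against validation perplexity.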