A Non-monotonic Self-terminating Language Model

Authors: Eugene Choi, Kyunghyun Cho, Cheolhyoung Lee

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically validate our model on sequence completion tasks with various architectures. We conduct experiments validating the effectiveness of our NMST language models on sequence completion tasks, as was done in earlier studies. We test NMST parametrization with various architectures.
Researcher Affiliation | Collaboration | Eugene Choi (eugene.choi@nyu.edu), Kyunghyun Cho (kyunghyun.cho@nyu.edu), Cheolhyoung Lee (cheolhyoung.lee@nyu.edu); New York University; Prescient Design, Genentech; CIFAR Fellow; corresponding author.
Pseudocode | No | The paper includes mathematical definitions and equations but does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | To ensure the reproducibility of our paper, we provide our code available at https://github.com/nyu-dl/non-monotonic-self-terminating-lm.
Open Datasets | Yes | We train RNN (Elman, 1990) and LSTM (Hochreiter & Schmidhuber, 1997) on WikiText-2 (Merity et al., 2016). We additionally finetune GPT-2 (Radford et al., 2019) on WikiText-103 (Merity et al., 2016). (A hedged data-loading sketch follows the table.)
Dataset Splits | No | The paper mentions using a validation set for perplexity evaluation (e.g., 'validation perplexity'), but it does not provide specific details on the train/validation/test splits, such as percentages, sample counts, or a clear methodology for partitioning the data.
Hardware Specification | No | The paper states, 'This work was supported in part through the NYU IT High Performance Computing resources, services, and staff expertise,' but it does not specify any particular hardware components like GPU or CPU models, or memory amounts.
Software Dependencies | No | The paper mentions using 'AdamW (Loshchilov & Hutter, 2017)', 'BPE tokenization (Sennrich et al., 2015)', and 'pretrained GPT-2... provided by Hugging Face', but it does not specify version numbers for any software libraries or frameworks (e.g., Python, PyTorch, HuggingFace Transformers). (A hedged GPT-2 loading sketch follows the table.)
Experiment Setup | Yes | We use AdamW (Loshchilov & Hutter, 2017) with an initial learning rate of 0.001, β1 = 0.9, β2 = 0.99, weight decay of 0.01, learning rate decay, and early stopping. We perform 10 random runs with a batch size of 32 for 70 epochs. We apply dropout (Srivastava et al., 2014) with drop probabilities of 0.3 and 0.5. (These settings are collected in the configuration sketch below the table.)
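
On the Open Datasets row: WikiText-2 and WikiText-103 (Merity et al., 2016) are publicly available, but the paper does not say how the corpora were obtained or preprocessed. The sketch below is only an illustration and assumes the Hugging Face `datasets` hub copies of the two corpora; it is not the authors' data pipeline.

```python
# Hypothetical loading of the WikiText corpora named in the paper via the
# Hugging Face `datasets` library (an assumption; the paper does not state
# how the data were obtained or preprocessed).
from datasets import load_dataset

wikitext2 = load_dataset("wikitext", "wikitext-2-raw-v1")      # RNN / LSTM experiments
wikitext103 = load_dataset("wikitext", "wikitext-103-raw-v1")  # GPT-2 finetuning

# Each corpus ships with train/validation/test splits.
print(wikitext2)
print(wikitext103["train"][0]["text"])
```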
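
On the Software Dependencies row: the paper relies on a pretrained GPT-2 checkpoint and its BPE tokenizer distributed by Hugging Face but pins no library versions. The sketch below assumes the `transformers` library and the public `gpt2` checkpoint name; it is not the authors' code (their repository is linked above) and does not reproduce the NMST parametrization itself.

```python
# Sketch of obtaining the pretrained GPT-2 model and BPE tokenizer from
# Hugging Face, as referenced in the paper; no library versions are pinned
# because none are reported.
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Score a short prefix with the standard softmax head. The paper's NMST
# parametrization changes how the <eos> probability is produced, which is
# not shown here.
inputs = tokenizer("A non-monotonic self-terminating language model", return_tensors="pt")
logits = model(**inputs).logits
print(logits.shape)  # (batch, sequence_length, vocab_size)
```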
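
On the Experiment Setup row: the reported hyperparameters can be written down as a PyTorch-style configuration. This is a minimal sketch under stated assumptions, not the authors' training script; the LSTM sizes, the exact learning-rate decay schedule, and the early-stopping rule are placeholders not given in the excerpt.

```python
# Sketch of the reported optimization settings: AdamW with lr 0.001,
# betas (0.9, 0.99), weight decay 0.01, batch size 32, 70 epochs, and
# dropout of 0.3 or 0.5. Model dimensions and the scheduler choice are
# placeholders, not values from the paper.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = torch.nn.LSTM(input_size=512, hidden_size=512, num_layers=2,
                      dropout=0.3, batch_first=True)

optimizer = AdamW(model.parameters(), lr=1e-3, betas=(0.9, 0.99),
                  weight_decay=0.01)
scheduler = ReduceLROnPlateau(optimizer, mode="min")  # one plausible form of learning rate decay

batch_size = 32
max_epochs = 70
```

Early stopping would then monitor the validation perplexity that the Dataset Splits row notes the paper reports, though the patience and stopping threshold are not specified.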