A Non-monotonic Self-terminating Language Model

Authors: Eugene Choi, Kyunghyun Cho, Cheolhyoung Lee

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically validate our model on sequence completion tasks with various architectures. We conduct experiments validating the effectiveness of our NMST language models on sequence completion tasks, as was done in earlier studies. We test NMST parametrization with various architectures.
Researcher Affiliation | Collaboration | Eugene Choi (eugene.choi@nyu.edu), Kyunghyun Cho (kyunghyun.cho@nyu.edu), Cheolhyoung Lee (cheolhyoung.lee@nyu.edu); New York University; Prescient Design, Genentech; CIFAR Fellow; corresponding author.
Pseudocode | No | The paper includes mathematical definitions and equations but does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | To ensure the reproducibility of our paper, we provide our code available at https://github.com/nyu-dl/non-monotonic-self-terminating-lm.
Open Datasets | Yes | We train RNN (Elman, 1990) and LSTM (Hochreiter & Schmidhuber, 1997) on WikiText-2 (Merity et al., 2016). We additionally finetune GPT-2 (Radford et al., 2019) on WikiText-103 (Merity et al., 2016). (A hedged data-loading sketch follows the table.)
Dataset Splits | No | The paper mentions using a validation set for perplexity evaluation (e.g., 'validation perplexity'), but it does not provide specific details on the train/validation/test splits, such as percentages, sample counts, or a clear methodology for partitioning the data.
Hardware Specification | No | The paper states, 'This work was supported in part through the NYU IT High Performance Computing resources, services, and staff expertise,' but it does not specify any particular hardware components like GPU or CPU models, or memory amounts.
Software Dependencies | No | The paper mentions using 'AdamW (Loshchilov & Hutter, 2017)', 'BPE tokenization (Sennrich et al., 2015)', and 'pretrained GPT-2... provided by Hugging Face', but it does not specify version numbers for any software libraries or frameworks (e.g., Python, PyTorch, HuggingFace Transformers). (A hedged GPT-2 loading sketch follows the table.)
Experiment Setup | Yes | We use AdamW (Loshchilov & Hutter, 2017) with an initial learning rate of 0.001, β1 = 0.9, β2 = 0.99, weight decay of 0.01, learning rate decay, and early stopping. We perform 10 random runs with a batch size of 32 for 70 epochs. We apply dropout (Srivastava et al., 2014) with drop probabilities of 0.3 and 0.5. (These settings are collected in the configuration sketch below the table.)
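
On the Open Datasets row: WikiText-2 and WikiText-103 (Merity et al., 2016) are publicly available, but the paper does not say how the corpora were obtained or preprocessed. The sketch below is only an illustration and assumes the Hugging Face `datasets` hub copies of the two corpora; it is not the authors' data pipeline.

```python
# Hypothetical loading of the WikiText corpora named in the paper via the
# Hugging Face `datasets` library (an assumption; the paper does not state
# how the data were obtained or preprocessed).
from datasets import load_dataset

wikitext2 = load_dataset("wikitext", "wikitext-2-raw-v1")      # RNN / LSTM experiments
wikitext103 = load_dataset("wikitext", "wikitext-103-raw-v1")  # GPT-2 finetuning

# Each corpus ships with train/validation/test splits.
print(wikitext2)
print(wikitext103["train"][0]["text"])
```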
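
On the Software Dependencies row: the paper relies on a pretrained GPT-2 checkpoint and its BPE tokenizer distributed by Hugging Face but pins no library versions. The sketch below assumes the `transformers` library and the public `gpt2` checkpoint name; it is not the authors' code (their repository is linked above) and does not reproduce the NMST parametrization itself.

```python
# Sketch of obtaining the pretrained GPT-2 model and BPE tokenizer from
# Hugging Face, as referenced in the paper; no library versions are pinned
# because none are reported.
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Score a short prefix with the standard softmax head. The paper's NMST
# parametrization changes how the <eos> probability is produced, which is
# not shown here.
inputs = tokenizer("A non-monotonic self-terminating language model", return_tensors="pt")
logits = model(**inputs).logits
print(logits.shape)  # (batch, sequence_length, vocab_size)
```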
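
On the Experiment Setup row: the reported hyperparameters can be written down as a PyTorch-style configuration. This is a minimal sketch under stated assumptions, not the authors' training script; the LSTM sizes, the exact learning-rate decay schedule, and the early-stopping rule are placeholders not given in the excerpt.

```python
# Sketch of the reported optimization settings: AdamW with lr 0.001,
# betas (0.9, 0.99), weight decay 0.01, batch size 32, 70 epochs, and
# dropout of 0.3 or 0.5. Model dimensions and the scheduler choice are
# placeholders, not values from the paper.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = torch.nn.LSTM(input_size=512, hidden_size=512, num_layers=2,
                      dropout=0.3, batch_first=True)

optimizer = AdamW(model.parameters(), lr=1e-3, betas=(0.9, 0.99),
                  weight_decay=0.01)
scheduler = ReduceLROnPlateau(optimizer, mode="min")  # one plausible form of learning rate decay

batch_size = 32
max_epochs = 70
```

Early stopping would then monitor the validation perplexity that the Dataset Splits row notes the paper reports, though the patience and stopping threshold are not specified.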