Language modeling via stochastic processes

Authors: Rose E. Wang, Esin Durmus, Noah Goodman, Tatsunori B. Hashimoto

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We now evaluate the ability of Time Control to capture text dynamics. Specifically, we aim to answer the following research questions (RQ): ... We run Time Control with different latent dimensions (d = 8, 16, 32).
Researcher Affiliation | Academia | Rose E. Wang, Esin Durmus, Noah Goodman, Tatsunori B. Hashimoto, Stanford University, {rewang, edurmus, ngoodman, thashim}@stanford.edu
Pseudocode | No | The paper describes its methods in prose but does not include explicit pseudocode or algorithm blocks.
Open Source Code | Yes | The accompanying code can be found here: https://github.com/rosewang2008/language_modeling_via_stochastic_processes.
Open Datasets | Yes | Datasets: We use language datasets that elicit different kinds of structure, from section structure to discourse structure to narrative structure. Time Control does not take in any information about the structure, treating each domain the same under its encoding objective. More information and dataset examples are provided in Appendix E. Wikisection (Arnold et al., 2019) ... Wikihow (WH) (Koupaee & Wang, 2018) ... RecipeNLG (Bień et al., 2020) ... Taskmaster-2 (TM-2) (Byrne et al., 2019) ... Ticket Talk (Byrne et al., 2021) ... ROC Stories (Mostafazadeh et al., 2016) ...
Dataset Splits | Yes | We fine-tune for 10 epochs and checkpoint the models every 1000 steps; we keep the model checkpoint that scores the lowest PPL on a held-out validation set.
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU model, CPU type) used for running experiments.
Software Dependencies | No | The paper mentions using a 'frozen, pretrained GPT2 model from Huggingface' but does not specify version numbers for GPT2, Huggingface libraries, or any other software dependencies.
Experiment Setup | Yes | The MLP network has intermediate ReLU activations and is trained with stochastic gradient descent with a learning rate of 1e-4 and with momentum 0.9.
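The Software Dependencies and Dataset Splits rows above describe a frozen, pretrained GPT2 model from Huggingface and checkpoint selection by lowest validation perplexity. The following is a minimal sketch of that setup, assuming PyTorch and the Hugging Face transformers library; the batching details and the validation_ppl helper are illustrative assumptions, not the authors' code.

    # Sketch only: load a pretrained GPT-2 and freeze its weights, then score
    # validation perplexity (exp of mean cross-entropy), which the paper uses
    # to pick the best checkpoint.
    import math
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    for param in model.parameters():   # freeze GPT-2; only downstream
        param.requires_grad = False    # components would be trained
    model.eval()

    @torch.no_grad()
    def validation_ppl(model, val_batches):
        """Perplexity on a held-out validation set (hypothetical helper)."""
        losses = []
        for batch in val_batches:  # each batch: dict with input_ids, attention_mask
            out = model(**batch, labels=batch["input_ids"])
            losses.append(out.loss.item())
        return math.exp(sum(losses) / len(losses))

Checkpoint selection as described in the Dataset Splits row would then amount to evaluating validation_ppl every 1000 training steps and keeping the checkpoint with the smallest value.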
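The Experiment Setup row quotes an MLP with intermediate ReLU activations trained by stochastic gradient descent with learning rate 1e-4 and momentum 0.9. A minimal PyTorch sketch of that configuration is below; the layer widths are illustrative assumptions, since the quoted sentence does not report them.

    # Sketch only: an MLP with ReLU activations and the quoted SGD settings.
    import torch
    import torch.nn as nn

    mlp = nn.Sequential(
        nn.Linear(768, 256),  # hypothetical input/hidden sizes
        nn.ReLU(),
        nn.Linear(256, 32),   # e.g. latent dimension d = 32, one of the values swept in the paper
    )

    optimizer = torch.optim.SGD(mlp.parameters(), lr=1e-4, momentum=0.9)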