Deep State Space Models for Unconditional Word Generation
Authors: Florian Schmidt, Thomas Hofmann
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Evaluation) and Section 5 (Experiments); "Table 1 shows the result for the standard split." |
| Researcher Affiliation | Academia | Florian Schmidt, Department of Computer Science, ETH Zürich (florian.schmidt@inf.ethz.ch); Thomas Hofmann, Department of Computer Science, ETH Zürich (thomas.hofmann@inf.ethz.ch) |
| Pseudocode | Yes | Algorithm 1: "Detailed forward pass with importance weighting" (a hedged sketch follows the table). |
| Open Source Code | No | No concrete statement about open-source code availability or repository links found in the paper. |
| Open Datasets | Yes | For our experiments, we use the Books Corpus [KZS+15, ZKZ+15], a freely available collection of novels comprising almost 1B tokens, out of which 1.3M are unique. |
| Dataset Splits | Yes | Besides the standard 10% test-train split at the word level, we also perform a second, alternative split at the vocabulary level. (Both splits are sketched below the table.) |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running experiments are mentioned in the paper. |
| Software Dependencies | No | No specific ancillary software details, such as library or solver names with version numbers, are mentioned in the paper. |
| Experiment Setup | Yes | Hidden state size and embedding size are identical to our model's. We investigate the flow in Equation (10), denoted as TRIL, its diagonal version DIAG, and a simple identity ID. For the weighted version we use K ∈ {2, 5, 10} samples. Furthermore, we investigate deviating from the factorization (3) by using a bidirectional RNN conditioning on all w_1...w_T in every timestep. Finally, for the best-performing configuration, we also investigate state sizes d ∈ {16, 32}. (The sweep is summarized in the grid sketch below.) |
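
The paper reports an Algorithm 1 titled "Detailed forward pass with importance weighting". The sketch below shows how such an importance-weighted forward pass for a stochastic state space model over token sequences might look; the module names, the GRU transition, the Gaussian prior/posterior parameterization, and all shapes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class IWForwardPass(nn.Module):
    """Hypothetical sketch of an importance-weighted forward pass
    (IWAE-style bound with K samples); not the authors' code."""

    def __init__(self, vocab_size, d_state=32, d_embed=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_embed)
        self.trans = nn.GRUCell(d_embed, d_state)              # deterministic transition (assumption)
        self.prior = nn.Linear(d_state, 2 * d_state)           # -> (mu, log_sigma) of p(z_t | h_t)
        self.post = nn.Linear(d_state + d_embed, 2 * d_state)  # -> (mu, log_sigma) of q(z_t | h_t, x_t)
        self.readout = nn.Linear(d_state, vocab_size)

    def forward(self, tokens, K=5):
        # tokens: LongTensor of shape (T,), one symbol sequence
        h = torch.zeros(K, self.trans.hidden_size)             # K parallel particles
        log_w = torch.zeros(K)                                 # running log importance weights
        for t in range(tokens.shape[0]):
            x = self.embed(tokens[t]).expand(K, -1)
            mu_p, ls_p = self.prior(h).chunk(2, dim=-1)
            mu_q, ls_q = self.post(torch.cat([h, x], dim=-1)).chunk(2, dim=-1)
            p = torch.distributions.Normal(mu_p, ls_p.exp())
            q = torch.distributions.Normal(mu_q, ls_q.exp())
            z = q.rsample()                                    # reparameterized sample per particle
            log_px = torch.distributions.Categorical(
                logits=self.readout(z)).log_prob(tokens[t])
            # accumulate log p(x_t, z_t | .) - log q(z_t | .)
            log_w = log_w + log_px + p.log_prob(z).sum(-1) - q.log_prob(z).sum(-1)
            h = self.trans(x, z)
        # importance-weighted log-likelihood bound averaged over K samples
        return torch.logsumexp(log_w, 0) - torch.log(torch.tensor(float(K)))
```

Training would minimize the negative bound, e.g. `loss = -model(tokens, K=5)`, with larger K tightening the bound at proportionally higher cost.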
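The two dataset splits quoted above admit a straightforward reading: a word-level split holds out 10% of word tokens (test words may share types with training words), while a vocabulary-level split holds out 10% of word types, so every test word is unseen during training. The plain-Python sketch below illustrates this reading; function names and the seeding scheme are illustrative assumptions.

```python
import random

def split_word_level(words, test_frac=0.1, seed=0):
    """Standard split: hold out a fraction of word tokens."""
    rng = random.Random(seed)
    words = words[:]
    rng.shuffle(words)
    n_test = int(len(words) * test_frac)
    return words[n_test:], words[:n_test]

def split_vocab_level(words, test_frac=0.1, seed=0):
    """Alternative split: hold out a fraction of word *types*,
    so every test word is unseen at training time."""
    rng = random.Random(seed)
    vocab = sorted(set(words))
    rng.shuffle(vocab)
    test_types = set(vocab[:int(len(vocab) * test_frac)])
    train = [w for w in words if w not in test_types]
    test = [w for w in words if w in test_types]
    return train, test
```

The vocabulary-level split is the stricter test of generalization, since the model must assign probability to word forms it has never observed.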
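The experiment setup amounts to a small configuration grid: three flow variants, three sample counts, an optional bidirectional inference RNN, and a state-size sweep for the best configuration. The driver below is a hypothetical summary of that grid, not the authors' tooling; the "best" configuration is a placeholder, as the paper, not this sketch, determines which one wins.

```python
from itertools import product

flows = ["TRIL", "DIAG", "ID"]   # flow in Eq. (10), its diagonal version, identity
ks = [2, 5, 10]                  # K samples for the importance-weighted version
bidir = [False, True]            # optional bi-RNN conditioning on all w_1...w_T

grid = [
    {"flow": f, "K": k, "bidirectional": b}
    for f, k, b in product(flows, ks, bidir)
]
# State-size sweep d in {16, 32}, applied only to the best-performing
# configuration (placeholder choice below; determined empirically in the paper).
best = {"flow": "TRIL", "K": 10, "bidirectional": False}
grid += [dict(best, d=d) for d in (16, 32)]

for cfg in grid:
    print(cfg)  # stand-in for launching one training run per configuration
```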