Deep State Space Models for Unconditional Word Generation

Authors: Florian Schmidt, Thomas Hofmann

NeurIPS 2018

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Sections 4 (Evaluation) and 5 (Experiments); "Table 1 shows the result for the standard split." |
| Researcher Affiliation | Academia | Florian Schmidt, Department of Computer Science, ETH Zürich (florian.schmidt@inf.ethz.ch); Thomas Hofmann, Department of Computer Science, ETH Zürich (thomas.hofmann@inf.ethz.ch) |
| Pseudocode | Yes | Algorithm 1 gives a detailed forward pass with importance weighting (see the first sketch below the table). |
| Open Source Code | No | No concrete statement about open-source code availability or repository links found in the paper. |
| Open Datasets | Yes | "For our experiments, we use the Books Corpus [KZS+15, ZKZ+15], a freely available collection of novels comprising almost 1B tokens, out of which 1.3M are unique." (see the second sketch below) |
| Dataset Splits | Yes | "Besides the standard 10% test-train split at the word level, we also perform a second, alternative split at the vocabulary level." (see the third sketch below) |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor speeds, or memory amounts) used for running the experiments are mentioned in the paper. |
| Software Dependencies | No | No specific ancillary software details, such as library or solver names with version numbers, are mentioned in the paper. |
| Experiment Setup | Yes | "Hidden state size and embedding size are identical to our model's. We investigate the flow in Equation (10), denoted TRIL, its diagonal version DIAG, and a simple identity ID. For the weighted version we use K ∈ {2, 5, 10} samples. Furthermore, we investigate deviating from the factorization (3) by using a bidirectional RNN conditioning on all w_{1...T} in every timestep. Finally, for the best performing configuration, we also investigate state sizes d ∈ {16, 32}." (see the fourth sketch below) |
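
The Pseudocode row refers to the paper's Algorithm 1, a forward pass with importance weighting. Below is a minimal PyTorch sketch of such an importance-weighted (IWAE-style) forward pass for a toy stochastic state space language model; the module layout and names (`SSMSketch`, `trans`, `infer`, `decode`) and the particular factorization are illustrative assumptions, not the paper's exact architecture.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def gaussian_log_prob(z, mu, logvar):
    """Diagonal-Gaussian log density, summed over the latent dimension."""
    return -0.5 * (logvar + (z - mu) ** 2 / logvar.exp()
                   + math.log(2 * math.pi)).sum(-1)

class SSMSketch(nn.Module):
    """Toy stochastic state space language model (names are illustrative)."""
    def __init__(self, vocab_size, d=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)
        self.trans = nn.GRUCell(d, d)           # deterministic carrier state
        self.prior = nn.Linear(d, 2 * d)        # p(z_t | history)
        self.infer = nn.Linear(2 * d, 2 * d)    # q(z_t | history, w_t)
        self.decode = nn.Linear(d, vocab_size)  # p(w_t | z_t)

    def iw_bound(self, words, K=5):
        """K-sample importance-weighted lower bound on log p(w_{1:T})."""
        B, T = words.shape
        d = self.trans.hidden_size
        h = words.new_zeros(B, K, d, dtype=torch.float)
        log_w = words.new_zeros(B, K, dtype=torch.float)
        for t in range(T):
            e = self.embed(words[:, t]).unsqueeze(1).expand(B, K, d)
            mu_p, lv_p = self.prior(h).chunk(2, dim=-1)
            mu_q, lv_q = self.infer(torch.cat([h, e], -1)).chunk(2, dim=-1)
            z = mu_q + (0.5 * lv_q).exp() * torch.randn_like(mu_q)  # reparam.
            log_lik = -F.cross_entropy(self.decode(z).reshape(B * K, -1),
                                       words[:, t].repeat_interleave(K),
                                       reduction="none").view(B, K)
            # Accumulate log importance weights log p(w, z) - log q(z | w).
            log_w = log_w + log_lik + gaussian_log_prob(z, mu_p, lv_p) \
                                    - gaussian_log_prob(z, mu_q, lv_q)
            h = self.trans(z.reshape(B * K, -1),
                           h.reshape(B * K, -1)).view(B, K, d)
        # logsumexp over the K samples yields the importance-weighted bound.
        return (torch.logsumexp(log_w, dim=1) - math.log(K)).mean()
```

Averaging K weighted samples via logsumexp gives a bound no looser than the single-sample ELBO and tighter as K grows, which is the rationale behind the weighted K ∈ {2, 5, 10} configurations in the Experiment Setup row.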
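To make the Open Datasets row concrete, here is a trivial way one could check the reported corpus statistics (almost 1B tokens, 1.3M unique); the file name `books_corpus.txt` and whitespace tokenization are assumptions, not the paper's preprocessing.

```python
from collections import Counter

# Hypothetical path to a plain-text dump of the Books Corpus.
counts = Counter()
with open("books_corpus.txt", encoding="utf-8") as f:
    for line in f:
        counts.update(line.split())  # whitespace tokenization (assumed)

print(f"total tokens:  {sum(counts.values()):,}")  # paper reports ~1B
print(f"unique tokens: {len(counts):,}")           # paper reports ~1.3M
```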
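For the Dataset Splits row, the paper contrasts a standard word-level split with an alternative split at the vocabulary level. A minimal sketch of both follows; since the quoted sentence does not spell out the mechanics, each function implements one plausible interpretation.

```python
import random

def word_level_split(tokens, test_frac=0.10):
    """Standard split: hold out 10% of the running text as test data.
    Holding out the final contiguous chunk is one plausible reading."""
    cut = int(len(tokens) * (1 - test_frac))
    return tokens[:cut], tokens[cut:]

def vocabulary_level_split(sentences, test_frac=0.10, seed=0):
    """Alternative split at the vocabulary level: hold out a random subset
    of word *types* and route any sentence containing a held-out type to
    the test set (again one plausible interpretation)."""
    vocab = sorted({w for s in sentences for w in s})
    rng = random.Random(seed)
    held_out = set(rng.sample(vocab, int(len(vocab) * test_frac)))
    test = [s for s in sentences if held_out & set(s)]
    train = [s for s in sentences if not (held_out & set(s))]
    return train, test
```

The vocabulary-level split tests generalization to unseen word types, whereas the word-level split only tests generalization to unseen text.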
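Finally, the Experiment Setup row ablates three flow variants (TRIL, DIAG, ID) together with several sample counts K. Here is a sketch of what such a flow step could look like, assuming the paper's Equation (10) denotes an affine transformation with a lower-triangular matrix; the class name and initialization are illustrative.

```python
import torch
import torch.nn as nn

class AffineFlowStep(nn.Module):
    """One affine flow step z' = A z + b in three ablation variants:
    TRIL (lower-triangular A), DIAG (diagonal A), and ID (identity).
    The correspondence to the paper's Equation (10) is assumed."""
    def __init__(self, d, variant="TRIL"):
        super().__init__()
        if variant not in ("TRIL", "DIAG", "ID"):
            raise ValueError(variant)
        self.variant = variant
        if variant != "ID":
            self.b = nn.Parameter(torch.zeros(d))
            # Initialize near the identity so the diagonal stays positive.
            init = torch.eye(d) if variant == "TRIL" else torch.ones(d)
            self.A = nn.Parameter(init)

    def forward(self, z):
        """Return the transformed sample and the log-det-Jacobian term."""
        if self.variant == "TRIL":
            A = torch.tril(self.A)           # keep A lower-triangular
            z = z @ A.T + self.b
            log_det = A.diagonal().abs().log().sum()
        elif self.variant == "DIAG":
            z = z * self.A + self.b
            log_det = self.A.abs().log().sum()
        else:                                # ID: no transformation
            log_det = z.new_zeros(())
        return z, log_det

# The ablation grid from the Experiment Setup row, as one might script it:
# for variant in ("TRIL", "DIAG", "ID"):
#     for K in (2, 5, 10):
#         ...  # train with AffineFlowStep(d, variant) and K samples
```

For a triangular matrix the Jacobian determinant is the product of the diagonal entries, which is why both non-identity variants only need the diagonal of A for the log-det term.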