Improving Predictive State Representations via Gradient Descent

Authors: Nan Jiang, Alex Kulesza, Satinder Singh

AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We first show on synthetic domains that our proposed gradient procedure can improve the model, and that spectral learning provides a useful initialization. We investigate the effectiveness of our gradient procedure on a character-level language modeling problem using Wikipedia data. |
| Researcher Affiliation | Academia | Nan Jiang, Alex Kulesza, and Satinder Singh; nanjiang@umich.edu, kulesza@gmail.com, baveja@umich.edu; Computer Science & Engineering, University of Michigan |
| Pseudocode | Yes | Algorithm 1: Stochastic Gradient Descent with Contrastive Divergence for Predictive State Representations. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing its source code, nor does it provide a link to a repository. |
| Open Datasets | Yes | We investigate the effectiveness of our gradient procedure on a character-level language modeling problem using Wikipedia data (Sutskever, Martens, and Hinton 2011). |
| Dataset Splits | No | The paper mentions training and testing datasets, but does not explicitly specify a validation split. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used to run its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | We use a constant learning rate of η = 10⁻⁶. To prevent the model parameters from experiencing sudden changes due to occasional stochastic gradients with a large magnitude, we rescale the stochastic gradient term Δ to guarantee that ‖Δ‖ ≤ 10. The learning rate and momentum parameters are set to 10⁻⁷ and 0.9, respectively. |
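The Experiment Setup row quotes three concrete hyperparameter choices: a constant learning rate of 10⁻⁶ for plain stochastic gradient descent, rescaling of each stochastic gradient so its norm does not exceed 10, and a momentum variant with learning rate 10⁻⁷ and momentum 0.9. The sketch below shows how those numbers could be wired into a generic update loop. It is a minimal illustration only, not the paper's Algorithm 1: the PSR parameterization and the contrastive-divergence gradient are not reproduced here, the choice of norm for the rescaling is assumed to be Euclidean, and `rescale`, `sgd_step`, and `momentum_step` are hypothetical helper names.

```python
import numpy as np


def rescale(delta, max_norm=10.0):
    """Rescale a stochastic gradient so its norm does not exceed max_norm.

    The paper's quoted setup caps the gradient magnitude at 10; the Euclidean
    norm used here is an assumption, since the exact norm is not quoted above.
    """
    norm = np.linalg.norm(delta)
    if norm > max_norm:
        delta = delta * (max_norm / norm)
    return delta


def sgd_step(theta, grad, lr=1e-6):
    """Plain SGD update with the quoted constant learning rate eta = 1e-6."""
    return theta - lr * rescale(grad)


def momentum_step(theta, velocity, grad, lr=1e-7, momentum=0.9):
    """Heavy-ball SGD update with the quoted lr = 1e-7 and momentum = 0.9."""
    velocity = momentum * velocity - lr * rescale(grad)
    return theta + velocity, velocity


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = rng.normal(size=100)        # stand-in for flattened model parameters
    velocity = np.zeros_like(theta)
    noisy_grad = rng.normal(scale=50.0, size=100)  # an occasionally large stochastic gradient
    theta, velocity = momentum_step(theta, velocity, noisy_grad)
```

In practice the gradient passed to these helpers would come from the paper's contrastive-divergence procedure (Algorithm 1); the rescaling step simply keeps occasional large stochastic gradients from moving the parameters abruptly, as described in the quoted setup.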