Predictive-State Decoders: Encoding the Future into Recurrent Networks
Authors: Arun Venkatraman, Nicholas Rhinehart, Wen Sun, Lerrel Pinto, Martial Hebert, Byron Boots, Kris Kitani, J. Andrew Bagnell
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of PSDs with experimental results in three different domains: probabilistic filtering, imitation learning, and reinforcement learning. In each, our method improves statistical performance of state-of-the-art recurrent baselines and does so with fewer iterations and less data. |
| Researcher Affiliation | Academia | The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA; School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA |
| Pseudocode | No | The paper describes models and methods using text, diagrams, and mathematical equations, but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | The Hopper dataset was generated using the OpenAI simulation [12] |
| Dataset Splits | No | The paper mentions collecting datasets and using them for training and evaluation but does not specify explicit training, validation, and test dataset splits with percentages or counts. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments, only general mentions of environments. |
| Software Dependencies | No | The paper mentions 'Tensorflow's built-in GRU and LSTM cells [1]' and 'rllab [18]' but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | PREDICTIVE-STATE DECODERS require two hyperparameters: k, the number of observations used to characterize the predictive state, and λ, the regularization trade-off factor. In most cases, we primarily tune λ and set k to one of {2, ..., 10}. |
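For concreteness, the sketch below shows one way these two hyperparameters can enter training: an auxiliary decoder regresses each recurrent hidden state onto the next k observations, and λ weights that predictive-state loss against the primary task loss. This is a minimal illustration under stated assumptions, not the authors' code: it uses PyTorch rather than the TensorFlow GRU/LSTM cells used in the paper, and all module names, dimensions, and the toy data are invented for the example.

```python
# Minimal sketch of a Predictive-State Decoder (PSD) auxiliary loss.
# Assumptions: PyTorch (the paper used TensorFlow); hypothetical names
# and toy dimensions throughout.
import torch
import torch.nn as nn

class PSDRecurrentModel(nn.Module):
    def __init__(self, obs_dim, hidden_dim, out_dim, k):
        super().__init__()
        self.k = k  # hyperparameter k: number of future observations in the predictive state
        self.rnn = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.task_head = nn.Linear(hidden_dim, out_dim)     # primary task output
        self.psd_head = nn.Linear(hidden_dim, k * obs_dim)  # decodes the next k observations

    def forward(self, obs):                  # obs: (batch, T, obs_dim)
        h, _ = self.rnn(obs)                 # h: (batch, T, hidden_dim)
        return self.task_head(h), self.psd_head(h)

def psd_loss(psd_pred, obs, k):
    """Regress each hidden state onto the next k observations (requires T > k)."""
    batch, T, _ = obs.shape
    losses = []
    for t in range(T - k):
        # Target: observations o_{t+1}, ..., o_{t+k}, flattened per batch element.
        target = obs[:, t + 1 : t + 1 + k, :].reshape(batch, -1)
        losses.append(((psd_pred[:, t, :] - target) ** 2).mean())
    return torch.stack(losses).mean()

# One training step: total loss = task loss + lambda * predictive-state loss.
model = PSDRecurrentModel(obs_dim=4, hidden_dim=32, out_dim=2, k=5)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
obs = torch.randn(8, 20, 4)      # toy batch of observation sequences
targets = torch.randn(8, 20, 2)  # toy per-step task targets
lam = 0.1                        # hyperparameter lambda: regularization trade-off

task_pred, psd_pred = model(obs)
loss = nn.functional.mse_loss(task_pred, targets) + lam * psd_loss(psd_pred, obs, k=5)
opt.zero_grad()
loss.backward()
opt.step()
```

In this sketch, tuning λ shifts how strongly the recurrent state is regularized toward encoding the future, while k controls how far ahead the decoded predictive state reaches, matching the roles the paper assigns to these two hyperparameters.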