Recurrent Predictive State Policy Networks
Authors: Ahmed Hefny, Zita Marinho, Wen Sun, Siddhartha Srinivasa, Geoffrey Gordon
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show the efficacy of RPSP-networks under partial observability on a set of robotic control tasks from OpenAI Gym. We empirically show that RPSP-networks perform well compared with memory-preserving networks such as GRUs, as well as finite memory models, being the overall best-performing method. |
| Researcher Affiliation | Academia | 1Machine Learning Department, Carnegie Mellon University, Pittsburgh, USA 2Robotics Institute, Carnegie Mellon University, Pittsburgh, USA 3ISR/IT, Instituto Superior Técnico, Lisbon, Portugal 4Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, USA. |
| Pseudocode | Yes | Algorithm 1 Recurrent Predictive State Policy network Optimization (RPSPO) |
| Open Source Code | Yes | https://github.com/ahefnycmu/rpsp |
| Open Datasets | Yes | We evaluate the RPSP-network's performance on a collection of reinforcement learning tasks using OpenAI Gym MuJoCo environments. |
| Dataset Splits | No | The paper discusses batch sizes and episode lengths for experiments, but does not provide specific percentages or counts for training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU model, CPU type, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'OpenAI Gym MuJoCo environments' and the 'RLLab' implementation of TRPO, but it does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | For RPSP, we found that a step size of 10^-2 performs well for both VRPG and alternating optimization in all environments. The reactive policy contains one hidden layer of 16 nodes with ReLU activation. For each environment, we set the number of samples in the batch to 10000 and the maximum length of each episode to 200, 500, 1000, 1000 for Cart-Pole, Swimmer, Hopper and Walker2d respectively. |
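For concreteness, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration. The sketch below is a minimal Python illustration: the `RPSP_CONFIG` dictionary, its key names, and the plain-NumPy `reactive_policy` forward pass are assumptions made for readability, not the authors' actual implementation (which is available at the repository linked above).

```python
import numpy as np

# Hyperparameters quoted in the paper's experiment setup; the structure
# of this config dict is an illustrative assumption.
RPSP_CONFIG = {
    "step_size": 1e-2,             # works for both VRPG and alternating optimization
    "batch_samples": 10000,        # samch size per iteration, all environments
    "max_episode_length": {        # per-environment episode caps
        "Cart-Pole": 200,
        "Swimmer": 500,
        "Hopper": 1000,
        "Walker2d": 1000,
    },
    "reactive_policy_hidden": 16,  # one hidden layer with ReLU activation
}


def reactive_policy(state, W1, b1, W2, b2):
    """Sketch of the reactive policy described in the paper: a single
    16-unit ReLU hidden layer mapping the (predictive) state to action
    parameters. Weight shapes and the linear output head are assumptions."""
    h = np.maximum(0.0, state @ W1 + b1)  # ReLU hidden layer
    return h @ W2 + b2                    # linear output (e.g., action mean)
```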