Recurrent Predictive State Policy Networks

Authors: Ahmed Hefny, Zita Marinho, Wen Sun, Siddhartha Srinivasa, Geoffrey Gordon

ICML 2018

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We show the efficacy of RPSP-networks under partial observability on a set of robotic control tasks from OpenAI Gym. We empirically show that RPSP-networks perform well compared with memory-preserving networks such as GRUs, as well as finite memory models, being the overall best performing method." |
| Researcher Affiliation | Academia | "1 Machine Learning Department, Carnegie Mellon University, Pittsburgh, USA; 2 Robotics Institute, Carnegie Mellon University, Pittsburgh, USA; 3 ISR/IT, Instituto Superior Técnico, Lisbon, Portugal; 4 Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, USA." |
| Pseudocode | Yes | "Algorithm 1: Recurrent Predictive State Policy network Optimization (RPSPO)" |
| Open Source Code | Yes | https://github.com/ahefnycmu/rpsp |
| Open Datasets | Yes | "We evaluate the RPSP-network's performance on a collection of reinforcement learning tasks using OpenAI Gym MuJoCo environments." |
| Dataset Splits | No | The paper discusses batch sizes and episode lengths for its experiments, but does not give specific percentages or counts for training, validation, or test splits. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., GPU model, CPU type, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions OpenAI Gym MuJoCo environments and the RLLab implementation of TRPO, but it does not give version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | "For RPSP, we found that a step size of 10⁻² performs well for both VRPG and alternating optimization in all environments. The reactive policy contains one hidden layer of 16 nodes with ReLU activation. For each environment, we set the number of samples in the batch to 10000 and the maximum length of each episode to 200, 500, 1000, 1000 for Cart-Pole, Swimmer, Hopper and Walker2d respectively." |
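The reactive-policy architecture quoted in the Experiment Setup row (a single hidden layer of 16 ReLU units) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name, the input and action dimensions, and the Gaussian-mean output head are assumptions; only the 16-unit ReLU hidden layer comes from the paper.

```python
import numpy as np

class ReactivePolicy:
    """Minimal sketch of a reactive policy with one hidden layer of
    16 ReLU units, mapping a (predictive) state vector to the mean of
    an action distribution. Dimensions and output head are
    illustrative assumptions, not taken from the paper."""

    def __init__(self, state_dim, action_dim, hidden_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        # Small random init for the two weight matrices; biases start at zero.
        self.W1 = rng.normal(0.0, 0.1, (hidden_dim, state_dim))
        self.b1 = np.zeros(hidden_dim)
        self.W2 = rng.normal(0.0, 0.1, (action_dim, hidden_dim))
        self.b2 = np.zeros(action_dim)

    def act_mean(self, state):
        # One hidden layer with ReLU activation, as described in the paper.
        h = np.maximum(0.0, self.W1 @ state + self.b1)
        # Linear output head producing the action-distribution mean.
        return self.W2 @ h + self.b2

# Usage: hypothetical 10-dimensional predictive state, 3-dimensional action.
policy = ReactivePolicy(state_dim=10, action_dim=3)
mean = policy.act_mean(np.zeros(10))
print(mean.shape)  # (3,)
```

In the paper this network sits on top of a predictive state representation filter, so its input would be the filter's predictive state rather than the raw observation.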