Predictive Information Accelerates Learning in RL

Authors: Kuang-Huei Lee, Ian Fischer, Anthony Liu, Yijie Guo, Honglak Lee, John Canny, Sergio Guadarrama

NeurIPS 2020

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that PI-SAC agents can substantially improve sample efficiency over challenging baselines on tasks from the DM Control suite of continuous control environments. We evaluate PI-SAC agents by comparing against uncompressed PI-SAC agents, other compressed and uncompressed agents, and SAC agents directly trained from pixels. |
| Researcher Affiliation | Collaboration | Kuang-Huei Lee (Google Research, leekh@google.com); Ian Fischer (Google Research, iansf@google.com); Anthony Z. Liu (University of Michigan, anthliu@umich.edu); Yijie Guo (University of Michigan, guoyijie@umich.edu); Honglak Lee (Google Research, honglak@google.com); John Canny (Google Research, canny@google.com); Sergio Guadarrama (Google Research, sguada@google.com) |
| Pseudocode | Yes | Algorithm 1: Training Algorithm for PI-SAC |
| Open Source Code | Yes | Our implementation is given on GitHub: https://github.com/google-research/pisac |
| Open Datasets | Yes | We evaluate PI-SAC on the DeepMind Control suite [42] and compare with leading model-free and model-based approaches for continuous control from pixels. |
| Dataset Splits | No | The paper uses continuous control environments and does not specify traditional train/validation/test dataset splits with percentages or counts, as is common in supervised learning. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using standard hyperparameters and architectures similar to other works, but does not list specific software dependencies with version numbers (e.g., Python, TensorFlow, PyTorch versions). |
| Experiment Setup | Yes | Throughout these experiments we mostly use the standard SAC hyperparameters [16], including the sizes of the actor and critic networks, learning rates, and target critic update rate. Unless otherwise specified, we set CEB β = 0.01. We report our results with the best number of gradient updates per environment step in Section 4.1, and use one gradient update per environment step for the rest of the experiments. Full details of hyperparameters are listed in Section A.2. (A configuration sketch based on these settings follows the table.) |
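
The sketch below collects the experiment-setup values quoted above into a single Python configuration object, for readers who want to mirror the reported setup in their own training script. It is a minimal illustration, not the authors' actual configuration: the class and field names (`PISACConfig`, `ceb_beta`, `grad_updates_per_env_step`, etc.) are hypothetical, and the values marked as SAC defaults are commonly published ones rather than numbers taken from this report or the paper's Section A.2.

```python
from dataclasses import dataclass


@dataclass
class PISACConfig:
    """Hypothetical PI-SAC training configuration.

    Only `ceb_beta` and `grad_updates_per_env_step` come from the
    experiment-setup text quoted in the table; every other value is a
    commonly used SAC default and may differ from the paper's Section A.2.
    """
    # Quoted in the report: CEB compression strength and update ratio.
    ceb_beta: float = 0.01
    grad_updates_per_env_step: int = 1

    # Standard SAC-style settings (assumed defaults, not from this report).
    actor_learning_rate: float = 3e-4
    critic_learning_rate: float = 3e-4
    target_critic_update_rate: float = 0.005  # Polyak averaging coefficient
    discount: float = 0.99
    batch_size: int = 256


def training_loop_skeleton(config: PISACConfig, num_env_steps: int = 1000) -> None:
    """Shows how the update ratio is applied: after each environment step,
    run `grad_updates_per_env_step` gradient updates."""
    for _step in range(num_env_steps):
        # ... collect one environment transition into the replay buffer ...
        for _ in range(config.grad_updates_per_env_step):
            pass  # ... one SAC + CEB gradient update would go here ...


if __name__ == "__main__":
    print(PISACConfig())
```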