Predictive Information Accelerates Learning in RL
Authors: Kuang-Huei Lee, Ian Fischer, Anthony Liu, Yijie Guo, Honglak Lee, John Canny, Sergio Guadarrama
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that PI-SAC agents can substantially improve sample efficiency over challenging baselines on tasks from the DM Control suite of continuous control environments. We evaluate PI-SAC agents by comparing against uncompressed PI-SAC agents, other compressed and uncompressed agents, and SAC agents directly trained from pixels. |
| Researcher Affiliation | Collaboration | Kuang-Huei Lee (Google Research, leekh@google.com); Ian Fischer (Google Research, iansf@google.com); Anthony Z. Liu (University of Michigan, anthliu@umich.edu); Yijie Guo (University of Michigan, guoyijie@umich.edu); Honglak Lee (Google Research, honglak@google.com); John Canny (Google Research, canny@google.com); Sergio Guadarrama (Google Research, sguada@google.com) |
| Pseudocode | Yes | Algorithm 1 Training Algorithm for PI-SAC |
| Open Source Code | Yes | Our implementation is available on GitHub: https://github.com/google-research/pisac |
| Open Datasets | Yes | We evaluate PI-SAC on the DeepMind Control Suite [42] and compare with leading model-free and model-based approaches for continuous control from pixels. |
| Dataset Splits | No | The paper uses continuous control environments and does not specify traditional train/validation/test dataset splits with percentages or counts, as is common in supervised learning. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using standard hyperparameters and architectures similar to other works, but does not list specific software dependencies with version numbers (e.g., Python, TensorFlow, PyTorch versions). |
| Experiment Setup | Yes | Throughout these experiments we mostly use the standard SAC hyperparameters [16], including the sizes of the actor and critic networks, learning rates, and target critic update rate. Unless otherwise specified, we set CEB β = 0.01. We report our results with the best number of gradient updates per environment step in Section 4.1, and use one gradient update per environment step for the rest of the experiments. Full details of hyperparameters are listed in Section A.2. |
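The Experiment Setup row notes that PI-SAC interleaves gradient updates with environment steps, reporting results with the best number of gradient updates per step. A minimal sketch of that schedule is below, using hypothetical `DummyEnv` and `DummyAgent` stand-ins (these names and the loop structure are illustrative assumptions, not taken from the PI-SAC codebase):

```python
# Hypothetical sketch of a "gradient updates per environment step" schedule.
# DummyEnv and DummyAgent are illustrative stand-ins, not PI-SAC code.

class DummyEnv:
    """Stand-in environment: observation is a step counter, episodes last 10 steps."""
    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 0.0, self.t >= 10  # obs, reward, done


class DummyAgent:
    """Stand-in agent that only counts how many gradient updates it performs."""
    def __init__(self):
        self.num_updates = 0

    def act(self, obs):
        return 0  # placeholder action

    def update(self):
        self.num_updates += 1  # one "gradient step" on the actor/critic


def train(env, agent, env_steps, updates_per_step):
    """Interleave environment interaction with gradient updates.

    After every environment step, perform `updates_per_step` gradient updates,
    mirroring the schedule described in the Experiment Setup row.
    """
    obs = env.reset()
    for _ in range(env_steps):
        obs, reward, done = env.step(agent.act(obs))
        for _ in range(updates_per_step):
            agent.update()
        if done:
            obs = env.reset()
    return agent.num_updates


agent = DummyAgent()
total_updates = train(DummyEnv(), agent, env_steps=100, updates_per_step=1)
# With one gradient update per environment step, updates equal env steps.
```

With `updates_per_step=1` (the default used for most of the paper's experiments), 100 environment steps yield 100 gradient updates; raising `updates_per_step` trades compute for sample efficiency.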