Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Reinforcement Learning with Latent Flow
Authors: Wenling Shang, Xiaofei Wang, Aravind Srinivas, Aravind Rajeswaran, Yang Gao, Pieter Abbeel, Misha Laskin
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that Flare recovers optimal performance in state-based RL without explicit access to the state velocity, solely with positional state information. Flare is the most sample-efficient model-free pixel-based RL algorithm on the DeepMind Control Suite when evaluated on the 500k and 1M step benchmarks across 5 challenging control tasks, and, when used with Rainbow DQN, outperforms the competitive baseline on Atari games at the 100M time step benchmark across 8 challenging games. |
| Researcher Affiliation | Collaboration | Wenling Shang DeepMind EMAIL, Xiaofei Wang UC Berkeley EMAIL, Aravind Srinivas OpenAI EMAIL, Aravind Rajeswaran Facebook AI Research, University of Washington EMAIL, Yang Gao Tsinghua University EMAIL, Pieter Abbeel UC Berkeley, Covariant EMAIL, Michael Laskin UC Berkeley EMAIL |
| Pseudocode | Yes | Pseudocode illustrates inference with Flare in Algorithm 5.2; during training, the encodings of latent features and flow are done in the same way except with augmented observations. |
| Open Source Code | Yes | Code: https://github.com/WendyShang/flare |
| Open Datasets | Yes | The DeepMind Control Suite (DMControl) [42], based on MuJoCo [43], is a commonly used benchmark for continuous control from pixels. |
| Dataset Splits | No | The paper evaluates performance at '500K and 1M environment steps' for DMControl and '100M time step benchmark' for Atari, but does not explicitly define traditional training/validation/test dataset splits with percentages or sample counts, as is common for static datasets. Evaluation is based on interactions with environments. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory, or cloud instance specifications used for running the experiments. It only generally mentions 'large computational requirements'. |
| Software Dependencies | No | The paper mentions software components and frameworks like SAC, RAD, Rainbow DQN, DMControl, MuJoCo, and DQN Zoo, but does not provide specific version numbers for any of them. For example, it mentions 'an official implementation of Rainbow [31]' but not its version. |
| Experiment Setup | Yes | We run 5 random seeds for both Flare and Rainbow DQN [19]... Furthermore, we use the same 5 seeds for both the baseline and Flare. |
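The core idea behind Flare (Flow of Latents for Reinforcement Learning), as summarized above, is to encode consecutive frames into latent vectors, take temporal differences of those latents ("latent flow"), and feed the concatenation of latents and flow to the RL agent in place of an explicit velocity signal. The sketch below illustrates that fusion step only; it is a simplified illustration, not the authors' implementation, and the function name `flare_features` is hypothetical.

```python
import numpy as np

def flare_features(latents: np.ndarray) -> np.ndarray:
    """Fuse frame latents with their temporal differences (Flare-style sketch).

    latents: shape (k, d) -- encodings of the k most recent frames.
    Returns a flat vector of length k*d + (k-1)*d: the latents themselves
    concatenated with the (k-1) consecutive latent differences ("flow").
    """
    # Latent flow: difference between each pair of consecutive frame encodings
    flow = latents[1:] - latents[:-1]
    # Concatenate raw latents with flow into a single feature vector
    return np.concatenate([latents.ravel(), flow.ravel()])

# Usage: 3 stacked frame encodings, each of dimension 4
z = np.arange(12, dtype=float).reshape(3, 4)
phi = flare_features(z)
print(phi.shape)  # (20,) = 3*4 latents + 2*4 flow
```

In the paper this fused representation replaces direct access to state velocity, which is why the table above notes that Flare recovers optimal performance from positional information alone.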