Reinforcement Learning with Latent Flow

Authors: Wenling Shang, Xiaofei Wang, Aravind Srinivas, Aravind Rajeswaran, Yang Gao, Pieter Abbeel, Michael Laskin

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that Flare recovers optimal performance in state-based RL without explicit access to the state velocity, solely with positional state information. Flare is the most sample-efficient model-free pixel-based RL algorithm on the DeepMind Control Suite when evaluated on the 500k and 1M step benchmarks across 5 challenging control tasks and, when used with Rainbow DQN, outperforms the competitive baseline on Atari games at the 100M time step benchmark across 8 challenging games.
Researcher Affiliation | Collaboration | Wenling Shang (DeepMind, wendyshang@deepmind.com); Xiaofei Wang (UC Berkeley, w.xf@berkeley.edu); Aravind Srinivas (OpenAI, aravind_srinivas@berkeley.edu); Aravind Rajeswaran (Facebook AI Research, University of Washington, aravraj@fb.com); Yang Gao (Tsinghua University, gaoyangiiis@tsinghua.edu.cn); Pieter Abbeel (UC Berkeley, Covariant, pabbeel@berkeley.edu); Michael Laskin (UC Berkeley, mlaskin@berkeley.edu)
Pseudocode | Yes | Pseudocode in Algorithm 5.2 illustrates inference with Flare; during training, the latent features and flow are encoded in the same way, except with augmented observations. (A hedged sketch of the latent-flow computation appears after this table.)
Open Source Code | Yes | Code: https://github.com/WendyShang/flare
Open Datasets | Yes | The DeepMind Control Suite (DMControl) [42], based on MuJoCo [43], is a commonly used benchmark for continuous control from pixels. (A minimal environment-loading example appears after this table.)
Dataset Splits | No | The paper evaluates performance at '500K and 1M environment steps' for DMControl and a '100M time step benchmark' for Atari, but does not explicitly define traditional training/validation/test splits with percentages or sample counts, as is common for static datasets. Evaluation is based on interactions with the environments.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory, or cloud instance specifications used to run the experiments; it only mentions 'large computational requirements' in general terms.
Software Dependencies | No | The paper mentions software components and frameworks such as SAC, RAD, Rainbow DQN, DMControl, MuJoCo, and DQN Zoo, but does not provide version numbers for any of them. For example, it cites 'an official implementation of Rainbow [31]' without specifying a version.
Experiment Setup | Yes | We run 5 random seeds for both Flare and Rainbow DQN [19]... Furthermore, we use the same 5 seeds for both the baseline and Flare. (A seeding sketch appears after this table.)
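
The Pseudocode row above refers to Flare's core operation: computing temporal differences ("latent flow") between consecutive latent feature encodings and concatenating them with the latents themselves before the downstream RL head. Below is a minimal PyTorch sketch of that computation; the encoder architecture, tensor shapes, and class name are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class LatentFlowEncoder(nn.Module):
    """Hedged sketch of Flare-style latent flow (assumed shapes/architecture).

    Each frame is encoded independently; latent flow is the temporal
    difference between consecutive latent vectors, concatenated with
    the latents before being passed to the RL head.
    """

    def __init__(self, in_channels: int = 3, latent_dim: int = 50):
        super().__init__()
        # Per-frame convolutional encoder (illustrative, not the paper's exact net).
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.LazyLinear(latent_dim)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, channels, height, width)
        b, t, c, h, w = frames.shape
        z = self.fc(self.conv(frames.reshape(b * t, c, h, w)))  # (b*t, latent_dim)
        z = z.reshape(b, t, -1)
        flow = z[:, 1:] - z[:, :-1]                  # temporal differences = latent flow
        fused = torch.cat([z[:, 1:], flow], dim=-1)  # fuse latents with their flow
        return fused.reshape(b, -1)                  # flattened input to the RL head
```

During training, per the paper's description, the same encoding would be applied to augmented observations rather than raw frames.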
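The Open Datasets row points to DMControl. For readers reproducing the pixel-based setup, a minimal example of loading a task and rendering pixel observations with the publicly available dm_control package follows; the choice of domain, task, and render resolution here is an arbitrary assumption, not taken from the paper.

```python
import numpy as np
from dm_control import suite

# Load a DMControl task (domain/task choice is illustrative).
env = suite.load(domain_name="cheetah", task_name="run")
time_step = env.reset()
action_spec = env.action_spec()

for _ in range(5):
    if time_step.last():
        break
    # Random-action placeholder; a real agent would act on pixel observations.
    action = np.random.uniform(
        action_spec.minimum, action_spec.maximum, size=action_spec.shape)
    time_step = env.step(action)
    # Render a small pixel observation, as is common in pixel-based RL.
    pixels = env.physics.render(height=84, width=84, camera_id=0)
```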
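The Experiment Setup row notes that the same 5 seeds were used for both Flare and the baseline. A minimal sketch of how such seeding might be done, assuming a PyTorch/NumPy stack (the paper does not specify its exact seeding procedure, and the seed values and training function below are hypothetical):

```python
import random

import numpy as np
import torch


def set_seed(seed: int) -> None:
    """Seed the common sources of randomness for one training run."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)


SEEDS = [0, 1, 2, 3, 4]  # placeholder values; the actual seeds are not listed
for seed in SEEDS:
    set_seed(seed)
    # train_and_evaluate(agent="flare", seed=seed)  # hypothetical training call
```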