Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Reinforcement Learning with Latent Flow
Authors: Wenling Shang, Xiaofei Wang, Aravind Srinivas, Aravind Rajeswaran, Yang Gao, Pieter Abbeel, Misha Laskin
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that Flare recovers optimal performance in state-based RL without explicit access to the state velocity, solely with positional state information. Flare is the most sample-efficient model-free pixel-based RL algorithm on the DeepMind Control Suite when evaluated on the 500k and 1M step benchmarks across 5 challenging control tasks, and, when used with Rainbow DQN, outperforms the competitive baseline on Atari games at the 100M time step benchmark across 8 challenging games. |
| Researcher Affiliation | Collaboration | Wenling Shang DeepMind EMAIL, Xiaofei Wang UC Berkeley EMAIL, Aravind Srinivas OpenAI EMAIL, Aravind Rajeswaran Facebook AI Research, University of Washington EMAIL, Yang Gao Tsinghua University EMAIL, Pieter Abbeel UC Berkeley, Covariant EMAIL, Michael Laskin UC Berkeley EMAIL |
| Pseudocode | Yes | Pseudocode illustrates inference with Flare in Algorithm 5.2; during training, the encodings of latent features and flow are done in the same way except with augmented observations. |
| Open Source Code | Yes | Code: https://github.com/WendyShang/flare |
| Open Datasets | Yes | The DeepMind Control Suite (DMControl) [42], based on MuJoCo [43], is a commonly used benchmark for continuous control from pixels. |
| Dataset Splits | No | The paper evaluates performance at '500K and 1M environment steps' for DMControl and '100M time step benchmark' for Atari, but does not explicitly define traditional training/validation/test dataset splits with percentages or sample counts, as is common for static datasets. Evaluation is based on interactions with environments. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory, or cloud instance specifications used for running the experiments. It only generally mentions 'large computational requirements'. |
| Software Dependencies | No | The paper mentions software components and frameworks like SAC, RAD, Rainbow DQN, DMControl, MuJoCo, and DQN Zoo, but does not provide specific version numbers for any of them. For example, it mentions 'an official implementation of Rainbow [31]' but not its version. |
| Experiment Setup | Yes | We run 5 random seeds for both Flare and Rainbow DQN [19]... Furthermore, we use the same 5 seeds for both the baseline and Flare. |
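The core idea behind Flare (Flow of Latents for Reinforcement Learning), as summarized above, is to encode consecutive frames into latent vectors, take temporal differences of those latents ("latent flow"), and feed the concatenation of latents and flow to the RL agent in place of an explicit velocity signal. The sketch below illustrates that fusion step only; it is a simplified illustration, not the authors' implementation, and the function name `flare_features` is hypothetical.

```python
import numpy as np

def flare_features(latents: np.ndarray) -> np.ndarray:
    """Fuse frame latents with their temporal differences (Flare-style sketch).

    latents: shape (k, d) -- encodings of the k most recent frames.
    Returns a flat vector of length k*d + (k-1)*d: the latents themselves
    concatenated with the (k-1) consecutive latent differences ("flow").
    """
    # Latent flow: difference between each pair of consecutive frame encodings
    flow = latents[1:] - latents[:-1]
    # Concatenate raw latents with flow into a single feature vector
    return np.concatenate([latents.ravel(), flow.ravel()])

# Usage: 3 stacked frame encodings, each of dimension 4
z = np.arange(12, dtype=float).reshape(3, 4)
phi = flare_features(z)
print(phi.shape)  # (20,) = 3*4 latents + 2*4 flow
```

In the paper this fused representation replaces direct access to state velocity, which is why the table above notes that Flare recovers optimal performance from positional information alone.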