Learning Retrospective Knowledge with Reverse Reinforcement Learning

Authors: Shangtong Zhang, Vivek Veeriah, Shimon Whiteson

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate empirically the utility of Reverse GVFs in both representation learning and anomaly detection.
Researcher Affiliation | Academia | Shangtong Zhang (University of Oxford), Vivek Veeriah (University of Michigan, Ann Arbor), Shimon Whiteson (University of Oxford)
Pseudocode | No | The paper presents its algorithms as mathematical update equations (Eq. (1) for Reverse TD and Eq. (3) for Off-policy Reverse TD) but does not include any clearly labeled pseudocode or algorithm blocks (a hedged sketch of such an update appears after this table).
Open Source Code | Yes | Code available at https://github.com/ShangtongZhang/DeepRL
Open Datasets | Yes | We now consider Reacher from OpenAI Gym (Brockman et al., 2016) and use neural networks as a function approximator for q_i(s; θ). [...] We benchmark our IMPALA+Reverse GVF agent against a plain IMPALA agent, an IMPALA+Reward Prediction agent, an IMPALA+Pixel Control agent, and an IMPALA+GVF agent in ten Atari games.
Dataset Splits | No | The paper describes training and evaluation phases for its experiments but does not explicitly provide training, validation, and test dataset splits with percentages, sample counts, or citations to predefined splits.
Hardware Specification | No | The experiments were made possible by a generous equipment grant from NVIDIA. No specific GPU models, CPU models, or detailed hardware configurations are provided beyond this general statement.
Software Dependencies | No | The paper mentions several software components and algorithms, such as 'OpenAI Gym', 'TD3', and 'IMPALA', but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | For each λ, we use a constant step size α tuned from {10^-3, 5 × 10^-3, 10^-2, 5 × 10^-2}. [...] We approximate η_s^π with N = 20 quantiles for all s. [...] We use = 1 in our experiments. [...] we use σ = 1 in our experiments. (A toy sketch of the quantile approximation follows this table.)
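
The Pseudocode row above notes that the paper specifies Reverse TD only through equations. Below is a minimal, hedged sketch of what a reverse-TD(0)-style update looks like in tabular form, assuming a retrospective return of the form G̃_t = C_t + γ G̃_{t-1}, so the value of the current state bootstraps from the value of the preceding state. The exact cumulant and discount placement in Eq. (1) of the paper may differ, and the function name `reverse_td_tabular` is illustrative, not taken from the authors' code.

```python
import numpy as np

def reverse_td_tabular(trajectory, num_states, alpha=0.1, gamma=0.9):
    """One pass of a reverse-TD(0)-style update over a trajectory.

    `trajectory` is a list of (prev_state, cumulant, state) tuples. The value
    of the current `state` bootstraps from the value of `prev_state`, i.e. the
    target is cumulant + gamma * v[prev_state], mirroring the assumed
    retrospective return G_t = C_t + gamma * G_{t-1}.
    """
    v = np.zeros(num_states)
    for prev_state, cumulant, state in trajectory:
        target = cumulant + gamma * v[prev_state]  # bootstrap backwards in time
        v[state] += alpha * (target - v[state])
    return v

# Toy usage: a 3-state chain visited as 0 -> 1 -> 2 with cumulant 1 on each step.
trajectory = [(0, 1.0, 1), (1, 1.0, 2)]
print(reverse_td_tabular(trajectory, num_states=3))
```

The only difference from an ordinary TD(0) sketch is the direction of bootstrapping: the target is built from the predecessor state rather than the successor, which is what makes the learned quantity retrospective.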
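
The Experiment Setup row quotes the paper's use of N = 20 quantiles to approximate η_s^π. The following is a toy sketch of plain quantile regression (pinball loss) with 20 quantile estimates, under the assumption that this is the flavour of quantile approximation intended; the paper combines quantile estimates with reverse-TD targets (and possibly a Huber variant), which this standalone example does not reproduce.

```python
import numpy as np

def pinball_grad(theta, sample, taus):
    """Subgradient of the pinball (quantile) loss in the quantile estimates
    `theta`, for one scalar `sample` and quantile levels `taus`."""
    below = (sample < theta).astype(float)
    return below - taus  # derivative of (tau - 1{sample < theta}) * (sample - theta)

def fit_quantiles(samples, n_quantiles=20, lr=0.05, epochs=200):
    """Fit N quantile estimates to a batch of scalar samples by stochastic
    gradient descent on the pinball loss (N = 20 as in the quoted setup)."""
    taus = (np.arange(n_quantiles) + 0.5) / n_quantiles  # quantile midpoints
    theta = np.zeros(n_quantiles)
    for _ in range(epochs):
        for x in samples:
            theta -= lr * pinball_grad(theta, x, taus)
    return theta

# Toy usage: recover quantiles of a standard normal distribution from 500 samples.
rng = np.random.default_rng(0)
print(fit_quantiles(rng.normal(size=500)))
```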