Learning Retrospective Knowledge with Reverse Reinforcement Learning

Authors: Shangtong Zhang, Vivek Veeriah, Shimon Whiteson

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate empirically the utility of Reverse GVFs in both representation learning and anomaly detection.
Researcher Affiliation | Academia | Shangtong Zhang (University of Oxford), Vivek Veeriah (University of Michigan, Ann Arbor), Shimon Whiteson (University of Oxford)
Pseudocode | No | The paper presents its algorithms as mathematical update equations (Eq. (1) for Reverse TD and Eq. (3) for Off-policy Reverse TD) but does not include any clearly labeled pseudocode or algorithm blocks (a hedged sketch of such an update appears after this table).
Open Source Code | Yes | Code available at https://github.com/ShangtongZhang/DeepRL
Open Datasets | Yes | We now consider Reacher from OpenAI Gym (Brockman et al., 2016) and use neural networks as a function approximator for q_i(s; θ). [...] We benchmark our IMPALA+Reverse GVF agent against a plain IMPALA agent, an IMPALA+Reward Prediction agent, an IMPALA+Pixel Control agent, and an IMPALA+GVF agent in ten Atari games.
Dataset Splits | No | The paper describes training and evaluation phases for its experiments but does not explicitly provide training, validation, and test dataset splits with percentages, sample counts, or citations to predefined splits.
Hardware Specification | No | The experiments were made possible by a generous equipment grant from NVIDIA. No specific GPU models, CPU models, or detailed hardware configurations are provided beyond this general statement.
Software Dependencies | No | The paper mentions several software components and algorithms, such as 'OpenAI Gym', 'TD3', and 'IMPALA', but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | For each λ, we use a constant step size α tuned from {10^-3, 5 × 10^-3, 10^-2, 5 × 10^-2}. [...] We approximate η_s^π with N = 20 quantiles for all s. [...] We use = 1 in our experiments. [...] we use σ = 1 in our experiments. (A toy sketch of the quantile approximation follows this table.)
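
The Pseudocode row above notes that the paper specifies Reverse TD only through equations. Below is a minimal, hedged sketch of what a reverse-TD(0)-style update looks like in tabular form, assuming a retrospective return of the form G̃_t = C_t + γ G̃_{t-1}, so the value of the current state bootstraps from the value of the preceding state. The exact cumulant and discount placement in Eq. (1) of the paper may differ, and the function name `reverse_td_tabular` is illustrative, not taken from the authors' code.

```python
import numpy as np

def reverse_td_tabular(trajectory, num_states, alpha=0.1, gamma=0.9):
    """One pass of a reverse-TD(0)-style update over a trajectory.

    `trajectory` is a list of (prev_state, cumulant, state) tuples. The value
    of the current `state` bootstraps from the value of `prev_state`, i.e. the
    target is cumulant + gamma * v[prev_state], mirroring the assumed
    retrospective return G_t = C_t + gamma * G_{t-1}.
    """
    v = np.zeros(num_states)
    for prev_state, cumulant, state in trajectory:
        target = cumulant + gamma * v[prev_state]  # bootstrap backwards in time
        v[state] += alpha * (target - v[state])
    return v

# Toy usage: a 3-state chain visited as 0 -> 1 -> 2 with cumulant 1 on each step.
trajectory = [(0, 1.0, 1), (1, 1.0, 2)]
print(reverse_td_tabular(trajectory, num_states=3))
```

The only difference from an ordinary TD(0) sketch is the direction of bootstrapping: the target is built from the predecessor state rather than the successor, which is what makes the learned quantity retrospective.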
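
The Experiment Setup row quotes the paper's use of N = 20 quantiles to approximate η_s^π. The following is a toy sketch of plain quantile regression (pinball loss) with 20 quantile estimates, under the assumption that this is the flavour of quantile approximation intended; the paper combines quantile estimates with reverse-TD targets (and possibly a Huber variant), which this standalone example does not reproduce.

```python
import numpy as np

def pinball_grad(theta, sample, taus):
    """Subgradient of the pinball (quantile) loss in the quantile estimates
    `theta`, for one scalar `sample` and quantile levels `taus`."""
    below = (sample < theta).astype(float)
    return below - taus  # derivative of (tau - 1{sample < theta}) * (sample - theta)

def fit_quantiles(samples, n_quantiles=20, lr=0.05, epochs=200):
    """Fit N quantile estimates to a batch of scalar samples by stochastic
    gradient descent on the pinball loss (N = 20 as in the quoted setup)."""
    taus = (np.arange(n_quantiles) + 0.5) / n_quantiles  # quantile midpoints
    theta = np.zeros(n_quantiles)
    for _ in range(epochs):
        for x in samples:
            theta -= lr * pinball_grad(theta, x, taus)
    return theta

# Toy usage: recover quantiles of a standard normal distribution from 500 samples.
rng = np.random.default_rng(0)
print(fit_quantiles(rng.normal(size=500)))
```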