Learning Retrospective Knowledge with Reverse Reinforcement Learning
Authors: Shangtong Zhang, Vivek Veeriah, Shimon Whiteson
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate empirically the utility of Reverse GVFs in both representation learning and anomaly detection. |
| Researcher Affiliation | Academia | Shangtong Zhang (University of Oxford), Vivek Veeriah (University of Michigan, Ann Arbor), Shimon Whiteson (University of Oxford) |
| Pseudocode | No | The paper provides mathematical equations for its algorithms (e.g., Eq (1) and Eq (3) for Reverse TD and Off-policy Reverse TD, respectively) but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code available at https://github.com/ShangtongZhang/DeepRL |
| Open Datasets | Yes | We now consider Reacher from OpenAI Gym (Brockman et al., 2016) and use neural networks as a function approximator for q_i(s; θ). [...] We benchmark our IMPALA+Reverse GVF agent against a plain IMPALA agent, an IMPALA+Reward Prediction agent, an IMPALA+Pixel Control agent, and an IMPALA+GVF agent in ten Atari games. |
| Dataset Splits | No | The paper describes training and evaluation phases for its experiments but does not explicitly provide specific training, validation, and test dataset splits with percentages, sample counts, or citations to predefined splits. |
| Hardware Specification | No | The experiments were made possible by a generous equipment grant from NVIDIA. No specific GPU models, CPU models, or detailed hardware configurations are provided beyond this general statement. |
| Software Dependencies | No | The paper mentions several software components and algorithms, such as 'OpenAI Gym', 'TD3', and 'IMPALA', but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | For each λ, we use a constant step size α tuned from {10^-3, 5 × 10^-3, 10^-2, 5 × 10^-2}. [...] We approximate η^π_s with N = 20 quantiles for all s. [...] We use = 1 in our experiments. [...] we use σ = 1 in our experiments. |
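As the Pseudocode row notes, the paper states its Reverse TD algorithm only as an equation (Eq (1)), not as an algorithm block. A minimal sketch of a reverse-TD(0)-style update is shown below; since Eq (1) is not reproduced in this summary, the update form (bootstrapping from the *previous* state's value rather than the next state's), the single-state toy chain, and all parameter names here are illustrative assumptions, not the authors' implementation.

```python
def reverse_td(num_steps=2000, alpha=0.1, gamma=0.5, cumulant=1.0):
    """Illustrative reverse-TD(0) sketch on a one-state chain.

    Assumed update (not the paper's exact Eq (1)):
        v(s_t) <- v(s_t) + alpha * [c_t + gamma * v(s_{t-1}) - v(s_t)]
    i.e. the target bootstraps backward in time, so the estimate tracks
    how much cumulant has been accumulated up to the current state.
    """
    v = 0.0  # value of the only state; it is also the "previous" state
    for _ in range(num_steps):
        target = cumulant + gamma * v  # c_t + gamma * v(s_{t-1})
        v += alpha * (target - v)
    return v

print(reverse_td())  # converges to the fixed point c / (1 - gamma) = 2.0
```

With a constant cumulant c and reverse discount γ, the fixed point solves v = c + γ v, giving v = c / (1 − γ); the sketch converges there, which is one quick way to sanity-check any reverse-TD implementation.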