Orchestrated Value Mapping for Reinforcement Learning
Authors: Mehdi Fatemi, Arash Tavakoli
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, to illustrate the potential of the design space that our theory opens up, we instantiate a particular algorithm and evaluate its performance on the Atari suite. In this section, we illustrate the simplicity and utility of instantiating new learning methods based on our theory. ... We test this method in the Atari 2600 games of the Arcade Learning Environment (ALE) (Bellemare et al., 2013) and compare its performance primarily against Log DQN and DQN (Mnih et al., 2015)... |
| Researcher Affiliation | Collaboration | Mehdi Fatemi, Microsoft Research, Montréal, Canada, mehdi.fatemi@microsoft.com; Arash Tavakoli, Max Planck Institute for Intelligent Systems, Tübingen, Germany, arash.tavakoli@tuebingen.mpg.de |
| Pseudocode | Yes | Algorithm 1: Orchestrated Value Mapping. |
| Open Source Code | Yes | The source code can be accessed at: https://github.com/microsoft/orchestrated-value-mapping. |
| Open Datasets | Yes | We test this method in the Atari 2600 games of the Arcade Learning Environment (ALE) (Bellemare et al., 2013) |
| Dataset Splits | No | No explicit statement about training, validation, or test dataset splits (e.g., percentages or counts) was found, nor references to predefined splits with specific citations detailing these splits. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory specifications, or cloud instance types) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | The paper mentions using the 'Dopamine framework (Castro et al., 2018)' but does not provide specific version numbers for Dopamine or any other software dependencies such as libraries or programming languages. |
| Experiment Setup | Yes | Notably, our LogLin mapping hyperparameters are realized using the same values as those of LogDQN; i.e. c = 0.5 and d ≈ 0.02. We test this method in the Atari 2600 games of the Arcade Learning Environment (ALE) (Bellemare et al., 2013) and compare its performance primarily against LogDQN and DQN (Mnih et al., 2015), denoted by Lin or (Lin)DQN to highlight that it corresponds to a linear mapping function with slope one. We also include two other major baselines for reference: C51 (Bellemare et al., 2017) and Rainbow (Hessel et al., 2018). Our tests are conducted on a stochastic version of Atari 2600 using sticky actions (Machado et al., 2018) and follow a unified evaluation protocol and codebase via the Dopamine framework (Castro et al., 2018). |
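
The Experiment Setup row quotes the LogLin mapping hyperparameters (c = 0.5, d ≈ 0.02). As a rough illustration of the value-mapping idea behind these numbers, the following is a minimal Python sketch. It assumes a logarithmic mapping of the form f(x) = c·log(x + d) on one reward channel and an identity (linear) mapping on another, with the overall Q-value reconstructed by summing the inverse-mapped heads. The function names and the exact form of the mappings are illustrative assumptions, not the paper's implementation; the linked repository contains the actual method.

```python
import numpy as np

# Hyperparameter values quoted in the Experiment Setup row (same as LogDQN).
C = 0.5    # scale of the assumed logarithmic mapping
D = 0.02   # additive shift keeping the log argument positive

def f_log(x):
    """Assumed logarithmic mapping f(x) = c * log(x + d), for x >= 0."""
    return C * np.log(x + D)

def f_log_inv(y):
    """Inverse of the assumed logarithmic mapping."""
    return np.exp(y / C) - D

def f_lin(x):
    """Linear mapping with slope one, as the (Lin)DQN notation suggests."""
    return x

def f_lin_inv(y):
    """Inverse of the linear mapping (identity)."""
    return y

def combined_q(mapped_heads, inverse_maps):
    """Reconstruct a Q-value by inverse-mapping each head and summing the channels."""
    return sum(f_inv(q) for q, f_inv in zip(mapped_heads, inverse_maps))

# Toy example: outputs of two mapped heads for a single (state, action) pair.
q_log_head = f_log(1.0)    # head trained in logarithmic space
q_lin_head = f_lin(-0.3)   # head trained in linear space
print(combined_q([q_log_head, q_lin_head], [f_log_inv, f_lin_inv]))
```
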