Orchestrated Value Mapping for Reinforcement Learning

Authors: Mehdi Fatemi, Arash Tavakoli

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, to illustrate the potential of the design space that our theory opens up, we instantiate a particular algorithm and evaluate its performance on the Atari suite. In this section, we illustrate the simplicity and utility of instantiating new learning methods based on our theory. ... We test this method in the Atari 2600 games of the Arcade Learning Environment (ALE) (Bellemare et al., 2013) and compare its performance primarily against Log DQN and DQN (Mnih et al., 2015)...
Researcher Affiliation | Collaboration | Mehdi Fatemi, Microsoft Research, Montréal, Canada (mehdi.fatemi@microsoft.com); Arash Tavakoli, Max Planck Institute for Intelligent Systems, Tübingen, Germany (arash.tavakoli@tuebingen.mpg.de)
Pseudocode | Yes | Algorithm 1: Orchestrated Value Mapping (see the sketch after this table).
Open Source Code | Yes | The source code can be accessed at: https://github.com/microsoft/orchestrated-value-mapping
Open Datasets | Yes | We test this method in the Atari 2600 games of the Arcade Learning Environment (ALE) (Bellemare et al., 2013)
Dataset Splits | No | No explicit statement about training, validation, or test dataset splits (e.g., percentages or counts) was found, nor any reference to predefined splits via specific citations.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory specifications, or cloud instance types) used for running the experiments are mentioned in the paper.
Software Dependencies | No | The paper mentions using the 'Dopamine framework (Castro et al., 2018)' but does not provide specific version numbers for Dopamine or any other software dependencies such as libraries or programming languages.
Experiment Setup | Yes | Notably, our Log Lin mapping hyperparameters are realized using the same values as those of Log DQN; i.e. c = 0.5 and d = 0.02. We test this method in the Atari 2600 games of the Arcade Learning Environment (ALE) (Bellemare et al., 2013) and compare its performance primarily against Log DQN and DQN (Mnih et al., 2015), denoted by Lin or (Lin)DQN to highlight that it corresponds to a linear mapping function with slope one. We also include two other major baselines for reference: C51 (Bellemare et al., 2017) and Rainbow (Hessel et al., 2018). Our tests are conducted on a stochastic version of Atari 2600 using sticky actions (Machado et al., 2018) and follow a unified evaluation protocol and codebase via the Dopamine framework (Castro et al., 2018). (The Log-Lin mapping is sketched below.)
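
The Pseudocode row refers to Algorithm 1 (Orchestrated Value Mapping). As a rough illustration of the orchestration it describes, the following minimal NumPy sketch gives each channel a share of the reward and an invertible mapping f_j, combines the per-channel heads into the overall Q-value as the sum of the inverse-mapped heads, and regresses each head toward a mapped Bellman target computed at the greedy action of the combined Q. The names (MappingChannel, orchestrated_q, orchestrated_targets) are illustrative and not taken from the released code; consult the repository linked above for the authors' actual implementation.

```python
# Minimal NumPy sketch of the orchestration in Algorithm 1 (illustrative names,
# not the released implementation).
import numpy as np


class MappingChannel:
    """One value channel: a reward share plus an invertible mapping f / f_inv."""

    def __init__(self, f, f_inv, reward_weight):
        self.f = f                          # mapping applied when forming the target
        self.f_inv = f_inv                  # inverse mapping, used to read values back
        self.reward_weight = reward_weight  # share of the full reward routed to this channel


def orchestrated_q(channels, q_heads):
    """Combined Q-values over actions: sum_j f_j^{-1}(Q_j)."""
    return sum(ch.f_inv(q) for ch, q in zip(channels, q_heads))


def orchestrated_targets(channels, q_heads_next, reward, gamma, done):
    """Per-channel regression targets for a single transition.

    The greedy action is chosen with respect to the combined (inverse-mapped)
    Q-values of the next state; each channel then maps its own Bellman target.
    """
    a_star = int(np.argmax(orchestrated_q(channels, q_heads_next)))
    targets = []
    for ch, q_next in zip(channels, q_heads_next):
        r_j = ch.reward_weight * reward
        bootstrap = 0.0 if done else gamma * ch.f_inv(q_next[a_star])
        targets.append(ch.f(r_j + bootstrap))
    return targets
```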
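The Experiment Setup row quotes the Log-Lin hyperparameters c = 0.5 and d = 0.02. A possible instantiation of the two channels under the sketch above is shown below, assuming the logarithmic map takes a LogDQN-style form f(x) = c·ln(x + d) with inverse f⁻¹(y) = exp(y/c) − d and the linear channel is the identity; the even 50/50 reward split is an assumption made for illustration, not necessarily the paper's decomposition.

```python
# Log and Lin channels with the quoted hyperparameters c = 0.5, d = 0.02.
# The form of the log map and the 50/50 reward split are assumptions of this
# sketch; it reuses MappingChannel and orchestrated_targets from above.
import numpy as np

C, D = 0.5, 0.02

log_channel = MappingChannel(
    f=lambda x: C * np.log(x + D),      # assumes x + D > 0; the reward
    f_inv=lambda y: np.exp(y / C) - D,  # decomposition is what keeps this valid
    reward_weight=0.5,
)
lin_channel = MappingChannel(
    f=lambda x: x,                      # identity mapping: a plain DQN-style target
    f_inv=lambda y: y,
    reward_weight=0.5,
)

# Toy usage: two heads over four actions for the next state, one transition.
rng = np.random.default_rng(0)
q_heads_next = [np.abs(rng.normal(size=4)), rng.normal(size=4)]
print(orchestrated_targets([log_channel, lin_channel], q_heads_next,
                           reward=1.0, gamma=0.99, done=False))
```

In the actual agent these per-channel targets would be computed for the heads of a Q-network inside the Dopamine training loop; the released repository linked above is the authoritative reference for that integration.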