Orchestrated Value Mapping for Reinforcement Learning
Authors: Mehdi Fatemi, Arash Tavakoli
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, to illustrate the potential of the design space that our theory opens up, we instantiate a particular algorithm and evaluate its performance on the Atari suite. In this section, we illustrate the simplicity and utility of instantiating new learning methods based on our theory. ... We test this method in the Atari 2600 games of the Arcade Learning Environment (ALE) (Bellemare et al., 2013) and compare its performance primarily against Log DQN and DQN (Mnih et al., 2015)... |
| Researcher Affiliation | Collaboration | Mehdi Fatemi, Microsoft Research, Montréal, Canada, mehdi.fatemi@microsoft.com; Arash Tavakoli, Max Planck Institute for Intelligent Systems, Tübingen, Germany, arash.tavakoli@tuebingen.mpg.de |
| Pseudocode | Yes | Algorithm 1: Orchestrated Value Mapping. |
| Open Source Code | Yes | The source code can be accessed at: https://github.com/microsoft/orchestrated-value-mapping. |
| Open Datasets | Yes | We test this method in the Atari 2600 games of the Arcade Learning Environment (ALE) (Bellemare et al., 2013) |
| Dataset Splits | No | No explicit statement about training, validation, or test dataset splits (e.g., percentages or counts) was found, nor references to predefined splits with specific citations detailing these splits. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory specifications, or cloud instance types) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | The paper mentions using the 'Dopamine framework (Castro et al., 2018)' but does not provide specific version numbers for Dopamine or any other software dependencies such as libraries or programming languages. |
| Experiment Setup | Yes | Notably, our LogLin mapping hyperparameters are realized using the same values as those of LogDQN; i.e. c = 0.5 and d ≈ 0.02. We test this method in the Atari 2600 games of the Arcade Learning Environment (ALE) (Bellemare et al., 2013) and compare its performance primarily against LogDQN and DQN (Mnih et al., 2015), denoted by Lin or (Lin)DQN to highlight that it corresponds to a linear mapping function with slope one. We also include two other major baselines for reference: C51 (Bellemare et al., 2017) and Rainbow (Hessel et al., 2018). Our tests are conducted on a stochastic version of Atari 2600 using sticky actions (Machado et al., 2018) and follow a unified evaluation protocol and codebase via the Dopamine framework (Castro et al., 2018). |
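
The Experiment Setup row quotes the LogLin mapping hyperparameters (c = 0.5, d ≈ 0.02). As a rough illustration of the value-mapping idea behind these numbers, the following is a minimal Python sketch. It assumes a logarithmic mapping of the form f(x) = c·log(x + d) on one reward channel and an identity (linear) mapping on another, with the overall Q-value reconstructed by summing the inverse-mapped heads. The function names and the exact form of the mappings are illustrative assumptions, not the paper's implementation; the linked repository contains the actual method.

```python
import numpy as np

# Hyperparameter values quoted in the Experiment Setup row (same as LogDQN).
C = 0.5    # scale of the assumed logarithmic mapping
D = 0.02   # additive shift keeping the log argument positive

def f_log(x):
    """Assumed logarithmic mapping f(x) = c * log(x + d), for x >= 0."""
    return C * np.log(x + D)

def f_log_inv(y):
    """Inverse of the assumed logarithmic mapping."""
    return np.exp(y / C) - D

def f_lin(x):
    """Linear mapping with slope one, as the (Lin)DQN notation suggests."""
    return x

def f_lin_inv(y):
    """Inverse of the linear mapping (identity)."""
    return y

def combined_q(mapped_heads, inverse_maps):
    """Reconstruct a Q-value by inverse-mapping each head and summing the channels."""
    return sum(f_inv(q) for q, f_inv in zip(mapped_heads, inverse_maps))

# Toy example: outputs of two mapped heads for a single (state, action) pair.
q_log_head = f_log(1.0)    # head trained in logarithmic space
q_lin_head = f_lin(-0.3)   # head trained in linear space
print(combined_q([q_log_head, q_lin_head], [f_log_inv, f_lin_inv]))
```
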