Adapting to Reward Progressivity via Spectral Reinforcement Learning
Authors: Michael Dann, John Thangarajah
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To test whether this approach helps mitigate the impact of reward progressivity, we perform two sets of experiments. First, we apply Spectral DQN to two domains that we specifically designed to exhibit strong reward progressivity. While previous methods struggle on these tasks, learning almost nothing at all in the more extreme domain, Spectral DQN performs markedly better, making considerable progress on both tasks. Next, we apply our approach to a less constructed set of tasks; namely, 6 standard Atari games. |
| Researcher Affiliation | Academia | School of Computing Technologies, RMIT University, Melbourne, Australia |
| Pseudocode | Yes | For clarity, the full pseudocode for this algorithm, which we term Spectral Q-learning, is provided in the Appendix (see Algorithm 1). (A hedged illustrative sketch follows this table.) |
| Open Source Code | Yes | The source code for our experiments is available at https://github.com/mchldann/Spectral_DQN. |
| Open Datasets | Yes | Next, we apply our approach to a less constructed set of tasks; namely, 6 standard Atari games. In the Atari domain, where the DQN algorithm was first evaluated, the agent receives a reward equal to the score increase at each frame. (A toy sketch of this score-delta reward follows the table.) |
| Dataset Splits | No | Rather than conducting periodic evaluations of the agents with a small exploration constant, we simply report their training performance, since this arguably provides a more reliable measure of progress. All curves in the paper were averaged over 5 seeds. The paper reports training performance averaged over seeds but, as is typical for online RL, specifies no explicit train/validation/test splits (e.g., percentages or counts). (A minimal sketch of this seed-averaging follows the table.) |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper's Table 3 lists the optimiser as Adam, but no version numbers are given for it or for any other software libraries or frameworks used. |
| Experiment Setup | Yes | Hyperparameters specific to Spectral DQN are listed in Table 2, while hyperparameters common to all agents are listed in Table 3. |
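
The Pseudocode row above notes that the paper defers the full Spectral Q-learning procedure to its Algorithm 1, which is not reproduced in this report. The sketch below is only a rough illustration of the general idea conveyed by the paper's description: rewards are decomposed across magnitude "frequencies" and a separate value head is learned per frequency. The base `BETA`, the digit-style `decompose` function, the tabular representation, and all sizes are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Illustrative sizes and constants; NOT taken from the paper.
N_STATES, N_ACTIONS = 100, 4
BETA = 10.0      # assumed magnitude base separating the "frequencies"
N_FREQS = 4      # assumed number of reward frequencies / Q-heads
GAMMA, ALPHA = 0.99, 0.1

def decompose(r):
    """Split a non-negative reward into per-frequency components f_i so that
    r == sum_i BETA**i * f_i(r). This is a plain base-BETA digit expansion
    (exact for non-negative integer rewards); the paper's actual
    decomposition may differ."""
    comps = np.zeros(N_FREQS)
    remainder = float(r)
    for i in reversed(range(N_FREQS)):
        comps[i], remainder = divmod(remainder, BETA ** i)
    return comps  # each component stays small even when r is large

# One Q-table per frequency; the behavioural value is their weighted sum.
Q = np.zeros((N_FREQS, N_STATES, N_ACTIONS))

def aggregate_q(s):
    """Recombine the heads into a single action-value vector for state s."""
    return sum(BETA ** i * Q[i, s] for i in range(N_FREQS))

def spectral_q_update(s, a, r, s_next):
    """One tabular TD update per frequency head for transition (s, a, r, s')."""
    comps = decompose(r)
    # All heads bootstrap on the same greedy action w.r.t. the aggregate
    # value, so they remain components of one coherent policy.
    a_star = int(np.argmax(aggregate_q(s_next)))
    for i in range(N_FREQS):
        td_target = comps[i] + GAMMA * Q[i, s_next, a_star]
        Q[i, s, a] += ALPHA * (td_target - Q[i, s, a])
```

The intuition, per the Research Type row, is that in domains where rewards grow by orders of magnitude as the agent progresses, each head only ever sees bounded targets, so early small rewards are not drowned out once large ones appear.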
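The Open Datasets row quotes the standard Atari convention that the reward at each frame equals the score increase. Purely as a toy illustration (this wrapper is hypothetical, not from the paper or the ALE API):

```python
class ScoreDeltaReward:
    """Hypothetical helper: turn a running game score into per-step rewards
    equal to the score increase, as in the standard Atari setup."""

    def __init__(self):
        self.prev_score = 0

    def step_reward(self, score):
        r = score - self.prev_score
        self.prev_score = score
        return r
```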
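Finally, the reporting protocol from the Dataset Splits row (training performance averaged over 5 seeds, with no separate evaluation phase) amounts to averaging learning curves across seeds. A minimal sketch with synthetic stand-in data:

```python
import numpy as np

n_seeds, n_episodes = 5, 1000            # 5 seeds, per the paper
rng = np.random.default_rng(0)
# Stand-in for per-episode training returns, one row per seed.
episode_returns = rng.normal(size=(n_seeds, n_episodes)).cumsum(axis=1)

mean_curve = episode_returns.mean(axis=0)  # the curve that would be plotted
std_curve = episode_returns.std(axis=0)    # seed-to-seed spread
```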