Adapting to Reward Progressivity via Spectral Reinforcement Learning
Authors: Michael Dann, John Thangarajah
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To test whether this approach helps mitigate the impact of reward progressivity, we perform two sets of experiments. First, we apply Spectral DQN to two domains that we specifically designed to exhibit strong reward progressivity. While previous methods struggle on these tasks, learning almost nothing at all in the more extreme domain, Spectral DQN performs markedly better, making considerable progress on both tasks. Next, we apply our approach to a less constructed set of tasks; namely, 6 standard Atari games. |
| Researcher Affiliation | Academia | School of Computing Technologies, RMIT University, Melbourne, Australia |
| Pseudocode | Yes | For clarity, the full pseudocode for this algorithm, which we term Spectral Q-learning, is provided in the Appendix (see Algorithm 1). (A hedged illustrative sketch follows this table.) |
| Open Source Code | Yes | The source code for our experiments is available at https://github.com/mchldann/Spectral_DQN. |
| Open Datasets | Yes | Next, we apply our approach to a less constructed set of tasks; namely, 6 standard Atari games. In the Atari domain, where the DQN algorithm was first evaluated, the agent receives a reward equal to the score increase at each frame. (A toy sketch of this score-delta reward follows the table.) |
| Dataset Splits | No | Rather than conducting periodic evaluations of the agents with a small exploration constant, we simply report their training performance, since this arguably provides a more reliable measure of progress. All curves in the paper were averaged over 5 seeds. The paper reports training performance averaged over seeds but, as is typical for online RL, specifies no explicit train/validation/test splits (e.g., percentages or counts). (A minimal sketch of this seed-averaging follows the table.) |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper's Table 3 lists the optimiser as Adam, but no version numbers are given for it or for any other software libraries or frameworks used. |
| Experiment Setup | Yes | Hyperparameters specific to Spectral DQN are listed in Table 2, while hyperparameters common to all agents are listed in Table 3. |
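
The Pseudocode row above notes that the paper defers the full Spectral Q-learning procedure to its Algorithm 1, which is not reproduced in this report. The sketch below is only a rough illustration of the general idea conveyed by the paper's description: rewards are decomposed across magnitude "frequencies" and a separate value head is learned per frequency. The base `BETA`, the digit-style `decompose` function, the tabular representation, and all sizes are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Illustrative sizes and constants; NOT taken from the paper.
N_STATES, N_ACTIONS = 100, 4
BETA = 10.0      # assumed magnitude base separating the "frequencies"
N_FREQS = 4      # assumed number of reward frequencies / Q-heads
GAMMA, ALPHA = 0.99, 0.1

def decompose(r):
    """Split a non-negative reward into per-frequency components f_i so that
    r == sum_i BETA**i * f_i(r). This is a plain base-BETA digit expansion
    (exact for non-negative integer rewards); the paper's actual
    decomposition may differ."""
    comps = np.zeros(N_FREQS)
    remainder = float(r)
    for i in reversed(range(N_FREQS)):
        comps[i], remainder = divmod(remainder, BETA ** i)
    return comps  # each component stays small even when r is large

# One Q-table per frequency; the behavioural value is their weighted sum.
Q = np.zeros((N_FREQS, N_STATES, N_ACTIONS))

def aggregate_q(s):
    """Recombine the heads into a single action-value vector for state s."""
    return sum(BETA ** i * Q[i, s] for i in range(N_FREQS))

def spectral_q_update(s, a, r, s_next):
    """One tabular TD update per frequency head for transition (s, a, r, s')."""
    comps = decompose(r)
    # All heads bootstrap on the same greedy action w.r.t. the aggregate
    # value, so they remain components of one coherent policy.
    a_star = int(np.argmax(aggregate_q(s_next)))
    for i in range(N_FREQS):
        td_target = comps[i] + GAMMA * Q[i, s_next, a_star]
        Q[i, s, a] += ALPHA * (td_target - Q[i, s, a])
```

The intuition, per the Research Type row, is that in domains where rewards grow by orders of magnitude as the agent progresses, each head only ever sees bounded targets, so early small rewards are not drowned out once large ones appear.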
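The Open Datasets row quotes the standard Atari convention that the reward at each frame equals the score increase. Purely as a toy illustration (this wrapper is hypothetical, not from the paper or the ALE API):

```python
class ScoreDeltaReward:
    """Hypothetical helper: turn a running game score into per-step rewards
    equal to the score increase, as in the standard Atari setup."""

    def __init__(self):
        self.prev_score = 0

    def step_reward(self, score):
        r = score - self.prev_score
        self.prev_score = score
        return r
```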
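Finally, the reporting protocol from the Dataset Splits row (training performance averaged over 5 seeds, with no separate evaluation phase) amounts to averaging learning curves across seeds. A minimal sketch with synthetic stand-in data:

```python
import numpy as np

n_seeds, n_episodes = 5, 1000            # 5 seeds, per the paper
rng = np.random.default_rng(0)
# Stand-in for per-episode training returns, one row per seed.
episode_returns = rng.normal(size=(n_seeds, n_episodes)).cumsum(axis=1)

mean_curve = episode_returns.mean(axis=0)  # the curve that would be plotted
std_curve = episode_returns.std(axis=0)    # seed-to-seed spread
```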