Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
The Multiquadric Kernel for Moment-Matching Distributional Reinforcement Learning
Authors: Ludvig Killingberg, Helge Langseth
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our contribution is mainly of a theoretical nature, presenting the first formally sound kernel for moment-matching distributional reinforcement learning with good practical performance. We also provide insights into why the RBF kernel has been shown to provide good practical results despite its theoretical problems. Finally, we evaluate the performance of our kernel on a number of standard benchmarks, obtaining results comparable to the state-of-the-art. ... To evaluate the performance of the multiquadric kernel, we conducted experiments on 8 Atari games from the Arcade Learning Environment (ALE). |
| Researcher Affiliation | Academia | Ludvig Killingberg EMAIL Department of Computer Science Norwegian University of Science and Technology Helge Langseth EMAIL Department of Computer Science Norwegian University of Science and Technology CIFAR Fellow |
| Pseudocode | Yes | Algorithm 1 — Procedure for evaluating the ability of MMD with kernel k to approximate a distribution. Require: k, P, {x}_{1:N}, T. AD-statistics ← Array[1:T]; for t = 1:T do: y_i ∼ P for i = 1, …, N; ∇{x}_{1:N} ← ∇_x MMD_b({x}_{1:N}, {y}_{1:N}; k); {x}_{1:N} ← Optimiser({x}_{1:N}, ∇{x}_{1:N}); AD-statistics[t] ← Anderson-Darling({x}_{1:N}, P); end for; return AD-statistics |
| Open Source Code | No | Raw result data for MQ is available at https://github.com/ludvigk/MQ-MMDRL. The numbers for RBF and QRDQN are taken from Nguyen-Tang et al. (2021), and are available along with the implementation of MMDQN at https://github.com/thanhnguyentang/mmdrl. Explanation: The paper states that "Raw result data for MQ is available" at a GitHub link, but does not explicitly state that the source code for the methodology described in this paper is provided. The second link refers to the implementation of a baseline method, not the authors' own code for their contribution. |
| Open Datasets | Yes | To evaluate the performance of the multiquadric kernel, we conducted experiments on 8 Atari games from the Arcade Learning Environment (ALE). |
| Dataset Splits | No | To evaluate the performance of the multiquadric kernel, we conducted experiments on 8 Atari games from the Arcade Learning Environment (ALE). ... Training curves for QR-DQN and MMDQN with the MQ and RBF on 8 Atari 2600 games. Curves for MMDQN are averaged over 3 seeds and smoothed over a sliding window of 5 iterations. QR-DQN is averaged over 2 seeds. Explanation: The paper mentions using "8 Atari games from the Arcade Learning Environment (ALE)" and averaging results over multiple seeds, but does not specify any explicit training, validation, or test dataset splits for these environments. The notion of dataset splits in reinforcement learning on game environments typically differs from that in supervised learning, and no such explicit splits were detailed for reproduction. |
| Hardware Specification | No | No specific hardware details (like GPU models, CPU types, or memory) are provided in the paper for running the experiments. |
| Software Dependencies | No | No specific software dependencies, libraries, or frameworks with their version numbers are mentioned in the paper. |
| Experiment Setup | Yes | A learning rate of 0.1 was used for MQ and a learning rate of 0.01 was used for RBF. ... To ensure that we do not involuntarily skew the experiments in favor of the multiquadric kernel, we used the original implementation of MMDQN by Nguyen-Tang et al. (2021), and all except the kernel parameters remained the same. |
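The Algorithm 1 excerpt quoted above (gradient-descending a set of particles to minimise the MMD against fresh samples from a target distribution) can be sketched in a few lines. This is a minimal 1-D illustration, not the authors' implementation: it assumes one common form of the multiquadric kernel, k(x, y) = −√(c² + ‖x − y‖²) (negated so that the squared MMD is a valid divergence); the paper's exact kernel parameterisation and `Optimiser` are not reproduced, plain SGD stands in, and the per-iteration Anderson-Darling tracking is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def mq_kernel(x, y, c=1.0):
    """Negated multiquadric-style kernel between 1-D sample sets (a sketch;
    the paper's exact form and parameters may differ)."""
    return -np.sqrt(c**2 + (x[:, None] - y[None, :])**2)

def mmd2(x, y, c=1.0):
    """Biased squared-MMD estimator MMD_b^2 between sample sets x and y."""
    return (mq_kernel(x, x, c).mean()
            - 2.0 * mq_kernel(x, y, c).mean()
            + mq_kernel(y, y, c).mean())

def mmd2_grad(x, y, c=1.0):
    """Analytic gradient of the biased MMD^2 w.r.t. the particles x."""
    dxx = x[:, None] - x[None, :]
    dxy = x[:, None] - y[None, :]
    # d/du of -sqrt(c^2 + (u - v)^2) = -(u - v) / sqrt(c^2 + (u - v)^2)
    gxx = -dxx / np.sqrt(c**2 + dxx**2)
    gxy = -dxy / np.sqrt(c**2 + dxy**2)
    return (2.0 / len(x)) * (gxx.mean(axis=1) - gxy.mean(axis=1))

def fit_particles(sample_target, n=64, steps=300, lr=25.0):
    """Algorithm 1 sketch: descend particles x toward target P.
    sample_target(n) draws n fresh samples y_i ~ P each iteration."""
    x = rng.normal(0.0, 1.0, size=n)      # particle initialisation (assumed)
    for _ in range(steps):
        y = sample_target(n)              # y_i ~ P
        x = x - lr * mmd2_grad(x, y)      # plain SGD stands in for 'Optimiser'
    return x
```

Given a target such as `lambda n: rng.normal(3.0, 1.0, size=n)`, the particles drift from their N(0, 1) initialisation toward the target; in the paper's procedure, an Anderson-Darling statistic against P would be recorded at each iteration to quantify how well the kernel lets MMD approximate the distribution.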