General Munchausen Reinforcement Learning with Tsallis Kullback-Leibler Divergence
Authors: Lingwei Zhu, Zheng Chen, Matthew Schlegel, Martha White
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that this generalized MVI(q) obtains significant improvements over the standard MVI(q = 1) across 35 Atari games. [...] We compare MVI(q = 2) with MVI (namely the standard choice where q = 1), and find that we obtain significant performance improvements in Atari. [...] In this section we investigate the utility of MVI(q) in the Atari 2600 benchmark [Bellemare et al., 2013]. [...] Figure 4: Learning curves of MVI(q) and M-VI on the selected Atari games, averaged over 3 independent runs, with ribbon denoting the standard error. |
| Researcher Affiliation | Academia | Lingwei Zhu, University of Alberta, lingwei4@ualberta.ca; Zheng Chen, Osaka University, chenz@sanken.osaka-u.ac.jp; Matthew Schlegel, University of Alberta, mkschleg@ualberta.ca; Martha White, University of Alberta, CIFAR Canada AI Chair, Amii, whitem@ualberta.ca |
| Pseudocode | Yes | Algorithm 1: MVI(q) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code or a direct link to a code repository for the methodology described. |
| Open Datasets | Yes | In this section we investigate the utility of MVI(q) in the Atari 2600 benchmark [Bellemare et al., 2013]. |
| Dataset Splits | No | The paper mentions evaluating on Atari games and performing grid searches for hyperparameters but does not provide specific training/validation/test dataset splits (e.g., percentages, sample counts, or references to predefined splits) needed for data partitioning. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for conducting the experiments. |
| Software Dependencies | No | The paper mentions using the 'optimized Stable-Baselines3 architecture [Raffin et al., 2021]' but does not provide specific version numbers for Stable-Baselines3 or any other software dependencies. |
| Experiment Setup | Yes | We perform grid searches for the algorithmic hyperparameters on two environments Asterix and Seaquest: the latter environment is regarded as a hard exploration environment. MVI(q) α: {0.01, 0.1, 0.5, 0.9, 0.99}; τ: {0.01, 0.1, 1.0, 10, 100}. Tsallis-VI τ: {0.01, 0.1, 1.0, 10, 100}. [...] Table 1: Parameters used for Gym. [...] Table 2: Parameters used for Atari games. |
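
The grid quoted in the Experiment Setup row can be enumerated as a short sketch. The values for α and τ and the two search environments are taken from that row. Everything else here is an assumption made for illustration: the function name `build_grid`, the dictionary keys, and the choice to cross every α with every τ for MVI(q). The paper excerpt does not state how many seeds were used per configuration or how configurations were scored, so neither is modelled.

```python
from itertools import product

# Values quoted in the Experiment Setup row.
MVI_Q_ALPHAS = [0.01, 0.1, 0.5, 0.9, 0.99]
MVI_Q_TAUS = [0.01, 0.1, 1.0, 10, 100]
TSALLIS_VI_TAUS = [0.01, 0.1, 1.0, 10, 100]
SEARCH_ENVS = ["Asterix", "Seaquest"]


def build_grid():
    """Enumerate one config per (environment, algorithm, hyperparameters).

    Hypothetical helper: assumes a full Cartesian product of alpha and tau
    for MVI(q), and a tau-only sweep for Tsallis-VI.
    """
    configs = []
    for env in SEARCH_ENVS:
        for alpha, tau in product(MVI_Q_ALPHAS, MVI_Q_TAUS):
            configs.append(
                {"env": env, "algo": "MVI(q)", "alpha": alpha, "tau": tau}
            )
        for tau in TSALLIS_VI_TAUS:
            configs.append({"env": env, "algo": "Tsallis-VI", "tau": tau})
    return configs


grid = build_grid()
print(len(grid))  # 2 environments * (25 + 5) configurations = 60
```

Under these assumptions the search covers 60 configurations, 50 for MVI(q) and 10 for Tsallis-VI, before any repetition over random seeds.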