Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning
Authors: Nino Vieillard, Tadashi Kozuno, Bruno Scherrer, Olivier Pietquin, Remi Munos, Matthieu Geist
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our analysis requires some assumptions, notably that the regularized greedy step is done without approximation. If this is reasonable with discrete actions and a linear parameterization, it does not hold when neural networks are considered. Given their prevalence today, we complement our thorough analysis with an extensive empirical study. |
| Researcher Affiliation | Collaboration | Nino Vieillard (Google Research, Brain Team; Université de Lorraine, CNRS, Inria); Tadashi Kozuno (Okinawa Institute of Science and Technology); Bruno Scherrer (Université de Lorraine, CNRS, Inria); Olivier Pietquin (Google Research, Brain Team); Remi Munos (DeepMind); Matthieu Geist (Google Research, Brain Team) |
| Pseudocode | No | The paper describes schemes (1) and (2) using mathematical notation and text, but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block. (An illustrative sketch of the KL-regularized greedy step is given below the table.) |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | We consider two environments here (more are provided in Appx. E). The light Cartpole from Gym [14] allows for a large sweep over the parameters, and to average each result over 10 seeds. We also consider the Asterix Atari game [10], with sticky actions, to assess the effect of regularization on a large-scale problem. (See the environment-instantiation sketch below the table.) |
| Dataset Splits | No | The paper mentions that for DQN, "we fixed the meta-parameters to the best values for DQN" and for experiments, "The sweep over parameters is smaller, and each result is averaged over 3 seeds." However, it does not provide specific details on how training, validation, and test splits were performed (e.g., percentages, sample counts, or citations to standard split methodologies). |
| Hardware Specification | No | The paper mentions running an "extensive empirical study" in a "deep RL setting" involving neural networks, but it does not provide any specific details about the hardware (e.g., GPU/CPU models, memory specifications) used for these experiments. |
| Software Dependencies | No | The paper refers to using DQN and its variants, but it does not list any specific software dependencies with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, Python 3.x) that would be needed for replication. |
| Experiment Setup | No | The paper states that they "fixed the meta-parameters to the best values for DQN" and conducted a "large sweep over the parameters" for lambda and eta. However, it does not provide the concrete numerical values of hyperparameters (e.g., learning rate, batch size, optimizer settings) that define the experimental setup, nor does it refer to an appendix in the main text that lists these details. |
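Since the paper states its schemes only as equations, the following is a minimal tabular sketch of what the KL-regularized greedy step and one value-iteration step could look like. It assumes the standard pure-KL update pi_{k+1}(a|s) ∝ pi_k(a|s) exp(q_k(s,a)/lambda), omits the entropy term and the regularization bonus in the evaluation step, and uses placeholder arrays `r` and `P`; it is not the authors' code.

```python
import numpy as np

def kl_greedy(q, pi_prev, lam):
    """KL-regularized greedy step (entropy term set to zero):
    pi_new(a|s) proportional to pi_prev(a|s) * exp(q(s, a) / lam).
    Computed in log-space for numerical stability."""
    logits = np.log(pi_prev + 1e-12) + q / lam
    logits -= logits.max(axis=1, keepdims=True)   # stabilize before exponentiating
    pi = np.exp(logits)
    return pi / pi.sum(axis=1, keepdims=True)

def kl_vi_iteration(q, pi_prev, r, P, gamma, lam):
    """One iteration: regularized greedy step, then a single (unregularized)
    Bellman backup q_{k+1}(s,a) = r(s,a) + gamma * E_{s'}[<pi_{k+1}(.|s'), q_k(s', .)>]."""
    pi = kl_greedy(q, pi_prev, lam)
    v = (pi * q).sum(axis=1)            # v(s) = <pi_{k+1}(.|s), q_k(s, .)>
    q_next = r + gamma * P @ v          # P has shape (S, A, S'), v has shape (S,)
    return q_next, pi

# Toy usage on a random tabular MDP (shapes and values are illustrative only).
S, A = 5, 3
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))   # transition kernel, shape (S, A, S)
r = rng.uniform(size=(S, A))
q, pi = np.zeros((S, A)), np.full((S, A), 1.0 / A)
for _ in range(100):
    q, pi = kl_vi_iteration(q, pi, r, P, gamma=0.9, lam=1.0)
```

Unrolling the greedy step shows the point behind the paper's title: the policy at iteration k is a softmax of the sum of all past q-functions, so KL regularization implicitly averages the successive estimates.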
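On the environment side, the two testbeds quoted in the Open Datasets row are standard Gym/ALE environments. The sketch below shows one plausible way to instantiate them; the exact environment ids, package versions, and preprocessing wrappers used by the authors are not stated in the paper, so the ids here are assumptions.

```python
import gym

# Light classic-control task used for the large parameter sweep
# (the paper only says "Cartpole from Gym"; the v1 id is an assumption).
cartpole = gym.make("CartPole-v1")

# Atari game with sticky actions. With the ALE namespace (ale-py), the v5
# environments apply sticky actions (repeat_action_probability = 0.25) by
# default; the precise id and preprocessing used by the authors are not given.
asterix = gym.make("ALE/Asterix-v5")
```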