Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning

Authors: Nino Vieillard, Tadashi Kozuno, Bruno Scherrer, Olivier Pietquin, Remi Munos, Matthieu Geist

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our analysis requires some assumptions, notably that the regularized greedy step is done without approximation. While this is reasonable with discrete actions and a linear parameterization, it does not hold when neural networks are considered. Given their prevalence today, we complement our thorough analysis with an extensive empirical study.
Researcher Affiliation | Collaboration | Nino Vieillard (Google Research, Brain Team; Université de Lorraine, CNRS, Inria); Tadashi Kozuno (Okinawa Institute of Science and Technology); Bruno Scherrer (Université de Lorraine, CNRS, Inria); Olivier Pietquin (Google Research, Brain Team); Remi Munos (DeepMind); Matthieu Geist (Google Research, Brain Team)
Pseudocode | No | The paper describes schemes (1) and (2) using mathematical notation and text, but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block (a minimal sketch of the regularized greedy step is given after this table).
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository for the methodology described.
Open Datasets | Yes | We consider two environments here (more are provided in Appx. E). The light Cartpole from Gym [14] allows for a large sweep over the parameters and for averaging each result over 10 seeds. We also consider the Asterix Atari game [10], with sticky actions, to assess the effect of regularization on a large-scale problem. (An environment-setup sketch follows the table.)
Dataset Splits | No | The paper states that "we fixed the meta-parameters to the best values for DQN" and that "the sweep over parameters is smaller, and each result is averaged over 3 seeds." However, it does not provide specific details on how training, validation, and test splits were performed (e.g., percentages, sample counts, or citations to standard split methodologies).
Hardware Specification | No | The paper mentions running an "extensive empirical study" in a "deep RL setting" involving neural networks, but it does not provide any specific details about the hardware (e.g., GPU/CPU models, memory specifications) used for these experiments.
Software Dependencies | No | The paper refers to using DQN and its variants, but it does not list any specific software dependencies with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, Python 3.x) that would be needed for replication.
Experiment Setup | No | The paper states that the authors "fixed the meta-parameters to the best values for DQN" and conducted a "large sweep over the parameters" λ and η. However, it does not provide the concrete numerical values of the hyperparameters (e.g., learning rate, batch size, optimizer settings) that define the experimental setup, nor does the main text refer to an appendix that lists these details.
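
As a reading aid for the Pseudocode row, here is a minimal tabular sketch (not the authors' code) of the KL-regularized greedy step underlying schemes (1) and (2). The function name, the uniform initial policy, and the use of `lam` for the KL weight are illustrative assumptions; the final assert checks the paper's central observation that successive KL-regularized greedy steps reduce to a softmax over the sum of past q-estimates, which is how zero-mean estimation errors get averaged out.

```python
import numpy as np

def kl_regularized_greedy(q, prev_policy, lam):
    """One KL-regularized greedy step over a batch of states.

    Solves, per state,  argmax_pi <q, pi> - lam * KL(pi || prev_policy),
    whose closed form is  pi_new ∝ prev_policy * exp(q / lam).
    """
    logits = np.log(prev_policy) + q / lam
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    pi = np.exp(logits)
    return pi / pi.sum(axis=1, keepdims=True)

# Starting from a uniform policy, k such steps collapse to a softmax of
# the *sum* of all past q-estimates, so estimation errors average out.
rng = np.random.default_rng(0)
n_states, n_actions, lam = 4, 3, 0.1
policy = np.full((n_states, n_actions), 1.0 / n_actions)
q_history = []
for _ in range(5):
    q = rng.normal(size=(n_states, n_actions))  # stand-in q estimates
    q_history.append(q)
    policy = kl_regularized_greedy(q, policy, lam)

# Equivalent closed form: softmax of the summed q-estimates over lam.
logits = sum(q_history) / lam
closed_form = np.exp(logits - logits.max(axis=1, keepdims=True))
closed_form /= closed_form.sum(axis=1, keepdims=True)
assert np.allclose(policy, closed_form)
```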
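
For the two environments named in the Open Datasets row, a hypothetical setup snippet (not from the paper): it assumes the gymnasium and ale-py packages, whose ALE/...-v5 variants implement the sticky-actions protocol via repeat_action_probability=0.25.

```python
import gymnasium as gym
import ale_py

# Register the ALE/... Atari environments shipped with ale-py.
gym.register_envs(ale_py)

# Light control task used in the paper for the large parameter sweep.
cartpole = gym.make("CartPole-v1")

# Asterix with sticky actions: with probability 0.25 the previous
# action is repeated instead of the newly selected one.
asterix = gym.make("ALE/Asterix-v5", repeat_action_probability=0.25)
```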