Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning
Authors: Nino Vieillard, Tadashi Kozuno, Bruno Scherrer, Olivier Pietquin, Remi Munos, Matthieu Geist
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our analysis requires some assumptions, notably that the regularized greedy step is done without approximation. If this is reasonable with discrete actions and a linear parameterization, it does not hold when neural networks are considered. Given their prevalence today, we complement our thorough analysis with an extensive empirical study. |
| Researcher Affiliation | Collaboration | Nino Vieillard (Google Research, Brain Team; Université de Lorraine, CNRS, Inria); Tadashi Kozuno (Okinawa Institute of Science and Technology); Bruno Scherrer (Université de Lorraine, CNRS, Inria); Olivier Pietquin (Google Research, Brain Team); Remi Munos (DeepMind); Matthieu Geist (Google Research, Brain Team) |
| Pseudocode | No | The paper describes schemes (1) and (2) using mathematical notation and text, but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block. (An illustrative sketch of the KL-regularized greedy step is given below the table.) |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | We consider two environments here (more are provided in Appx. E). The light Cartpole from Gym [14] allows for a large sweep over the parameters, and to average each result over 10 seeds. We also consider the Asterix Atari game [10], with sticky actions, to assess the effect of regularization on a large-scale problem. (See the environment-instantiation sketch below the table.) |
| Dataset Splits | No | The paper mentions that for DQN, "we fixed the meta-parameters to the best values for DQN" and for experiments, "The sweep over parameters is smaller, and each result is averaged over 3 seeds." However, it does not provide specific details on how training, validation, and test splits were performed (e.g., percentages, sample counts, or citations to standard split methodologies). |
| Hardware Specification | No | The paper mentions running an "extensive empirical study" in a "deep RL setting" involving neural networks, but it does not provide any specific details about the hardware (e.g., GPU/CPU models, memory specifications) used for these experiments. |
| Software Dependencies | No | The paper refers to using DQN and its variants, but it does not list any specific software dependencies with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, Python 3.x) that would be needed for replication. |
| Experiment Setup | No | The paper states that they "fixed the meta-parameters to the best values for DQN" and conducted a "large sweep over the parameters" for lambda and eta. However, it does not provide the concrete numerical values of hyperparameters (e.g., learning rate, batch size, optimizer settings) that define the experimental setup, nor does it refer to an appendix in the main text that lists these details. |
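Since the paper states its schemes only as equations, the following is a minimal tabular sketch of what the KL-regularized greedy step and one value-iteration step could look like. It assumes the standard pure-KL update pi_{k+1}(a|s) ∝ pi_k(a|s) exp(q_k(s,a)/lambda), omits the entropy term and the regularization bonus in the evaluation step, and uses placeholder arrays `r` and `P`; it is not the authors' code.

```python
import numpy as np

def kl_greedy(q, pi_prev, lam):
    """KL-regularized greedy step (entropy term set to zero):
    pi_new(a|s) proportional to pi_prev(a|s) * exp(q(s, a) / lam).
    Computed in log-space for numerical stability."""
    logits = np.log(pi_prev + 1e-12) + q / lam
    logits -= logits.max(axis=1, keepdims=True)   # stabilize before exponentiating
    pi = np.exp(logits)
    return pi / pi.sum(axis=1, keepdims=True)

def kl_vi_iteration(q, pi_prev, r, P, gamma, lam):
    """One iteration: regularized greedy step, then a single (unregularized)
    Bellman backup q_{k+1}(s,a) = r(s,a) + gamma * E_{s'}[<pi_{k+1}(.|s'), q_k(s', .)>]."""
    pi = kl_greedy(q, pi_prev, lam)
    v = (pi * q).sum(axis=1)            # v(s) = <pi_{k+1}(.|s), q_k(s, .)>
    q_next = r + gamma * P @ v          # P has shape (S, A, S'), v has shape (S,)
    return q_next, pi

# Toy usage on a random tabular MDP (shapes and values are illustrative only).
S, A = 5, 3
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))   # transition kernel, shape (S, A, S)
r = rng.uniform(size=(S, A))
q, pi = np.zeros((S, A)), np.full((S, A), 1.0 / A)
for _ in range(100):
    q, pi = kl_vi_iteration(q, pi, r, P, gamma=0.9, lam=1.0)
```

Unrolling the greedy step shows the point behind the paper's title: the policy at iteration k is a softmax of the sum of all past q-functions, so KL regularization implicitly averages the successive estimates.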
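On the environment side, the two testbeds quoted in the Open Datasets row are standard Gym/ALE environments. The sketch below shows one plausible way to instantiate them; the exact environment ids, package versions, and preprocessing wrappers used by the authors are not stated in the paper, so the ids here are assumptions.

```python
import gym

# Light classic-control task used for the large parameter sweep
# (the paper only says "Cartpole from Gym"; the v1 id is an assumption).
cartpole = gym.make("CartPole-v1")

# Atari game with sticky actions. With the ALE namespace (ale-py), the v5
# environments apply sticky actions (repeat_action_probability = 0.25) by
# default; the precise id and preprocessing used by the authors are not given.
asterix = gym.make("ALE/Asterix-v5")
```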