reproducibilityindex.ai

Understanding the Impact of Entropy on Policy Optimization

Authors: Zafarali Ahmed, Nicolas Le Roux, Mohammad Norouzi, Dale Schuurmans

ICML 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We show experimentally that the difficulty of policy optimization is strongly linked to the geometry of the objective function. ... We show experimentally that policies with higher entropy induce a smoother objective that connects solutions and enable the use of larger learning rates. ... We conduct experiments in a setting where the optimization procedure has access to the exact gradient. ... Continuous control tasks from the MuJoCo simulator (Todorov et al., 2012; Brockman et al., 2016) facilitate studying the impact of entropy because we can parameterize policies using Gaussian distributions.
Researcher Affiliation	Collaboration	1 Mila, McGill University, Montréal, Canada; 2 Work done while at Google Research; 3 Google Research; 4 University of Alberta. Correspondence to: Zafarali Ahmed <zafarali.ahmed@mail.mcgill.ca>.
Pseudocode	No	No pseudocode or algorithm blocks found.
Open Source Code	No	No explicit statement or link providing access to source code for the methodology described.
Open Datasets	Yes	We chose a 5 × 5 Gridworld with one suboptimal and one optimal reward at the corners (Figure 3). ... Continuous control tasks from the MuJoCo simulator (Todorov et al., 2012; Brockman et al., 2016) facilitate studying the impact of entropy because we can parameterize policies using Gaussian distributions.
Dataset Splits	No	No specific train/validation/test dataset splits (percentages or counts) are provided.
Hardware Specification	No	No specific hardware details (e.g., GPU/CPU models, memory) are provided.
Software Dependencies	No	The paper mentions 'MuJoCo simulator' but does not provide specific version numbers for it or any other software dependencies.
Experiment Setup	No	In Hopper and Walker, the best learning rate increases consistently with entropy: The learning rate for σ = 1 is 10 times larger than for σ = 0.1. We use a large batch size to control for the variance reduction effects of a larger σ (Zhao et al., 2011). Although learning rates appear in the legend of Figure 5, the text does not state explicit numerical values or ranges for all hyperparameters (e.g., exact batch size, initial learning rates for every experiment).
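
The quoted passages tie policy entropy to the standard deviation σ of a Gaussian policy. As a rough illustration of that link, the sketch below computes the differential entropy of an isotropic Gaussian for the two σ values mentioned above. The function name `gaussian_entropy` and the `action_dim` parameter are hypothetical and are not taken from the paper, which provides no code.

```python
import math


def gaussian_entropy(sigma: float, action_dim: int = 1) -> float:
    """Differential entropy (in nats) of an isotropic Gaussian.

    Assumption: every action dimension shares the same standard
    deviation sigma. The per-dimension entropy is
    0.5 * log(2 * pi * e) + log(sigma).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    per_dim = 0.5 * math.log(2 * math.pi * math.e) + math.log(sigma)
    return action_dim * per_dim


# Compare the two sigma values quoted in the Experiment Setup row.
for sigma in (0.1, 1.0):
    entropy = gaussian_entropy(sigma)
    print(f"sigma={sigma}: entropy per action dimension = {entropy:.4f} nats")
```

Under this assumption, moving from σ = 0.1 to σ = 1 raises the entropy by log(10) nats per action dimension, regardless of the environment.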