Energetic Natural Gradient Descent

Authors: Philip S. Thomas, Bruno Castro da Silva, Christoph Dann, Emma Brunskill

ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments provide empirical evidence that an agent optimizing its behavior via energetic natural gradient descent can execute more efficient update steps than one using the ordinary gradient or Fisher natural gradient. In particular, we show that, for a wide range of initial solutions (initial policy parameters), the energetic natural gradient consistently points towards better solutions than the ordinary gradient and Fisher natural gradient. ... Figure 4 shows, on the horizontal axis, the different initial policies from which we execute the different gradient updates; each initial policy was obtained by interpolating between a completely random policy and a policy with intermediate performance.
Researcher Affiliation | Academia | Philip S. Thomas PHILIPT@CS.CMU.EDU Bruno Castro da Silva BSILVA@INF.UFRGS.BR Christoph Dann CDANN@CS.CMU.EDU Emma Brunskill EBRUN@CS.CMU.EDU
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the described methodology.
Open Datasets | Yes | We selected a variant of the canonical mountain car domain (Thomas, 2015, Section 4.10.2).
Dataset Splits | No | The paper describes generating trajectories for experiments but does not provide specific dataset split information (e.g., percentages, sample counts for train/validation/test sets).
Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models or memory specifications used for running the experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers.
Experiment Setup | Yes | Each gradient direction was computed by fixing the agent's policy, generating 20,000 trajectories, and then computing the gradient using REINFORCE, the sample FIM, and the sample EIM. ... We then compare the performance of the resulting policy after a single update. ... We also chose to use line searches to find the optimal step size for each method.
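
To make the quoted experiment setup concrete, the following is a minimal, hypothetical Python sketch (not code from the paper) of the kind of comparison it describes: estimate a REINFORCE gradient and a sample-Fisher-information-matrix (FIM) natural gradient from a single batch of trajectories, line-search the step size for each direction, and compare the resulting policies after one update. A two-armed bandit with a softmax policy stands in for the paper's mountain car domain, the reward values and batch size are illustrative, and the paper's energetic information matrix (EIM) is not implemented; it would be substituted where the sample FIM is inverted.

import numpy as np

rng = np.random.default_rng(0)
TRUE_REWARD = np.array([0.2, 1.0])   # hypothetical mean reward of each arm

def policy_probs(theta):
    # Softmax policy over two arms, parameterized by theta (shape (2,)).
    z = theta - theta.max()
    e = np.exp(z)
    return e / e.sum()

def sample_batch(theta, n=20000):
    # Sample n one-step "trajectories" (chosen arm, noisy reward).
    p = policy_probs(theta)
    arms = rng.choice(2, size=n, p=p)
    rewards = TRUE_REWARD[arms] + 0.1 * rng.standard_normal(n)
    return arms, rewards

def grad_log_pi(theta, arm):
    # Score function: d/dtheta log pi(arm | theta) for the softmax policy.
    g = -policy_probs(theta)
    g[arm] += 1.0
    return g

def estimate_directions(theta, arms, rewards):
    # REINFORCE gradient and sample-FIM natural gradient from one batch.
    scores = np.stack([grad_log_pi(theta, a) for a in arms])       # (n, 2)
    g = (scores * rewards[:, None]).mean(axis=0)                   # REINFORCE estimate
    fim = (scores[:, :, None] * scores[:, None, :]).mean(axis=0)   # sample FIM
    # Small ridge term keeps the sample FIM invertible; the paper's
    # sample EIM would be used in place of `fim` here.
    nat_g = np.linalg.solve(fim + 1e-6 * np.eye(2), g)
    return g, nat_g

def post_update_return(theta, direction, steps=np.logspace(-3, 1, 40)):
    # Line search: pick the step size giving the best expected return
    # after a single update along `direction`.
    expected = lambda t: float(policy_probs(t) @ TRUE_REWARD)
    return max(expected(theta + s * direction) for s in steps)

theta0 = np.zeros(2)                 # initial (uniform) policy
arms, rewards = sample_batch(theta0)
g, nat_g = estimate_directions(theta0, arms, rewards)
print("ordinary gradient, best single step :", post_update_return(theta0, g))
print("Fisher natural gradient, best step  :", post_update_return(theta0, nat_g))

With only two policy parameters the sample FIM mostly rescales the ordinary gradient, so this toy illustrates the mechanics of the comparison (one batch, candidate update directions, a per-direction line search, and evaluation after a single update) rather than the performance gap between directions reported in the paper.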