Energetic Natural Gradient Descent
Authors: Philip S. Thomas, Bruno Castro da Silva, Christoph Dann, Emma Brunskill
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments provide empirical evidence that an agent optimizing its behavior via energetic natural gradient descent can execute more efficient update steps than one using the ordinary gradient or Fisher natural gradient. In particular, we show that, for a wide range of initial solutions (initial policy parameters), the energetic natural gradient consistently points towards better solutions than the ordinary gradient and Fisher natural gradient. ... Figure 4 shows, on the horizontal axis, the different initial policies from which we execute the different gradient updates; each initial policy was obtained by interpolating between a completely random policy and a policy with intermediate performance. |
| Researcher Affiliation | Academia | Philip S. Thomas PHILIPT@CS.CMU.EDU Bruno Castro da Silva BSILVA@INF.UFRGS.BR Christoph Dann CDANN@CS.CMU.EDU Emma Brunskill EBRUN@CS.CMU.EDU |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the described methodology. |
| Open Datasets | Yes | We selected a variant of the canonical mountain car domain (Thomas, 2015, Section 4.10.2). |
| Dataset Splits | No | The paper describes generating trajectories for experiments but does not provide specific dataset split information (e.g., percentages, sample counts for train/validation/test sets). |
| Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. |
| Experiment Setup | Yes | Each gradient direction was computed by fixing the agent's policy, generating 20,000 trajectories, and then computing the gradient using REINFORCE, the sample FIM, and the sample EIM. ... We then compare the performance of the resulting policy after a single update. ... We also chose to use line searches to find the optimal step size for each method. (A hedged code sketch of this update procedure follows the table.) |
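
The experiment-setup row describes a generic preconditioned policy-gradient update: estimate the REINFORCE gradient from sampled trajectories, estimate a sample information matrix, use it to precondition the gradient, and choose the step size by line search before comparing performance after a single update. The sketch below is not the authors' code; it is a minimal illustration on a hypothetical softmax bandit problem, using the sample Fisher information matrix as the preconditioner (the paper's energetic information matrix would take its place), with made-up reward means and helper names (`estimate_update`, `line_search`).

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions = 3
true_rewards = np.array([0.1, 0.5, 0.9])   # hypothetical reward means, not from the paper

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def sample_batch(theta, n=20000):
    """Sample actions and noisy rewards under the current softmax policy."""
    p = softmax(theta)
    actions = rng.choice(n_actions, size=n, p=p)
    rewards = true_rewards[actions] + 0.1 * rng.standard_normal(n)
    return actions, rewards

def grad_log_pi(theta, a):
    """Gradient of log pi(a | theta) for a softmax policy over logits theta."""
    g = -softmax(theta)
    g[a] += 1.0
    return g

def estimate_update(theta, n=20000):
    """REINFORCE gradient and a natural-gradient direction from one sampled batch."""
    actions, rewards = sample_batch(theta, n)
    baseline = rewards.mean()
    grad = np.zeros(n_actions)
    info = np.zeros((n_actions, n_actions))
    for a, r in zip(actions, rewards):
        g = grad_log_pi(theta, a)
        grad += (r - baseline) * g          # REINFORCE gradient estimate
        info += np.outer(g, g)              # sample Fisher information matrix (stand-in for the EIM)
    grad /= n
    info = info / n + 1e-6 * np.eye(n_actions)   # regularize before inverting
    natural_grad = np.linalg.solve(info, grad)   # preconditioned update direction
    return grad, natural_grad

def line_search(theta, direction, step_sizes=np.logspace(-3, 1, 20)):
    """Pick the step size giving the best performance after a single update
    (here evaluated exactly on the toy problem; in practice it would be estimated)."""
    def perf(t):
        return float(softmax(t) @ true_rewards)
    best = max(step_sizes, key=lambda s: perf(theta + s * direction))
    return theta + best * direction

theta0 = np.zeros(n_actions)
_, nat_dir = estimate_update(theta0)
theta1 = line_search(theta0, nat_dir)
print("performance before:", softmax(theta0) @ true_rewards)
print("performance after one preconditioned step:", softmax(theta1) @ true_rewards)
```

Under these assumptions, the same scaffold compares methods by swapping the preconditioner: the identity matrix recovers the ordinary gradient, the sample FIM gives the Fisher natural gradient, and the sample EIM would give the energetic natural gradient.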