A Unified Bellman Optimality Principle Combining Reward Maximization and Empowerment
Authors: Felix Leibfried, Sergio Pascual-Díaz, Jordi Grau-Moya
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We then apply this idea to develop off-policy actor-critic RL algorithms which we validate in high-dimensional continuous robotics domains (MuJoCo). Our methods demonstrate improved initial and competitive final performance compared to model-free state-of-the-art techniques. |
| Researcher Affiliation | Industry | Felix Leibfried, Sergio Pascual-Díaz, Jordi Grau-Moya PROWLER.io Cambridge, UK {felix,sergio.diaz,jordi}@prowler.io |
| Pseudocode | Yes | see Appendix B for pseudocode. |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology. |
| Open Datasets | Yes | We validate EAC and ACIE in the robotics simulator MuJoCo [66, 8] with deep neural nets under the same setup for each experiment following [67, 25, 50, 24, 56, 36, 68, 57, 1, 9, 15, 21] see Appendix C.2 for details. While EAC is a standalone algorithm, ACIE can be combined with any RL algorithm (we use the model-free state-of-the-art SAC [21]). We compare against DDPG [36] and PPO [57] from RLlib [35] as well as SAC on the MuJoCo v2-environments (ten seeds per run [47]). |
| Dataset Splits | No | The paper uses 'validation' only in the reinforcement learning sense (e.g., validating value function learning) and does not report dataset splits (percentages or counts) for experimental evaluation. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'MuJoCo [66, 8]' and 'RLlib [35]' but does not specify version numbers for these or other software dependencies. |
| Experiment Setup | No | The paper states 'under the same setup for each experiment following [67, 25, 50, 24, 56, 36, 68, 57, 1, 9, 15, 21] see Appendix C.2 for details.' While details are referenced in an appendix, the main text itself does not contain specific experimental setup details like hyperparameter values or training configurations. |