A Unified Bellman Optimality Principle Combining Reward Maximization and Empowerment

Authors: Felix Leibfried, Sergio Pascual-Díaz, Jordi Grau-Moya

NeurIPS 2019

Reproducibility assessment. Each entry gives the variable, the assessed result, and the LLM response supporting it:
Research Type: Experimental
    "We then apply this idea to develop off-policy actor-critic RL algorithms which we validate in high-dimensional continuous robotics domains (MuJoCo). Our methods demonstrate improved initial and competitive final performance compared to model-free state-of-the-art techniques."
Researcher Affiliation: Industry
    "Felix Leibfried, Sergio Pascual-Díaz, Jordi Grau-Moya; PROWLER.io, Cambridge, UK; {felix,sergio.diaz,jordi}@prowler.io"
Pseudocode: Yes
    "see Appendix B for pseudocode." (A hedged sketch of the unified update appears after this table.)
Open Source Code: No
    The paper does not provide an explicit statement about, or a link to, open-source code for the described methodology.
Open Datasets: Yes
    "We validate EAC and ACIE in the robotics simulator MuJoCo [66, 8] with deep neural nets under the same setup for each experiment, following [67, 25, 50, 24, 56, 36, 68, 57, 1, 9, 15, 21]; see Appendix C.2 for details. While EAC is a standalone algorithm, ACIE can be combined with any RL algorithm (we use the model-free state-of-the-art SAC [21]). We compare against DDPG [36] and PPO [57] from RLlib [35] as well as SAC on the MuJoCo v2 environments (ten seeds per run [47])." (A seed-by-seed evaluation sketch appears after this table.)
Dataset Splits: No
    The paper refers to training, validation, and testing only as reinforcement learning concepts (e.g., "validation" in the context of value function learning) and does not provide specific dataset splits (e.g., percentages or counts) for the experimental evaluation.
Hardware Specification: No
    The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies: No
    The paper mentions "MuJoCo [66, 8]" and "RLlib [35]" but does not specify version numbers for these or other software dependencies.
Experiment Setup: No
    The paper states "under the same setup for each experiment, following [67, 25, 50, 24, 56, 36, 68, 57, 1, 9, 15, 21]; see Appendix C.2 for details." While the setup is detailed in an appendix, the main text itself does not contain specific experimental details such as hyperparameter values or training configurations.
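
The pseudocode itself sits in Appendix B of the paper and is not reproduced in this report. For orientation only, the sketch below shows what a one-step critic target combining extrinsic reward with a variational empowerment bonus could look like. The functional form beta * (log q(a|s,s') - log pi(a|s)), the function name unified_bellman_target, and all default values are assumptions of this sketch, not taken from the paper.

```python
import torch

def unified_bellman_target(reward, log_q, log_pi, v_next,
                           beta=0.1, gamma=0.99, done=None):
    """Hypothetical one-step critic target: reward plus an empowerment bonus.

    The bonus beta * (log_q - log_pi) is the standard variational lower
    bound on the mutual information between actions and next states
    (empowerment); whether the paper uses exactly this form is an
    assumption of this sketch, not a claim about the paper.
    """
    mask = 1.0 if done is None else 1.0 - done.float()  # zero out terminal states
    empowerment_bonus = beta * (log_q - log_pi)
    return reward + empowerment_bonus + gamma * mask * v_next

# Shape-only smoke test with random tensors (no environment involved).
batch = 32
target = unified_bellman_target(
    reward=torch.randn(batch),
    log_q=torch.randn(batch),   # log q(a | s, s') from a learned inverse model
    log_pi=torch.randn(batch),  # log pi(a | s) from the current policy
    v_next=torch.randn(batch),  # critic estimate V(s')
)
print(target.shape)  # torch.Size([32])
```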
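
The quoted protocol ("ten seeds per run" on the MuJoCo v2 environments) is mechanical to reproduce. Below is a minimal sketch of a per-seed evaluation loop, assuming the classic gym API of the MuJoCo-v2 era (gym.make, env.seed, four-value step returns); the random-action policy stands in for a trained agent, and HalfCheetah-v2 is just one of the benchmark environments.

```python
import gym
import numpy as np

def evaluate_seed(env_id, seed, episodes=10):
    """Average undiscounted return of a placeholder policy for one seed."""
    env = gym.make(env_id)
    env.seed(seed)                 # classic (pre-0.26) gym seeding
    env.action_space.seed(seed)
    returns = []
    for _ in range(episodes):
        obs, done, ep_return = env.reset(), False, 0.0
        while not done:
            action = env.action_space.sample()        # placeholder policy
            obs, reward, done, _ = env.step(action)   # classic 4-tuple API
            ep_return += reward
        returns.append(ep_return)
    env.close()
    return float(np.mean(returns))

# Ten seeds per environment, mirroring the paper's protocol.
scores = [evaluate_seed("HalfCheetah-v2", seed) for seed in range(10)]
print(f"mean return {np.mean(scores):.1f} +/- {np.std(scores):.1f} over 10 seeds")
```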