A Unified Bellman Optimality Principle Combining Reward Maximization and Empowerment
Authors: Felix Leibfried, Sergio Pascual-Díaz, Jordi Grau-Moya
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We then apply this idea to develop off-policy actor-critic RL algorithms which we validate in high-dimensional continuous robotics domains (MuJoCo). Our methods demonstrate improved initial and competitive final performance compared to model-free state-of-the-art techniques. |
| Researcher Affiliation | Industry | Felix Leibfried, Sergio Pascual-Díaz, Jordi Grau-Moya PROWLER.io Cambridge, UK {felix,sergio.diaz,jordi}@prowler.io |
| Pseudocode | Yes | see Appendix B for pseudocode. |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology. |
| Open Datasets | Yes | We validate EAC and ACIE in the robotics simulator MuJoCo [66, 8] with deep neural nets under the same setup for each experiment following [67, 25, 50, 24, 56, 36, 68, 57, 1, 9, 15, 21] see Appendix C.2 for details. While EAC is a standalone algorithm, ACIE can be combined with any RL algorithm (we use the model-free state-of-the-art SAC [21]). We compare against DDPG [36] and PPO [57] from RLlib [35] as well as SAC on the MuJoCo v2-environments (ten seeds per run [47]). |
| Dataset Splits | No | The paper uses 'validation' only in the reinforcement learning sense (e.g., validating value function learning) and does not report dataset splits (percentages or counts) for experimental evaluation. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'MuJoCo [66, 8]' and 'RLlib [35]' but does not specify version numbers for these or other software dependencies. |
| Experiment Setup | No | The paper states 'under the same setup for each experiment following [67, 25, 50, 24, 56, 36, 68, 57, 1, 9, 15, 21] see Appendix C.2 for details.' While details are referenced in an appendix, the main text itself does not contain specific experimental setup details like hyperparameter values or training configurations. |