VIREL: A Variational Inference Framework for Reinforcement Learning
Authors: Matthew Fellows, Anuj Mahajan, Tim G. J. Rudner, Shimon Whiteson
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extensively evaluate two algorithms derived from our framework against DDPG [38] and an existing state-of-the-art actor-critic algorithm, soft actor-critic (SAC) [25], on a variety of OpenAI gym domains [9]. While our algorithms perform similarly to SAC and DDPG on simple low dimensional tasks, they outperform them substantially on complex, high dimensional tasks. |
| Researcher Affiliation | Academia | Matthew Fellows, Anuj Mahajan, Tim G. J. Rudner, Shimon Whiteson; Department of Computer Science, University of Oxford |
| Pseudocode | Yes | Pseudocode can be found in Appendix H. |
| Open Source Code | No | The paper mentions using implementations provided by other authors for baselines but does not provide or explicitly state the availability of open-source code for their own proposed methodology. |
| Open Datasets | Yes | We compare our methods to the state-of-the-art SAC and DDPG [38] algorithms on MuJoCo tasks in OpenAI gym [9] and in rllab [14]. |
| Dataset Splits | No | The paper refers to continuous control benchmarks from OpenAI Gym and rllab but does not specify custom training, validation, or testing splits or percentages, which is common in reinforcement learning where interaction with an environment substitutes for static dataset splits. |
| Hardware Specification | No | The paper mentions that 'The experiments were made possible by a generous equipment grant from NVIDIA' but does not specify any particular GPU models, CPU models, memory, or other detailed hardware specifications used for the experiments. |
| Software Dependencies | No | The paper mentions using implementations provided by the authors of other works (e.g., SAC, rllab) but does not provide specific version numbers for software dependencies such as Python, deep learning frameworks, or other libraries used in their own experimental setup. |
| Experiment Setup | Yes | All experiments use 5 random initialisations and parameter values are given in Appendix I.1. |