VIREL: A Variational Inference Framework for Reinforcement Learning

Authors: Matthew Fellows, Anuj Mahajan, Tim G. J. Rudner, Shimon Whiteson

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We extensively evaluate two algorithms derived from our framework against DDPG [38] and an existing state-of-the-art actor-critic algorithm, soft actor-critic (SAC) [25], on a variety of OpenAI gym domains [9]. While our algorithms perform similarly to SAC and DDPG on simple low dimensional tasks, they outperform them substantially on complex, high dimensional tasks.
Researcher Affiliation | Academia | Matthew Fellows, Anuj Mahajan, Tim G. J. Rudner, Shimon Whiteson; Department of Computer Science, University of Oxford
Pseudocode | Yes | Pseudocode can be found in Appendix H.
Open Source Code | No | The paper mentions using implementations provided by other authors for baselines but does not provide or explicitly state the availability of open-source code for its own proposed methodology.
Open Datasets | Yes | We compare our methods to the state-of-the-art SAC and DDPG [38] algorithms on MuJoCo tasks in OpenAI gym [9] and in rllab [14].
Dataset Splits | No | The paper refers to continuous control benchmarks from OpenAI Gym and rllab but does not specify custom training, validation, or testing splits or percentages. This is common in reinforcement learning, where interaction with an environment substitutes for static dataset splits.
Hardware Specification | No | The paper mentions that 'The experiments were made possible by a generous equipment grant from NVIDIA' but does not specify GPU models, CPU models, memory, or other detailed hardware specifications used for the experiments.
Software Dependencies | No | The paper mentions using implementations provided by the authors of other works (e.g., SAC, rllab) but does not provide specific version numbers for software dependencies such as Python, deep learning frameworks, or other libraries used in its own experimental setup.
Experiment Setup | Yes | All experiments use 5 random initialisations and parameter values are given in Appendix I.1.
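
The "5 random initialisations" protocol in the Experiment Setup entry can be illustrated with a minimal sketch. This is not the authors' code: `run_trial` is a hypothetical stand-in for one training run, the seed values and episode count are assumptions, and the toy return curve only demonstrates how per-seed results might be aggregated into a mean and standard deviation.

```python
import random
import statistics


def run_trial(seed: int, episodes: int = 20) -> list[float]:
    """Hypothetical stand-in for one training run.

    Returns a toy per-episode return curve that depends only on the seed.
    A real experiment would train an agent here instead.
    """
    rng = random.Random(seed)  # independent generator per seed
    value = 0.0
    returns = []
    for _ in range(episodes):
        value += rng.uniform(0.0, 1.0)  # toy "improvement" per episode
        returns.append(value)
    return returns


def aggregate(seeds: list[int]) -> tuple[list[float], list[float]]:
    """Run one trial per seed and compute per-episode mean and sample std."""
    curves = [run_trial(s) for s in seeds]
    mean = [statistics.mean(col) for col in zip(*curves)]
    std = [statistics.stdev(col) for col in zip(*curves)]
    return mean, std


# Five random initialisations, as stated in the summary (seed values assumed).
SEEDS = [0, 1, 2, 3, 4]
mean_curve, std_curve = aggregate(SEEDS)
```

Fixing the seed list makes each run repeatable, and reporting the spread alongside the mean shows how sensitive a result is to initialisation.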