Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow

Authors: Chen-Hao Chao, Chien Feng, Wei-Fang Sun, Cheng-Kuang Lee, Simon See, Chun-Yi Lee

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate the performance of our method, we conducted experiments on the MuJoCo benchmark suite and a number of high-dimensional robotic tasks simulated by Omniverse Isaac Gym. The evaluation results demonstrate that our method achieves superior performance compared to widely-adopted representative baselines.
Researcher Affiliation | Collaboration | 1 Elsa Lab, National Tsing Hua University, Hsinchu City, Taiwan; 2 NVIDIA AI Technology Center, NVIDIA Corporation, Santa Clara, CA, USA
Pseudocode | Yes | Algorithm 1: Pseudo Code of the Training Process of MEow
Open Source Code | Yes | The code is implemented using PyTorch [72] and is available in the following repository: https://github.com/ChienFeng-hub/meow.
Open Datasets | Yes | We conducted experiments on the MuJoCo benchmark suite [32, 33] and a number of high-dimensional robotic tasks simulated by Omniverse Isaac Gym [34].
Dataset Splits | No | The paper describes training and evaluation on reinforcement learning environments (MuJoCo, Omniverse Isaac Gym), which involve continuous agent-environment interaction rather than predefined train/validation/test dataset splits.
Hardware Specification | Yes | The computation was carried out on NVIDIA TITAN V GPUs equipped with 12 GB of memory (Multi-Goal), NVIDIA V100 GPUs equipped with 16 GB of memory (MuJoCo), and NVIDIA L40 GPUs equipped with 48 GB of memory (Omniverse Isaac Gym).
Software Dependencies | No | The paper mentions software such as PyTorch [72], CleanRL [76], and SKRL [79] used for the implementation, but does not specify their version numbers.
Experiment Setup | Yes | The shared and the environment-specific hyperparameters of MEow are summarized in Tables A1 and A3, respectively. For example, Table A1 lists: optimizer Adam [78], learning rate (β) 0.001, gradient clip value 30, discount (γ) 0.99, buffer size 10^6.
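The shared hyperparameters quoted from Table A1 can be collected into a plain config for reference. The sketch below is illustrative only: the dictionary keys and the `discounted_return` helper are my own naming, not taken from the paper's released code, and the helper merely demonstrates how the discount factor γ = 0.99 enters a return computation.

```python
# Hedged sketch of the shared hyperparameters reported in Table A1.
# Key names are hypothetical; only the values come from the paper.
MEOW_SHARED_HPARAMS = {
    "optimizer": "Adam",      # Adam [78]
    "learning_rate": 1e-3,    # beta in the paper's notation
    "grad_clip_value": 30,    # gradient clip value
    "discount": 0.99,         # gamma
    "buffer_size": 10**6,     # replay buffer capacity
}

def discounted_return(rewards, gamma=MEOW_SHARED_HPARAMS["discount"]):
    """Compute sum_t gamma^t * r_t, showing how the discount gamma is used."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total

# Three unit rewards under gamma = 0.99: 1 + 0.99 + 0.9801 = 2.9701
print(discounted_return([1.0, 1.0, 1.0]))
```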