Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow
Authors: Chen-Hao Chao, Chien Feng, Wei-Fang Sun, Cheng-Kuang Lee, Simon See, Chun-Yi Lee
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the performance of our method, we conducted experiments on the MuJoCo benchmark suite and a number of high-dimensional robotic tasks simulated by Omniverse Isaac Gym. The evaluation results demonstrate that our method achieves superior performance compared to widely adopted representative baselines. |
| Researcher Affiliation | Collaboration | 1 Elsa Lab, National Tsing Hua University, Hsinchu City, Taiwan 2 NVIDIA AI Technology Center, NVIDIA Corporation, Santa Clara, CA, USA |
| Pseudocode | Yes | Algorithm 1 Pseudo Code of the Training Process of MEow |
| Open Source Code | Yes | The code is implemented using PyTorch [72] and is available in the following repository: https://github.com/ChienFeng-hub/meow. |
| Open Datasets | Yes | We conducted experiments on the MuJoCo benchmark suite [32, 33] and a number of high-dimensional robotic tasks simulated by Omniverse Isaac Gym [34]. |
| Dataset Splits | No | The paper describes training processes and evaluation on reinforcement learning environments (MuJoCo, Omniverse Isaac Gym), which involve continuous interaction rather than predefined train/validation/test dataset splits. |
| Hardware Specification | Yes | The computation was carried out on NVIDIA TITAN V GPUs equipped with 12GB of memory (Multi-Goal), NVIDIA V100 GPUs equipped with 16GB of memory (MuJoCo), and NVIDIA L40 GPUs equipped with 48GB of memory (Omniverse Isaac Gym). |
| Software Dependencies | No | The paper mentions software like PyTorch [72], CleanRL [76], and SKRL [79] for implementation but does not specify their version numbers. |
| Experiment Setup | Yes | The shared and the environment-specific hyperparameters of MEow are summarized in Tables A1 and A3, respectively. For example, Table A1 lists: optimizer Adam [78], learning rate (β) 0.001, gradient clip value 30, discount (γ) 0.99, buffer size 10^6. |
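The shared hyperparameters quoted from Table A1 can be collected into a small config sketch. This is a hedged illustration only: the dict keys, the `MEOW_SHARED_HPARAMS` name, and the `clip_gradient` helper are hypothetical and not taken from the paper's released code; they merely restate the listed values and show what value-based gradient clipping with a threshold of 30 means in practice.

```python
# Hypothetical config sketch restating the Table A1 hyperparameters;
# names are illustrative, not from the MEow repository.
MEOW_SHARED_HPARAMS = {
    "optimizer": "Adam",          # Adam [78]
    "learning_rate": 1e-3,        # beta = 0.001
    "gradient_clip_value": 30.0,  # per-component value clipping threshold
    "discount": 0.99,             # gamma
    "buffer_size": 10**6,         # replay buffer capacity
}

def clip_gradient(g, clip=MEOW_SHARED_HPARAMS["gradient_clip_value"]):
    """Clip a scalar gradient component to the range [-clip, clip]."""
    return max(-clip, min(clip, g))
```

In a PyTorch training loop, the same effect is typically achieved with `torch.nn.utils.clip_grad_value_` applied before the optimizer step.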