Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow
Authors: Chen-Hao Chao, Chien Feng, Wei-Fang Sun, Cheng-Kuang Lee, Simon See, Chun-Yi Lee
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the performance of our method, we conducted experiments on the MuJoCo benchmark suite and a number of high-dimensional robotic tasks simulated by Omniverse Isaac Gym. The evaluation results demonstrate that our method achieves superior performance compared to widely adopted representative baselines. |
| Researcher Affiliation | Collaboration | 1 Elsa Lab, National Tsing Hua University, Hsinchu City, Taiwan 2 NVIDIA AI Technology Center, NVIDIA Corporation, Santa Clara, CA, USA |
| Pseudocode | Yes | Algorithm 1 Pseudo Code of the Training Process of MEow |
| Open Source Code | Yes | The code is implemented using PyTorch [72] and is available in the following repository: https://github.com/ChienFeng-hub/meow. |
| Open Datasets | Yes | We conducted experiments on the MuJoCo benchmark suite [32, 33] and a number of high-dimensional robotic tasks simulated by Omniverse Isaac Gym [34]. |
| Dataset Splits | No | The paper describes training processes and evaluation on reinforcement learning environments (MuJoCo, Omniverse Isaac Gym), which involve continuous interaction rather than predefined train/validation/test dataset splits. |
| Hardware Specification | Yes | The computation was carried out on NVIDIA TITAN V GPUs equipped with 12GB of memory (Multi-Goal), NVIDIA V100 GPUs equipped with 16GB of memory (MuJoCo), and NVIDIA L40 GPUs equipped with 48GB of memory (Omniverse Isaac Gym). |
| Software Dependencies | No | The paper mentions software like PyTorch [72], CleanRL [76], and SKRL [79] for implementation but does not specify their version numbers. |
| Experiment Setup | Yes | The shared and the environment-specific hyperparameters of MEow are summarized in Tables A1 and A3, respectively. For example, Table A1 lists: optimizer Adam [78], learning rate (β) 0.001, gradient clip value 30, discount (γ) 0.99, buffer size 10^6. |
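The shared hyperparameters quoted from Table A1 can be collected into a small config sketch. This is a hedged illustration only: the dict keys, the `MEOW_SHARED_HPARAMS` name, and the `clip_gradient` helper are hypothetical and not taken from the paper's released code; they merely restate the listed values and show what value-based gradient clipping with a threshold of 30 means in practice.

```python
# Hypothetical config sketch restating the Table A1 hyperparameters;
# names are illustrative, not from the MEow repository.
MEOW_SHARED_HPARAMS = {
    "optimizer": "Adam",          # Adam [78]
    "learning_rate": 1e-3,        # beta = 0.001
    "gradient_clip_value": 30.0,  # per-component value clipping threshold
    "discount": 0.99,             # gamma
    "buffer_size": 10**6,         # replay buffer capacity
}

def clip_gradient(g, clip=MEOW_SHARED_HPARAMS["gradient_clip_value"]):
    """Clip a scalar gradient component to the range [-clip, clip]."""
    return max(-clip, min(clip, g))
```

In a PyTorch training loop, the same effect is typically achieved with `torch.nn.utils.clip_grad_value_` applied before the optimizer step.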