Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow
Authors: Chen-Hao Chao, Chien Feng, Wei-Fang Sun, Cheng-Kuang Lee, Simon See, Chun-Yi Lee
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the performance of our method, we conducted experiments on the MuJoCo benchmark suite and a number of high-dimensional robotic tasks simulated by Omniverse Isaac Gym. The evaluation results demonstrate that our method achieves superior performance compared to widely-adopted representative baselines. |
| Researcher Affiliation | Collaboration | 1 Elsa Lab, National Tsing Hua University, Hsinchu City, Taiwan 2 NVIDIA AI Technology Center, NVIDIA Corporation, Santa Clara, CA, USA |
| Pseudocode | Yes | Algorithm 1 Pseudo Code of the Training Process of MEow |
| Open Source Code | Yes | The code is implemented using PyTorch [72] and is available in the following repository: https://github.com/ChienFeng-hub/meow. |
| Open Datasets | Yes | We conducted experiments on the MuJoCo benchmark suite [32, 33] and a number of high-dimensional robotic tasks simulated by Omniverse Isaac Gym [34]. |
| Dataset Splits | No | The paper describes training processes and evaluation on reinforcement learning environments (MuJoCo, Omniverse Isaac Gym) which involve continuous interaction rather than predefined dataset splits for train/validation/test. |
| Hardware Specification | Yes | The computation was carried out on NVIDIA TITAN V GPUs equipped with 12GB of memory (Multi-Goal), NVIDIA V100 GPUs equipped with 16GB of memory (MuJoCo), and NVIDIA L40 GPUs equipped with 48GB of memory (Omniverse Isaac Gym). |
| Software Dependencies | No | The paper mentions software like PyTorch [72], CleanRL [76], and SKRL [79] for implementation but does not specify their version numbers. |
| Experiment Setup | Yes | The shared and the environment-specific hyperparameters of MEow are summarized in Tables A1 and A3, respectively. For example, Table A1 lists 'optimizer Adam [78] learning rate (β) 0.001 gradient clip value 30 discount (γ) 0.99 buffer size 10^6'. |
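
For readers who want to carry the quoted Table A1 values into their own scripts, the shared hyperparameters could be recorded roughly as follows. This is only a sketch: the class name `SharedHyperparameters` and its field names are hypothetical and are not taken from the paper or its repository; only the five values come from the Experiment Setup row above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedHyperparameters:
    """Hypothetical container; field names are illustrative, not the authors'."""

    optimizer: str = "Adam"            # optimizer quoted from Table A1
    learning_rate: float = 1e-3        # learning rate (beta) = 0.001
    gradient_clip_value: float = 30.0  # gradient clip value = 30
    discount: float = 0.99             # discount (gamma) = 0.99
    buffer_size: int = 10**6           # replay buffer size = 10^6


# Instantiate with the quoted defaults; frozen=True prevents accidental edits.
shared = SharedHyperparameters()
```

The environment-specific values in Table A3 are not quoted on this page, so they are left out of the sketch.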