Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning
Authors: Jifeng Hu, Sili Huang, Zhejian Yang, Shengchao Hu, Li Shen, Hechang Chen, Lichao Sun, Yi Chang, Dacheng Tao
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To verify the effectiveness of our method, we apply our method in offline RL benchmarks D4RL [21], where we select different tasks with various difficulties. We compare our method with dozens of baselines, which contain many types of methods, such as classifier-guided and classifier-free-guided diffusion models, behavior cloning, and transformer-based models. Through extensive experiments, we demonstrate that our method surpasses state-of-the-art algorithms in most environments. |
| Researcher Affiliation | Academia | 1 Jilin University; 2 Minzu University of China; 3 Shanghai Jiao Tong University; 4 Shenzhen Campus of Sun Yat-sen University; 5 Lehigh University; 6 Nanyang Technological University |
| Pseudocode | Yes | A. Pseudocode of AEPO. Algorithm 1: Analytic Energy-guided Policy Optimization (AEPO). |
| Open Source Code | Yes | Corresponding authors: Hechang Chen, Sili Huang, and Yi Chang. code: https://github.com/JF-Hu/Analytic-Energy-guided-Policy-Optimization |
| Open Datasets | Yes | We select D4RL tasks [21] as the test bed, which contains four types of benchmarks, Gym-MuJoCo, Pointmaze, Locomotion, and Adroit, with different dataset qualities. [21] Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, and Sergey Levine. D4RL: Datasets for deep data-driven reinforcement learning. arXiv preprint arXiv:2004.07219, 2020. |
| Dataset Splits | No | The paper mentions |
| Hardware Specification | Yes | We conduct the experiments on NVIDIA GeForce RTX 3090 GPUs and NVIDIA A10 GPUs, and the CPU type is Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz. |
| Software Dependencies | No | The paper mentions DPM-solver but does not provide specific version numbers for any software libraries or frameworks used in the implementation beyond general algorithmic references. |
| Experiment Setup | Yes | Table 5: The hyperparameters of AEPO. Network backbone: MLP; action value function (Qψ) hidden layers: 3; action value function (Qψ) hidden layer neurons: 256; state value function (Vϕ) hidden layers: 3; state value function (Vϕ) hidden layer neurons: 256; intermediate energy function (EΘ) hidden layers: 3; intermediate energy function (EΘ) hidden layer neurons: 256/512/1024; inverse temperature β: 1; expectile weight τ: 0.5; guidance degree ω: 0.1; ν: 0.001. |
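
For readers who want to carry the Table 5 values into their own experiments, the following is a minimal sketch of how they could be collected in one place. The class name `AEPOHyperparameters`, its field names, and the helper `mlp_layer_sizes` are hypothetical and do not come from the paper or its repository; only the numeric values are taken from the excerpt above. The input and output dimensions in the usage lines are illustrative placeholders, and ν is recorded as listed, without any assumption about its role.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AEPOHyperparameters:
    """Hypothetical container; values copied from the Table 5 excerpt."""
    network_backbone: str = "MLP"
    q_hidden_layers: int = 3            # action value function (Q_psi)
    q_hidden_units: int = 256
    v_hidden_layers: int = 3            # state value function (V_phi)
    v_hidden_units: int = 256
    energy_hidden_layers: int = 3       # intermediate energy function (E_Theta)
    # The excerpt lists three widths (256/512/1024) without saying which
    # applies where, so all three options are kept.
    energy_hidden_units_options: Tuple[int, ...] = (256, 512, 1024)
    inverse_temperature_beta: float = 1.0
    expectile_weight_tau: float = 0.5
    guidance_degree_omega: float = 0.1
    nu: float = 0.001                   # listed in the excerpt as "ν 0.001"


def mlp_layer_sizes(input_dim: int, hidden_units: int,
                    hidden_layers: int, output_dim: int) -> List[int]:
    """Return the layer widths of a plain MLP: input, hidden..., output."""
    return [input_dim] + [hidden_units] * hidden_layers + [output_dim]


cfg = AEPOHyperparameters()
# Placeholder dimensions (17 inputs, 1 output) chosen only for illustration.
q_sizes = mlp_layer_sizes(17, cfg.q_hidden_units, cfg.q_hidden_layers, 1)
print(q_sizes)
```

Keeping the dataclass frozen means a run's settings cannot be mutated by accident, which makes it easier to log exactly which values were used.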