Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning

Authors: Jifeng Hu, Sili Huang, Zhejian Yang, Shengchao Hu, Li Shen, Hechang Chen, Lichao Sun, Yi Chang, Dacheng Tao

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To verify the effectiveness of our method, we apply our method in offline RL benchmarks D4RL [21], where we select different tasks with various difficulties. We compare our method with dozens of baselines, which contain many types of methods, such as classifier-guided and classifier-free-guided diffusion models, behavior cloning, and transformer-based models. Through extensive experiments, we demonstrate that our method surpasses state-of-the-art algorithms in most environments.
Researcher Affiliation	Academia	1 Jilin University; 2 Minzu University of China; 3 Shanghai Jiao Tong University; 4 Shenzhen Campus of Sun Yat-sen University; 5 Lehigh University; 6 Nanyang Technological University
Pseudocode	Yes	Appendix A, "Pseudocode of AEPO". Algorithm 1: Analytic Energy-guided Policy Optimization (AEPO).
Open Source Code	Yes	Corresponding authors: Hechang Chen, Sili Huang, and Yi Chang. code: https://github.com/JF-Hu/Analytic-Energy-guided-Policy-Optimization
Open Datasets	Yes	We select D4RL tasks [21] as the test bed, which contains four types of benchmarks, Gym-MuJoCo, Pointmaze, Locomotion, and Adroit, with different dataset qualities. [21] Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, and Sergey Levine. D4rl: Datasets for deep data-driven reinforcement learning. arXiv preprint arXiv:2004.07219, 2020.
Dataset Splits	No	The paper mentions
Hardware Specification	Yes	We conduct the experiments on NVIDIA GeForce RTX 3090 GPUs and NVIDIA A10 GPUs, and the CPU type is Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz.
Software Dependencies	No	The paper mentions DPM-solver but does not provide specific version numbers for any software libraries or frameworks used in the implementation beyond general algorithmic references.
Experiment Setup	Yes	Table 5: The hyperparameters of AEPO (hyperparameter: value). Network backbone: MLP; action value function (Qψ) hidden layer: 3; action value function (Qψ) hidden layer neuron: 256; state value function (Vϕ) hidden layer: 3; state value function (Vϕ) hidden layer neuron: 256; intermediate energy function (EΘ) hidden layer: 3; intermediate energy function (EΘ) hidden layer neuron: 256/512/1024; inverse temperature β: 1; expectile weight τ: 0.5; guidance degree ω: 0.1; ν: 0.001.
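
For readers who want to carry the Table 5 values into their own experiments, the following is a minimal sketch of how they could be held in a single configuration object. It is not taken from the authors' repository. The class name, field names, and the `mlp_layer_sizes` helper are hypothetical, and so are the input and output dimensions in the usage lines. Pairing ω with 0.1 and ν with 0.001 is a reading of the flattened table row, and the three energy-network widths are stored as options because the source lists them as 256/512/1024 without saying which task uses which.

```python
from dataclasses import dataclass, asdict
from typing import List, Tuple


@dataclass(frozen=True)
class AEPOHyperparameters:
    """Hypothetical container for the values quoted from Table 5."""

    network_backbone: str = "MLP"
    q_hidden_layers: int = 3          # action value function Q_psi
    q_hidden_units: int = 256
    v_hidden_layers: int = 3          # state value function V_phi
    v_hidden_units: int = 256
    energy_hidden_layers: int = 3     # intermediate energy function E_Theta
    # The source lists 256/512/1024; which width applies where is not stated.
    energy_hidden_units_options: Tuple[int, ...] = (256, 512, 1024)
    inverse_temperature_beta: float = 1.0
    expectile_weight_tau: float = 0.5
    guidance_degree_omega: float = 0.1  # assumed pairing of omega with 0.1
    nu: float = 0.001                   # symbol is not named in the source


def mlp_layer_sizes(input_dim: int, hidden_layers: int,
                    hidden_units: int, output_dim: int) -> List[int]:
    """Return the layer widths of an MLP: input, hidden layers, output."""
    return [input_dim] + [hidden_units] * hidden_layers + [output_dim]


config = AEPOHyperparameters()

# Illustrative dimensions only: 17 inputs and 1 output are assumptions.
q_sizes = mlp_layer_sizes(17, config.q_hidden_layers, config.q_hidden_units, 1)
config_dict = asdict(config)
```

Keeping the dataclass frozen prevents a run from silently changing a value mid-experiment, and `asdict(config)` gives a plain dictionary that can be logged next to the results.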