reproducibilityindex.ai

S²AC: Energy-Based Reinforcement Learning with Stein Soft Actor Critic

Authors: Safa Messaoud, Billel Mokeddem, Zhenghai Xue, Linsey Pang, Bo An, Haipeng Chen, Sanjay Chawla

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical results show that S2AC yields more optimal solutions to the MaxEnt objective than SQL and SAC in the multi-goal environment, and outperforms SAC and SQL on the MuJoCo benchmark. Our code is available at: https://github.com/SafaMessaoud/S2AC-Energy-Based-RL-with-Stein-Soft-Actor-Critic
Researcher Affiliation	Collaboration	(1) Qatar Computing Research Institute, Hamad Bin Khalifa University; (2) School of Computer Science and Engineering, Nanyang Technological University; (3) Salesforce; (4) Skywork AI; (5) Data Science, William & Mary. Emails: {smessaoud,bmokeddem,schawla}@hbku.edu.qa, zhenghai001@e.ntu.edu.sg, panglinsey@gmail.com, boan@ntu.edu.sg, hchen23@wm.edu. Footnotes: equal contribution; corresponding authors.
Pseudocode	Yes	The complete S2AC algorithm is in Algorithm 1 of Appendix A.
Open Source Code	Yes	Our code is available at: https://github.com/SafaMessaoud/S2AC-Energy-Based-RL-with-Stein-Soft-Actor-Critic
Open Datasets	Yes	We evaluate S2AC on five environments from MuJoCo (Brockman et al., 2016): Hopper-v2, Walker2d-v2, HalfCheetah-v2, Ant-v2, and Humanoid-v2.
Dataset Splits	No	The paper describes training and evaluation on reinforcement learning tasks, but it does not define training, validation, and test splits (e.g., percentages or counts) of a static dataset, as is standard in supervised learning; online RL experiments generally do not use such splits.
Hardware Specification	No	The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies	No	The paper does not provide specific version numbers for software dependencies (e.g., Python, PyTorch, CUDA).
Experiment Setup	Yes	Appendix J: Multi-Goal Environment Details and Hyperparameters and Appendix K: MuJoCo Experiment Details and Hyperparameters contain specific hyperparameter values like learning rates, batch sizes, and discount factors.
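The "MaxEnt objective" mentioned under Research Type is the standard maximum-entropy RL objective that SAC and SQL optimize: the expected discounted sum of reward plus an entropy bonus, J(π) = Σ_t γ^t E[r_t + α H(π(·|s_t))]. As an illustration only (this is not code from the S2AC repository, and the function name and default values of `alpha` and `gamma` are hypothetical), the sketch below computes a single-trajectory estimate of that return. It assumes the per-step entropy is approximated by the one-sample estimate -log π(a_t|s_t).

```python
from typing import Sequence


def soft_return(rewards: Sequence[float], log_probs: Sequence[float],
                alpha: float = 0.2, gamma: float = 0.99) -> float:
    """Discounted maximum-entropy (soft) return of one trajectory.

    Each step contributes r_t + alpha * (-log pi(a_t|s_t)), where
    -log pi(a_t|s_t) is a single-sample estimate of the entropy
    H(pi(.|s_t)). alpha and gamma defaults are illustrative assumptions.
    """
    if len(rewards) != len(log_probs):
        raise ValueError("rewards and log_probs must have the same length")
    total = 0.0
    # Accumulate backwards: G_t = (r_t - alpha * log_prob_t) + gamma * G_{t+1}
    for r, lp in zip(reversed(rewards), reversed(log_probs)):
        total = (r - alpha * lp) + gamma * total
    return total


if __name__ == "__main__":
    # Zero log-probabilities remove the entropy bonus: 1.0 + 0.5 * 1.0
    print(soft_return([1.0, 1.0], [0.0, 0.0], alpha=0.2, gamma=0.5))  # 1.5
```

A larger `alpha` weights the entropy bonus more heavily relative to the reward, which is the trade-off the MaxEnt comparison against SAC and SQL is measuring.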