S$^2$AC: Energy-Based Reinforcement Learning with Stein Soft Actor Critic

Authors: Safa Messaoud, Billel Mokeddem, Zhenghai Xue, Linsey Pang, Bo An, Haipeng Chen, Sanjay Chawla

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Empirical results show that S2AC yields more optimal solutions to the MaxEnt objective than SQL and SAC in the multi-goal environment, and outperforms SAC and SQL on the MuJoCo benchmark. Our code is available at: https://github.com/SafaMessaoud/S2AC-Energy-Based-RL-with-Stein-Soft-Actor-Critic |
| Researcher Affiliation | Collaboration | (1) Qatar Computing Research Institute, Hamad Bin Khalifa University; (2) School of Computer Science and Engineering, Nanyang Technological University; (3) Salesforce; (4) Skywork AI; (5) Data Science, William & Mary. {smessaoud,bmokeddem,schawla}@hbku.edu.qa, zhenghai001@e.ntu.edu.sg, panglinsey@gmail.com, boan@ntu.edu.sg, hchen23@wm.edu. Equal contribution. Corresponding authors. |
| Pseudocode | Yes | The complete S2AC algorithm is in Algorithm 1 of Appendix A. |
| Open Source Code | Yes | Our code is available at: https://github.com/SafaMessaoud/S2AC-Energy-Based-RL-with-Stein-Soft-Actor-Critic |
| Open Datasets | Yes | We evaluate S2AC on five environments from MuJoCo (Brockman et al., 2016): Hopper-v2, Walker2d-v2, HalfCheetah-v2, Ant-v2, and Humanoid-v2. |
| Dataset Splits | No | The paper describes training and evaluation for reinforcement learning tasks, but does not provide specific splits (e.g., percentages or counts) into distinct training, validation, and test datasets, as is common for supervised learning tasks with static datasets. |
| Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., GPU/CPU models, memory) used to run its experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., Python, PyTorch, CUDA). |
| Experiment Setup | Yes | Appendix J ("Multi-Goal Environment Details and Hyperparameters") and Appendix K ("MuJoCo Experiment Details and Hyperparameters") contain specific hyperparameter values such as learning rates, batch sizes, and discount factors. |