S$^2$AC: Energy-Based Reinforcement Learning with Stein Soft Actor Critic
Authors: Safa Messaoud, Billel Mokeddem, Zhenghai Xue, Linsey Pang, Bo An, Haipeng Chen, Sanjay Chawla
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that S2AC yields more optimal solutions to the MaxEnt objective than SQL and SAC in the multi-goal environment, and outperforms SAC and SQL on the MuJoCo benchmark. Our code is available at: https://github.com/SafaMessaoud/S2AC-Energy-Based-RL-with-Stein-Soft-Actor-Critic |
| Researcher Affiliation | Collaboration | (1) Qatar Computing Research Institute, Hamad Bin Khalifa University; (2) School of Computer Science and Engineering, Nanyang Technological University; (3) Salesforce; (4) Skywork AI; (5) Data Science, William & Mary. Emails: {smessaoud,bmokeddem,schawla}@hbku.edu.qa, zhenghai001@e.ntu.edu.sg, panglinsey@gmail.com, boan@ntu.edu.sg, hchen23@wm.edu |
| Pseudocode | Yes | The complete S2AC algorithm is in Algorithm 1 of Appendix A. |
| Open Source Code | Yes | Our code is available at: https://github.com/SafaMessaoud/S2AC-Energy-Based-RL-with-Stein-Soft-Actor-Critic |
| Open Datasets | Yes | We evaluate S2AC on five environments from MuJoCo (Brockman et al., 2016): Hopper-v2, Walker2d-v2, HalfCheetah-v2, Ant-v2, and Humanoid-v2. |
| Dataset Splits | No | The paper describes training and evaluation for reinforcement learning tasks, but does not provide specific data splits (e.g., percentages or counts) for distinct training, validation, and test datasets in the traditional sense, as is common for supervised learning tasks with static datasets. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., Python, PyTorch, CUDA). |
| Experiment Setup | Yes | Appendix J: Multi-Goal Environment Details and Hyperparameters and Appendix K: MuJoCo Experiment Details and Hyperparameters contain specific hyperparameter values like learning rates, batch sizes, and discount factors. |