RVI-SAC: Average Reward Off-Policy Deep Reinforcement Learning

Authors: Yukinari Hisaki, Isao Ono

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply our method to the Gymnasium's Mujoco tasks, a subset of locomotion tasks, and demonstrate that RVI-SAC shows competitive performance compared to existing methods. and 4. Experiment: In our benchmark experiments, we aim to verify two aspects: (1) A comparison of the performance between RVI-SAC, SAC (Haarnoja et al., 2018b) with various discount rates, and the existing off-policy average reward DRL method, ARO-DDPG (Saxena et al., 2023).
Researcher Affiliation | Academia | Yukinari Hisaki 1, Isao Ono 1; 1 Tokyo Institute of Technology, Yokohama, Kanagawa, Japan. Correspondence to: Yukinari Hisaki <hiskai.y@ic.c.titech.ac.jp>, Isao Ono <isao@c.titech.ac.jp>.
Pseudocode | Yes | Appendix B (Overall RVI-SAC algorithm and implementation) and Algorithm 1: RVI-SAC
Open Source Code | Yes | The source code for this experiment can be found on our GitHub repository at https://github.com/yhisaki/average-reward-drl.
Open Datasets | Yes | we conducted benchmark experiments using six tasks (Ant, HalfCheetah, Hopper, Walker2d, Humanoid, and Swimmer) implemented in the Gymnasium (Towers et al., 2023) and MuJoCo physical simulator (Todorov et al., 2012).
Dataset Splits | No | No specific dataset split information for training, validation, or testing was provided; the paper only makes general mentions of 'training' and 'evaluation'.
Hardware Specification | No | No specific hardware details (GPU/CPU models, memory, or cloud instance types) used for running the experiments were mentioned.
Software Dependencies | No | The paper mentions 'Gymnasium (Towers et al., 2023) and MuJoCo physical simulator (Todorov et al., 2012)' but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | Appendix E (Hyperparameter settings) and Table 1 (Hyperparameters of RVI-SAC and SAC): We summarize the hyperparameters used in RVI-SAC and SAC in Table 1. We used the same hyperparameters for ARO-DDPG as Saxena et al. (2023). [Table 1 lists: Discount Factor γ, Optimizer, Learning Rate, Batch Size |B|, Replay Buffer Size |D|, Critic Network, Actor Network, Activation Function, Target Smoothing Coefficient τ, Entropy Target H, Critic Network for Reset, Delayed f(Q) Update Parameter κ, Termination Frequency Target ϵ_reset]
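
For readers who want a concrete picture of the update referenced in the Pseudocode row, the following is a minimal sketch of an average-reward (RVI-style) soft Q target in PyTorch. It is not the paper's Algorithm 1: `policy`, `q_target_net`, `rho`, and `alpha` are hypothetical placeholders, and the paper's reset-cost critic and delayed f(Q) update are omitted.

```python
import torch

def rvi_soft_q_target(reward, next_obs, q_target_net, policy, rho, alpha):
    """Average-reward (RVI-style) soft Q target: subtract a reward-rate
    estimate rho instead of discounting with a factor gamma.

    `policy`, `q_target_net`, `rho`, and `alpha` are hypothetical
    placeholders, not objects from the paper's implementation.
    """
    with torch.no_grad():
        # Sample a next action and its log-probability from the current policy.
        next_action, next_log_prob = policy.sample(next_obs)
        # Soft value of the next state: Q(s', a') - alpha * log pi(a' | s').
        soft_next_value = q_target_net(next_obs, next_action) - alpha * next_log_prob
        # No discount factor; rho plays the role of f(Q) in RVI Q-learning.
        target = reward - rho + soft_next_value
    return target
```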
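
The Open Datasets row lists six Gymnasium/MuJoCo locomotion tasks. A minimal sketch of instantiating them with the Gymnasium API follows; the `-v4` version suffixes are an assumption, since the paper does not state which environment versions were used.

```python
import gymnasium as gym

# Six MuJoCo locomotion tasks named in the paper; the "-v4" suffixes are an
# assumption (the paper does not specify environment versions).
TASK_IDS = ["Ant-v4", "HalfCheetah-v4", "Hopper-v4",
            "Walker2d-v4", "Humanoid-v4", "Swimmer-v4"]

envs = {task_id: gym.make(task_id) for task_id in TASK_IDS}

# Reset one environment and take a random step to confirm it runs.
obs, info = envs["Ant-v4"].reset(seed=0)
action = envs["Ant-v4"].action_space.sample()
obs, reward, terminated, truncated, info = envs["Ant-v4"].step(action)
```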
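
The Experiment Setup row names the hyperparameters of the paper's Table 1 without their values. As a purely illustrative sketch of how such a configuration could be organized, the dataclass below mirrors those field names; no defaults are filled in, because the actual values must be taken from Table 1 or the linked GitHub repository.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RVISACConfig:
    """Hyperparameter names mirroring the paper's Table 1.

    All defaults are deliberately left unset: the actual values are given in
    Table 1 of the paper and in the authors' GitHub repository.
    """
    learning_rate: Optional[float] = None
    batch_size: Optional[int] = None
    replay_buffer_size: Optional[int] = None
    target_smoothing_tau: Optional[float] = None
    entropy_target: Optional[float] = None
    delayed_f_q_update_kappa: Optional[float] = None              # RVI-SAC-specific
    termination_frequency_target_eps_reset: Optional[float] = None  # RVI-SAC-specific
```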