Supported Trust Region Optimization for Offline Reinforcement Learning

Authors: Yixiu Mao, Hongchang Zhang, Chen Chen, Yi Xu, Xiangyang Ji

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results validate the theory of STR and demonstrate its state-of-the-art performance on MuJoCo locomotion domains and the much more challenging AntMaze domains.
Researcher Affiliation | Academia | 1. Department of Automation, Tsinghua University; 2. School of Artificial Intelligence, Dalian University of Technology.
Pseudocode | Yes | Algorithm 1 STR (Tabular) and Algorithm 2 STR (Practical) are provided in Section 4.3.
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets | Yes | We test the effectiveness of STR (Algorithm 2) in terms of performance, safe policy improvement, and hyperparameter robustness using the D4RL benchmark (Fu et al., 2020). (A loading sketch follows the table.)
Dataset Splits | No | The paper mentions evaluating performance on trajectories but does not specify train/validation/test dataset splits (e.g., percentages or sample counts for each split).
Hardware Specification | Yes | We test the runtime of STR on halfcheetah-medium-replay on a GeForce RTX 3090.
Software Dependencies | No | The paper mentions optimizers and algorithms but does not list specific software dependencies or version numbers.
Experiment Setup | Yes | Table 3 (Hyperparameters of policy training in STR) includes: critic learning rate 3e-4, actor learning rate 3e-4 with a cosine schedule, batch size 256, discount factor 0.99, number of iterations 1e6, target update rate τ 0.005, policy update frequency 2, number of critics 4, temperature λ ∈ {0.5, 2} for Gym-MuJoCo and {0.1} for AntMaze, and Gaussian policy variance 0.1. (A configuration sketch follows the table.)
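
For reference, below is a minimal sketch of loading one of the D4RL datasets the paper evaluates on. It assumes the d4rl Python package and Gym are installed; the "-v2" dataset suffix and the qlearning_dataset helper are standard D4RL usage, not details taken from the paper.

import gym
import d4rl  # importing d4rl registers the offline datasets as Gym environments

# halfcheetah-medium-replay is the dataset named in the runtime test above;
# the '-v2' version suffix is an assumption, not stated in the paper.
env = gym.make("halfcheetah-medium-replay-v2")
dataset = d4rl.qlearning_dataset(env)

# The returned dict holds transition arrays: observations, actions,
# next_observations, rewards, terminals.
print(dataset["observations"].shape, dataset["actions"].shape)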
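
The Table 3 hyperparameters can be collected into a single configuration block. The sketch below uses illustrative key names; they are not taken from the authors' code.

# Policy-training hyperparameters reported in Table 3 of the paper
# (key names are illustrative, not the authors' identifiers).
STR_HPARAMS = {
    "critic_lr": 3e-4,
    "actor_lr": 3e-4,              # decayed with a cosine schedule
    "batch_size": 256,
    "discount": 0.99,
    "num_iterations": int(1e6),
    "target_update_rate": 0.005,   # tau
    "policy_update_freq": 2,
    "num_critics": 4,
    # temperature lambda: 0.5 or 2 on Gym-MuJoCo, 0.1 on AntMaze
    "temperature_lambda": {"gym_mujoco": (0.5, 2.0), "antmaze": (0.1,)},
    "gaussian_policy_variance": 0.1,
}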