In-sample Actor Critic for Offline Reinforcement Learning

Authors: Hongchang Zhang, Yixiu Mao, Boyuan Wang, Shuncheng He, Yi Xu, Xiangyang Ji

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results show that IAC obtains competitive performance compared to the state-of-the-art methods on Gym-MuJoCo locomotion domains and much more challenging AntMaze domains.
Researcher Affiliation | Academia | ¹Tsinghua University, ²Dalian University of Technology; {hc-zhang19, myx21, wangby22, hesc16}@mails.tsinghua.edu.cn, yxu@dlut.edu.cn, xyji@tsinghua.edu.cn
Pseudocode | Yes | Algorithm 1: IAC
Open Source Code | No | The paper does not provide an explicit statement or link to the open-source code for the methodology described.
Open Datasets | Yes | We test IAC on the D4RL benchmark (Fu et al., 2020), including Gym-MuJoCo locomotion domains and much more challenging AntMaze domains.
Dataset Splits | No | The paper mentions using the D4RL benchmark, but it does not explicitly provide details about the training, validation, or test dataset splits (e.g., percentages, sample counts, or specific citations for splits).
Hardware Specification | Yes | We test the runtime of IAC on halfcheetah-medium-replay on a GeForce RTX 3090.
Software Dependencies | No | The paper mentions 'Optimizer Adam' but does not specify a version number for the software library or framework used (e.g., PyTorch or TensorFlow version).
Experiment Setup | Yes | Table 3: Hyperparameters of policy training in IAC. Optimizer: Adam (Kingma & Ba, 2014); critic learning rate: 3×10⁻⁴; actor learning rate: 3×10⁻⁴ with cosine schedule; batch size: 256; discount factor: 0.99; number of iterations: 10⁶; target update rate τ: 0.005; policy update frequency: 2; inverse temperature of AWR β: {0.25, 5} for Gym-MuJoCo, {10} for AntMaze; variance of Gaussian policy: 0.1; architecture: actor input-256-256-output, critic input-256-256-1.
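
Since the authors do not release code, the following is a minimal PyTorch sketch of how the Table 3 configuration could be instantiated. The helper names (mlp, soft_update, awr_actor_loss), the example observation/action dimensions, the weight clipping constant, and the exponentiated-advantage (AWR-style) actor loss are illustrative assumptions, not the paper's verified implementation.

```python
# Minimal sketch (assumptions, not the authors' released code): instantiating
# the Table 3 configuration of IAC in PyTorch.
import torch
import torch.nn as nn


def mlp(in_dim, out_dim, hidden=256):
    # "input-256-256-output" architecture from Table 3.
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )


# Example dimensions (task-dependent; HalfCheetah shown here).
obs_dim, act_dim = 17, 6
gamma, tau = 0.99, 0.005   # discount factor, target update rate
beta = 0.25                # AWR inverse temperature ({0.25, 5} Gym-MuJoCo, 10 AntMaze)
policy_std = 0.1 ** 0.5    # Gaussian policy with variance 0.1

actor = mlp(obs_dim, act_dim)             # outputs the Gaussian mean
critic = mlp(obs_dim + act_dim, 1)        # Q(s, a)
target_critic = mlp(obs_dim + act_dim, 1)
target_critic.load_state_dict(critic.state_dict())

actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=3e-4)
# Cosine schedule on the actor learning rate over the 10^6 training iterations.
actor_sched = torch.optim.lr_scheduler.CosineAnnealingLR(actor_opt, T_max=1_000_000)


def soft_update(target, source, tau):
    # Polyak averaging of target-network parameters (rate tau = 0.005).
    for tp, sp in zip(target.parameters(), source.parameters()):
        tp.data.mul_(1.0 - tau).add_(tau * sp.data)


def awr_actor_loss(obs, act, advantage, beta):
    # AWR-style actor objective: log-likelihood of dataset actions weighted by
    # exp(beta * advantage); the clipping constant is an assumed stabilizer.
    weight = torch.clamp(torch.exp(beta * advantage), max=100.0)
    dist = torch.distributions.Normal(actor(obs), policy_std)
    log_prob = dist.log_prob(act).sum(dim=-1)
    return -(weight.detach() * log_prob).mean()
```

Under the Table 3 settings, a training loop built from these pieces would draw batches of 256 transitions, update the critic every iteration and the actor every second iteration (policy update frequency 2), and call soft_update and actor_sched.step() each step.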