Offline Reinforcement Learning with Closed-Form Policy Improvement Operators
Authors: Jiachen Li, Edwin Zhang, Ming Yin, Qinxun Bai, Yu-Xiang Wang, William Yang Wang
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We instantiate both one-step and iterative offline RL algorithms with our novel policy improvement operators and empirically demonstrate their effectiveness over state-of-the-art algorithms on the standard D4RL benchmark. |
| Researcher Affiliation | Collaboration | 1) Department of Computer Science, University of California, Santa Barbara, Santa Barbara, CA 93106, USA; 2) Harvard University, USA; 3) Horizon Robotics Inc., Cupertino, CA 95014, USA. |
| Pseudocode | Yes | Algorithm 1 Offline RL with CFPI operators |
| Open Source Code | Yes | Our code is available at https://cfpi-icml23.github.io/. |
| Open Datasets | Yes | We evaluate the effectiveness of our one-step algorithm on the D4RL benchmark focusing on the Gym-MuJoCo domain... standard D4RL benchmark (Fu et al., 2020). |
| Dataset Splits | Yes | Next, we randomly split the dataset with the ratio 95/5 to create the training set Dtrain and the validation set Dval. (A minimal split sketch follows the table.) |
| Hardware Specification | No | The paper states, "Our experiments are conducted on various types of 8GPUs machines. Different machines may have different GPU types, such as NVIDIA GA100 and TU102." This general description does not provide specific, consistent model numbers or detailed configurations for the entire experimental setup. |
| Software Dependencies | Yes | We use the Adam (Kingma & Ba, 2014) optimizer for all learning algorithms... We use the PyTorch (Paszke et al., 2019) implementation of IQL from RLkit (Berkeley)... |
| Experiment Setup | Yes | Table 8 includes the hyperparameters of methods evaluated on the Gym-MuJoCo domain. MG-BC. We train the policy for 500K gradient steps. SARSA. We parameterize the value function with the IQN (Dabney et al., 2018a) architecture and train it to model the distribution Zβ... (A hedged training-setup sketch follows the table.) |
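
As a rough illustration of the 95/5 split described in the Dataset Splits row, here is a minimal sketch; the `split_dataset` helper, its arguments, and the transition-list representation are assumptions for illustration, not the authors' code.

```python
import numpy as np

def split_dataset(dataset, val_ratio=0.05, seed=0):
    """Randomly split a list of transitions into D_train / D_val (95/5 by default)."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(dataset))
    n_val = int(len(dataset) * val_ratio)
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return [dataset[i] for i in train_idx], [dataset[i] for i in val_idx]
```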
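
Similarly, a hedged sketch of the reported training setup (Adam optimizer, 500K gradient steps): the network sizes, learning rate, batch size, synthetic data, and the plain MSE behavior-cloning loss below are placeholders standing in for the paper's actual MG-BC objective and the hyperparameters listed in its Table 8.

```python
import torch

# Placeholder dimensions and synthetic data; not taken from the paper or D4RL.
obs_dim, act_dim, batch_size = 17, 6, 256
obs = torch.randn(10_000, obs_dim)   # stand-in for dataset observations
act = torch.randn(10_000, act_dim)   # stand-in for dataset actions

policy = torch.nn.Sequential(
    torch.nn.Linear(obs_dim, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, act_dim),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)  # Adam as stated; lr is an assumption

for step in range(500_000):  # 500K gradient steps, as reported for MG-BC
    idx = torch.randint(len(obs), (batch_size,))
    loss = ((policy(obs[idx]) - act[idx]) ** 2).mean()  # simple BC regression loss (placeholder objective)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```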