reproducibilityindex.ai

Compositional Conservatism: A Transductive Approach in Offline Reinforcement Learning

Authors: Yeda Song, Dongwook Lee, Gunhee Kim

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We apply COCOA to four state-of-the-art offline RL algorithms and evaluate them on the D4RL benchmark, where COCOA generally improves the performance of each algorithm.
Researcher Affiliation	Academia	Yeda Song, Dongwook Lee & Gunhee Kim, Seoul National University; yeda.song@vision.snu.ac.kr, {dwsmart32, gunhee}@snu.ac.kr
Pseudocode	Yes	Algorithm 1 Generation of Anchor-Seeking Trajectory
Open Source Code	Yes	The code is available at https://github.com/runamu/compositionalconservatism.
Open Datasets	Yes	We evaluate our method on the Gym-MuJoCo tasks in D4RL benchmark (Fu et al., 2020)
Dataset Splits	No	The paper mentions 'validation error' in the context of dynamics model selection but does not explicitly state the dataset splits (e.g., percentages or counts for train/validation/test) needed to reproduce the experiment.
Hardware Specification	No	The paper does not provide specific hardware details such as GPU models, CPU models, or cloud computing instance types used for running experiments.
Software Dependencies	No	The paper mentions reliance on 'PyTorch' via a reference but does not specify version numbers for any software dependencies.
Experiment Setup	Yes	For both CQL and CQL+COCOA, we use α = 5.0 for all D4RL-Gym tasks... For IQL, we use the same hyperparameters described in the original paper... τ = 0.7 and β = 3.0... For MOPO, we search for the best penalty coefficient λ and rollout length h_r... λ ∈ {0.1, 0.5, 1.0, 5.0, 10.0}, h_r ∈ {1, 5, 7, 10}...
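
The hyperparameter values quoted in the Experiment Setup row can be collected into a small configuration sketch. This is only an illustration: the names `CQL_ALPHA`, `IQL_PARAMS`, `mopo_grid`, `penalty_coef`, and `rollout_length` are hypothetical and do not come from the paper or its repository, and the exhaustive enumeration of every (λ, h_r) pair is an assumption, since the quoted text says only that the best values are searched for within these sets.

```python
from itertools import product

# Fixed settings quoted in the Experiment Setup row.
CQL_ALPHA = 5.0                         # alpha for CQL and CQL+COCOA
IQL_PARAMS = {"tau": 0.7, "beta": 3.0}  # tau and beta for IQL

# Candidate values quoted for the MOPO search.
MOPO_PENALTY_COEFS = [0.1, 0.5, 1.0, 5.0, 10.0]  # lambda
MOPO_ROLLOUT_LENGTHS = [1, 5, 7, 10]             # h_r


def mopo_grid():
    """Enumerate every (lambda, h_r) pair.

    Assumption: a full grid search; the source does not state
    how the candidate sets were actually traversed.
    """
    return [
        {"penalty_coef": lam, "rollout_length": h}
        for lam, h in product(MOPO_PENALTY_COEFS, MOPO_ROLLOUT_LENGTHS)
    ]


configs = mopo_grid()
print(len(configs))  # 5 penalty coefficients x 4 rollout lengths = 20
```

Each dictionary in `configs` would correspond to one candidate MOPO run per task under this assumption.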