Compositional Conservatism: A Transductive Approach in Offline Reinforcement Learning

Authors: Yeda Song, Dongwook Lee, Gunhee Kim

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply COCOA to four state-of-the-art offline RL algorithms and evaluate them on the D4RL benchmark, where COCOA generally improves the performance of each algorithm.
Researcher Affiliation | Academia | Yeda Song, Dongwook Lee & Gunhee Kim, Seoul National University; yeda.song@vision.snu.ac.kr, {dwsmart32, gunhee}@snu.ac.kr
Pseudocode | Yes | Algorithm 1: Generation of Anchor-Seeking Trajectory
Open Source Code | Yes | The code is available at https://github.com/runamu/compositionalconservatism.
Open Datasets | Yes | We evaluate our method on the Gym-MuJoCo tasks in the D4RL benchmark (Fu et al., 2020).
Dataset Splits | No | The paper mentions 'validation error' in the context of dynamics model selection but does not explicitly state the dataset splits (e.g., percentages or counts for train/validation/test) needed to reproduce the experiments.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU models, or cloud computing instance types used for running the experiments.
Software Dependencies | No | The paper mentions reliance on 'PyTorch' via a reference but does not specify version numbers for any software dependencies.
Experiment Setup | Yes | For both CQL and CQL+COCOA, we use α = 5.0 for all D4RL-Gym tasks... For IQL, we use the same hyperparameters described in the original paper... τ = 0.7 and β = 3.0... For MOPO, we search for the best penalty coefficient λ and rollout length h_r... λ ∈ {0.1, 0.5, 1.0, 5.0, 10.0}, h_r ∈ {1, 5, 7, 10}...
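The sketches below illustrate a few of the items noted in the table, under stated assumptions; they are not taken from the paper's code. For the Pseudocode row, the paper cites Algorithm 1 for generating an anchor-seeking trajectory. The snippet below is not that algorithm, only a generic illustration of rolling a policy out in a learned dynamics model for a fixed number of steps and keeping the visited states as a candidate anchor trajectory; the names `rollout_anchor_trajectory`, `anchor_policy`, and `dynamics_model` are hypothetical.

```python
import torch


@torch.no_grad()
def rollout_anchor_trajectory(state, anchor_policy, dynamics_model, horizon=5):
    """Illustrative sketch only, not the paper's Algorithm 1.

    state          -- current state tensor, shape (state_dim,)
    anchor_policy  -- callable mapping a state to an action (hypothetical)
    dynamics_model -- callable mapping (state, action) to the next state (hypothetical)
    horizon        -- number of model rollout steps
    """
    trajectory = [state]
    for _ in range(horizon):
        action = anchor_policy(state)
        state = dynamics_model(state, action)
        trajectory.append(state)
    # One natural use of such a rollout: take the final state as an anchor and
    # the difference (original state minus anchor) as the remaining component.
    return torch.stack(trajectory)
```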
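For the Open Datasets and Dataset Splits rows: the Gym-MuJoCo datasets from the D4RL benchmark can be loaded with the `d4rl` package, but a reproduction still has to choose its own train/validation split for dynamics model selection, since the paper does not state one. The 90/10 holdout below is purely an assumption, not a value from the paper.

```python
import gym
import d4rl  # importing d4rl registers the D4RL environments
import numpy as np

# Load one of the D4RL Gym-MuJoCo datasets used in the paper's evaluation.
env = gym.make("halfcheetah-medium-v2")
dataset = d4rl.qlearning_dataset(env)  # dict of observations, actions, rewards, ...

# Assumed 90/10 split for dynamics model validation; the paper gives no split.
num_transitions = dataset["observations"].shape[0]
perm = np.random.permutation(num_transitions)
val_size = int(0.1 * num_transitions)
val_idx, train_idx = perm[:val_size], perm[val_size:]

train_obs = dataset["observations"][train_idx]
val_obs = dataset["observations"][val_idx]
```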
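For the Experiment Setup row, the reported hyperparameters can be collected into a small config. The dictionary layout and key names below are assumptions for illustration; the values are the ones quoted from the paper (CQL α, IQL τ and β, and the MOPO search grids for λ and h_r).

```python
from itertools import product

# Structure and key names are assumptions; values are quoted from the paper.
EXPERIMENT_SETUP = {
    "CQL": {"alpha": 5.0},             # same for CQL and CQL+COCOA on all D4RL-Gym tasks
    "IQL": {"tau": 0.7, "beta": 3.0},  # hyperparameters from the original IQL paper
    "MOPO": {                          # searched per task
        "penalty_coefficient_grid": [0.1, 0.5, 1.0, 5.0, 10.0],  # lambda
        "rollout_length_grid": [1, 5, 7, 10],                    # h_r
    },
}

# Example: enumerate the MOPO search grid.
mopo = EXPERIMENT_SETUP["MOPO"]
for lam, h_r in product(mopo["penalty_coefficient_grid"], mopo["rollout_length_grid"]):
    print(f"MOPO candidate: lambda={lam}, rollout_length={h_r}")
```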