Compositional Conservatism: A Transductive Approach in Offline Reinforcement Learning
Authors: Yeda Song, Dongwook Lee, Gunhee Kim
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply COCOA to four state-of-the-art offline RL algorithms and evaluate them on the D4RL benchmark, where COCOA generally improves the performance of each algorithm. |
| Researcher Affiliation | Academia | Yeda Song, Dongwook Lee & Gunhee Kim, Seoul National University; yeda.song@vision.snu.ac.kr, {dwsmart32, gunhee}@snu.ac.kr |
| Pseudocode | Yes | Algorithm 1 Generation of Anchor-Seeking Trajectory |
| Open Source Code | Yes | The code is available at https://github.com/runamu/compositionalconservatism. |
| Open Datasets | Yes | We evaluate our method on the Gym-MuJoCo tasks in the D4RL benchmark (Fu et al., 2020) |
| Dataset Splits | No | The paper mentions 'validation error' in the context of dynamics model selection but does not explicitly state the dataset splits (e.g., percentages or counts for train/validation/test) needed to reproduce the experiment. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU models, or cloud computing instance types used for running experiments. |
| Software Dependencies | No | The paper mentions reliance on 'Pytorch' via a reference but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | For both CQL and CQL+COCOA, we use α = 5.0 for all D4RL-Gym tasks... For IQL, we use the same hyperparameters described in the original paper... τ = 0.7 and β = 3.0... For MOPO, we search for the best penalty coefficient λ and rollout length h_r... λ ∈ {0.1, 0.5, 1.0, 5.0, 10.0}, h_r ∈ {1, 5, 7, 10}... |
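
The Experiment Setup row above quotes a grid search over the MOPO penalty coefficient λ and rollout length h_r. The following is a minimal sketch of how such a sweep could be wired up, assuming a hypothetical `run_trial` helper that trains MOPO(+COCOA) with the given settings and returns a normalized D4RL score; it is not the authors' released code.

```python
# Sketch of the (λ, h_r) grid search described in the Experiment Setup row.
# run_trial() is a hypothetical placeholder for a call into the actual
# training pipeline; only the grid values come from the paper.
from itertools import product

PENALTY_COEFFS = [0.1, 0.5, 1.0, 5.0, 10.0]  # λ, the model-based uncertainty penalty
ROLLOUT_LENGTHS = [1, 5, 7, 10]              # h_r, the model rollout length


def run_trial(penalty_coeff: float, rollout_length: int) -> float:
    """Hypothetical stand-in: train with the given settings and return the score."""
    raise NotImplementedError("Replace with a call into the real training code.")


def grid_search():
    """Try every (λ, h_r) pair and keep the best-scoring configuration."""
    best_score, best_config = float("-inf"), None
    for penalty_coeff, rollout_length in product(PENALTY_COEFFS, ROLLOUT_LENGTHS):
        score = run_trial(penalty_coeff, rollout_length)
        if score > best_score:
            best_score, best_config = score, (penalty_coeff, rollout_length)
    return best_config, best_score
```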