The Generalization Gap in Offline Reinforcement Learning

Authors: Ishita Mediratta, Qingfei You, Minqi Jiang, Roberta Raileanu

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that offline learning algorithms perform worse on new environments than online learning ones. We also introduce the first benchmark for evaluating generalization in offline learning, collecting datasets of varying sizes and skill levels from Procgen (2D video games) and WebShop (e-commerce websites).
Researcher Affiliation | Collaboration | Ishita Mediratta (Meta), Qingfei You (Meta), Minqi Jiang (Meta; University College London), Roberta Raileanu (Meta; University College London)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | We have open-sourced our codebase and datasets on GitHub*. The code repository consists of two separate folders: 1. Procgen and 2. WebShop. Each of these sub-repositories has a well-documented README.md with all the necessary steps needed to reproduce the experimental results in this paper, to implement and train other offline learning models, or to generate and use other datasets.
Open Datasets | Yes | We also introduce the first benchmark for evaluating generalization in offline learning, collecting datasets of varying sizes and skill levels from Procgen (2D video games) and WebShop (e-commerce websites). ... We have open-sourced our codebase and datasets on GitHub*.
Dataset Splits | Yes | For Procgen, we use |Ctrain| = 200, |Cval| = 50, and |Ctest| = 100, in line with prior work (Cobbe et al., 2020; Raileanu & Fergus, 2021; Jiang et al., 2021b), while for WebShop we use |Ctrain| = 398, |Cval| = 54, and |Ctest| = 500 instructions, unless otherwise noted (Yao et al., 2022).
Hardware Specification | Yes | All of our experiments were run on a single NVIDIA V100 32GB GPU on an internal cluster, with varying training times and memory requirements.
Software Dependencies | No | The paper mentions software components like PPO, ResNet, BERT, and BART, but does not provide specific version numbers for software dependencies such as Python, PyTorch/TensorFlow, or CUDA.
Experiment Setup | Yes | Table 2 summarizes the hyperparameters used for PPO in Procgen, and Table 3 lists the hyperparameters used in the Procgen experiments, including learning rate, batch size, and algorithm-specific parameters for PPO, BCQ, CQL, IQL, BCT, and DT.
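The Procgen split sizes reported above (|Ctrain| = 200, |Cval| = 50, |Ctest| = 100) can be sketched as environment kwargs. This is a minimal sketch, assuming contiguous level IDs and Procgen's standard `start_level`/`num_levels` arguments; the paper specifies only the split sizes, so the exact level ranges here are illustrative, and `procgen_split_kwargs` is a hypothetical helper, not part of the released codebase.

```python
# Illustrative Procgen level-split configuration. The paper gives only the
# split sizes; contiguous, non-overlapping level ranges are an assumption.
def procgen_split_kwargs(train=200, val=50, test=100):
    """Return start_level/num_levels kwargs for each split."""
    splits = {}
    start = 0
    for name, size in (("train", train), ("val", val), ("test", test)):
        splits[name] = {"start_level": start, "num_levels": size}
        start += size  # next split begins where this one ends
    return splits

kwargs = procgen_split_kwargs()
# Each dict can then be unpacked into an env constructor, e.g.
# gym.make("procgen:procgen-coinrun-v0", **kwargs["train"])
```

Keeping the ranges disjoint is what makes the evaluation a generalization test: the agent trained on the first 200 levels never sees the validation or test levels during offline training.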