The Generalization Gap in Offline Reinforcement Learning
Authors: Ishita Mediratta, Qingfei You, Minqi Jiang, Roberta Raileanu
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that offline learning algorithms perform worse on new environments than online learning ones. We also introduce the first benchmark for evaluating generalization in offline learning, collecting datasets of varying sizes and skill levels from Procgen (2D video games) and WebShop (e-commerce websites). |
| Researcher Affiliation | Collaboration | Ishita Mediratta α Qingfei You α Minqi Jiang α β Roberta Raileanu α β α Meta, β University College London |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We have open-sourced our codebase and datasets on GitHub. The code repository consists of two separate folders: 1. Procgen and 2. WebShop. Each of these sub-repositories has a well-documented README.md with all the necessary steps needed to reproduce the experimental results in this paper, to implement and train other offline learning models, or to generate and use other datasets. |
| Open Datasets | Yes | We also introduce the first benchmark for evaluating generalization in offline learning, collecting datasets of varying sizes and skill levels from Procgen (2D video games) and WebShop (e-commerce websites). ... We have open-sourced our codebase and datasets on GitHub. |
| Dataset Splits | Yes | For Procgen, we use |Ctrain| = 200, |Cval| = 50, and |Ctest| = 100, in line with prior work (Cobbe et al., 2020; Raileanu & Fergus, 2021; Jiang et al., 2021b), while for WebShop we use |Ctrain| = 398, |Cval| = 54, and |Ctest| = 500 instructions, unless otherwise noted (Yao et al., 2022). |
| Hardware Specification | Yes | All of our experiments were run on a single NVIDIA V100 32GB GPU on the internal cluster, with varying training times and memory requirements. |
| Software Dependencies | No | The paper mentions software components like PPO, ResNet, BERT, and BART, but does not provide specific version numbers for software dependencies such as Python, PyTorch/TensorFlow, or CUDA. |
| Experiment Setup | Yes | Table 2: Table summarizing the hyperparameters used for PPO in Procgen, and Table 3: List of hyperparameters used in Procgen experiments, which includes values for learning rate, batch size, and algorithm-specific parameters for PPO, BCQ, CQL, IQL, BCT, and DT. |
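The Procgen split sizes reported above (|Ctrain| = 200, |Cval| = 50, |Ctest| = 100 levels) can be sketched as a simple partition of level seeds. This is an illustrative reconstruction, not the authors' code: the paper only reports the split sizes, and the helper `make_level_splits` below is a hypothetical name.

```python
# Hypothetical sketch of the Procgen context (level) splits described in the
# paper: disjoint sets of procedurally generated level seeds for training,
# validation, and testing. The exact seed assignment is an assumption.

def make_level_splits(n_train=200, n_val=50, n_test=100):
    """Partition consecutive level seeds into disjoint train/val/test sets."""
    train = list(range(0, n_train))                                 # |C_train| = 200
    val = list(range(n_train, n_train + n_val))                     # |C_val|   = 50
    test = list(range(n_train + n_val, n_train + n_val + n_test))   # |C_test|  = 100
    return train, val, test

train, val, test = make_level_splits()
```

Any concrete split scheme must keep the three sets disjoint so that test performance measures generalization to unseen levels rather than memorization.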