The Generalization Gap in Offline Reinforcement Learning
Authors: Ishita Mediratta, Qingfei You, Minqi Jiang, Roberta Raileanu
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that offline learning algorithms perform worse on new environments than online learning ones. We also introduce the first benchmark for evaluating generalization in offline learning, collecting datasets of varying sizes and skill levels from Procgen (2D video games) and WebShop (e-commerce websites). |
| Researcher Affiliation | Collaboration | Ishita Mediratta α Qingfei You α Minqi Jiang α β Roberta Raileanu α β α Meta, β University College London |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We have open-sourced our codebase and datasets on GitHub. The code repository consists of two separate folders: 1. Procgen and 2. WebShop. Each of these sub-repositories has a well-documented README.md with all the necessary steps needed to reproduce the experimental results in this paper, to implement and train other offline learning models, or to generate and use other datasets. |
| Open Datasets | Yes | We also introduce the first benchmark for evaluating generalization in offline learning, collecting datasets of varying sizes and skill levels from Procgen (2D video games) and WebShop (e-commerce websites). ... We have open-sourced our codebase and datasets on GitHub. |
| Dataset Splits | Yes | For Procgen, we use |Ctrain| = 200, |Cval| = 50, and |Ctest| = 100, in line with prior work (Cobbe et al., 2020; Raileanu & Fergus, 2021; Jiang et al., 2021b), while for WebShop we use |Ctrain| = 398, |Cval| = 54, and |Ctest| = 500 instructions, unless otherwise noted (Yao et al., 2022). |
| Hardware Specification | Yes | All of our experiments were run on a single NVIDIA V100 32GB GPU on the internal cluster, with varying training times and memory requirements. |
| Software Dependencies | No | The paper mentions software components like PPO, ResNet, BERT, and BART, but does not provide specific version numbers for software dependencies such as Python, PyTorch/TensorFlow, or CUDA. |
| Experiment Setup | Yes | Table 2: Table summarizing the hyperparameters used for PPO in Procgen, and Table 3: List of hyperparameters used in Procgen experiments, which includes values for learning rate, batch size, and algorithm-specific parameters for PPO, BCQ, CQL, IQL, BCT, and DT. |
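The Procgen split sizes reported above (|Ctrain| = 200, |Cval| = 50, |Ctest| = 100 levels) can be sketched as a simple partition of level seeds. This is an illustrative reconstruction, not the authors' code: the paper only reports the split sizes, and the helper `make_level_splits` below is a hypothetical name.

```python
# Hypothetical sketch of the Procgen context (level) splits described in the
# paper: disjoint sets of procedurally generated level seeds for training,
# validation, and testing. The exact seed assignment is an assumption.

def make_level_splits(n_train=200, n_val=50, n_test=100):
    """Partition consecutive level seeds into disjoint train/val/test sets."""
    train = list(range(0, n_train))                                 # |C_train| = 200
    val = list(range(n_train, n_train + n_val))                     # |C_val|   = 50
    test = list(range(n_train + n_val, n_train + n_val + n_test))   # |C_test|  = 100
    return train, val, test

train, val, test = make_level_splits()
```

Any concrete split scheme must keep the three sets disjoint so that test performance measures generalization to unseen levels rather than memorization.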