Offline RL with Observation Histories: Analyzing and Improving Sample Complexity
Authors: Joey Hong, Anca Dragan, Sergey Levine
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental evaluation aims to empirically analyze the relationship between the performance of offline RL in partially observed settings and the bisimulation loss we discussed in Section 6. Our hypothesis is that, if naïve offline RL performs poorly on a given POMDP, then adding the bisimulation loss should improve performance, and if offline RL already does well, then the learned representations should already induce a bisimulation metric, and thus a low value of this loss. Note that our theory does not state that naïve offline RL will always perform poorly, just that it has a poor worst-case bound, so we would not expect an explicit bisimulation loss to always be necessary, though we hypothesize that successful offline RL runs might still minimize this loss as a byproduct of successful learning when they work well. We describe the main elements of each evaluation in the main paper, and defer implementation details to Appendix B. |
| Researcher Affiliation | Academia | Joey Hong, Anca Dragan, Sergey Levine, UC Berkeley, {joey_hong,anca,sergey.levine}@berkeley.edu |
| Pseudocode | Yes | Algorithm 1 Offline RL with Bisimulation Learning |
| Open Source Code | No | The paper does not explicitly state that source code is open-sourced or provide a link to a code repository for the described methodology. |
| Open Datasets | Yes | We use a dataset of Wordle games played by real humans and scraped from tweets, which was originally compiled and processed by Snell et al. (2023). |
| Dataset Splits | No | The paper mentions dataset creation and sizes but does not specify training, validation, or test splits with percentages or counts. |
| Hardware Specification | Yes | All algorithms were trained on a single V100 GPU until convergence, which took less than 3 days. |
| Software Dependencies | No | The paper mentions GPT-2 and AdamW but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | We use the hyperparameters reported in Table 3. |
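The evidence in the Research Type and Pseudocode rows centers on Algorithm 1, which augments standard offline RL with a bisimulation loss over learned observation-history representations. As a rough illustration of that idea only (not the authors' implementation, which is described in their Appendix B), the minimal sketch below uses a common sample-based proxy for the bisimulation target: the distance between two history representations is regressed toward the reward difference plus the discounted distance between next-step representations. The encoder architecture, dimensions, and all argument names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical history encoder phi: maps a flattened observation
# history to a representation z. Sizes are illustrative only.
phi = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))

def bisimulation_loss(hist_a, hist_b, r_a, r_b,
                      next_hist_a, next_hist_b, gamma=0.99):
    """Penalize mismatch between representation distance and a
    bisimulation-style target: |r - r'| + gamma * d(next histories).
    Next-step representations are treated as fixed targets
    (stop-gradient), a common choice for such bootstrapped losses."""
    d = torch.norm(phi(hist_a) - phi(hist_b), dim=-1)
    with torch.no_grad():
        d_next = torch.norm(phi(next_hist_a) - phi(next_hist_b), dim=-1)
    target = (r_a - r_b).abs() + gamma * d_next
    return F.mse_loss(d, target)

# Toy usage on random pairs of histories sampled from a batch.
batch = lambda: torch.randn(8, 32)
loss = bisimulation_loss(batch(), batch(),
                         torch.randn(8), torch.randn(8),
                         batch(), batch())
loss.backward()
```

In a full training loop, this term would presumably be weighted and added to the usual offline RL objective (for example, `total = td_loss + beta * bisim_loss`), so that histories with similar reward and transition structure are encoded close together; the exact pairing scheme, weighting, and distance measure in Algorithm 1 would follow the paper rather than this sketch.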