Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Offline RL with Observation Histories: Analyzing and Improving Sample Complexity
Authors: Joey Hong, Anca Dragan, Sergey Levine
ICLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental evaluation aims to empirically analyze the relationship between the performance of offline RL in partially observed settings and the bisimulation loss we discussed in Section 6. Our hypothesis is that, if naïve offline RL performs poorly on a given POMDP, then adding the bisimulation loss should improve performance, and if offline RL already does well, then the learned representations should already induce a bisimulation metric, and thus a low value of this loss. Note that our theory does not state that naïve offline RL will always perform poorly, just that it has a poor worst-case bound, so we would not expect an explicit bisimulation loss to always be necessary, though we hypothesize that successful offline RL runs might still minimize loss as a byproduct of successful learning when they work well. We describe the main elements of each evaluation in the main paper, and defer implementation details to Appendix B. |
| Researcher Affiliation | Academia | Joey Hong, Anca Dragan, Sergey Levine — UC Berkeley |
| Pseudocode | Yes | Algorithm 1 Offline RL with Bisimulation Learning |
| Open Source Code | No | The paper does not explicitly state that source code is open-sourced or provide a link to a code repository for the described methodology. |
| Open Datasets | Yes | We use a dataset of Wordle games played by real humans and scraped from tweets, which was originally compiled and processed by Snell et al. (2023). |
| Dataset Splits | No | The paper mentions dataset creation and sizes but does not specify training, validation, or test splits with percentages or counts. |
| Hardware Specification | Yes | All algorithms were trained on a single V100 GPU until convergence, which took less than 3 days. |
| Software Dependencies | No | The paper mentions GPT-2 and AdamW but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | We use the hyperparameters reported in Table 3. |
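
The paper's Algorithm 1 ("Offline RL with Bisimulation Learning") is only named in the extract above, not reproduced. For readers unfamiliar with the idea, the following is a minimal NumPy sketch of one common form of bisimulation loss on a batch of encoded observation histories: representation distances are regressed toward reward differences plus a discounted distance between next-step representations. All function and variable names here are hypothetical and this is not the authors' implementation.

```python
import numpy as np

def bisimulation_loss(phi, phi_next, rewards, gamma=0.99):
    """Pairwise bisimulation loss over a batch (hypothetical sketch).

    phi:      (B, D) array of encoded current observation histories
    phi_next: (B, D) array of encoded next observation histories
    rewards:  (B,)   array of rewards

    Encourages ||phi_i - phi_j|| to match the bisimulation-style target
    |r_i - r_j| + gamma * ||phi'_i - phi'_j|| for all pairs (i, j).
    """
    # Pairwise L2 distances in representation space, shape (B, B)
    d = np.linalg.norm(phi[:, None, :] - phi[None, :, :], axis=-1)
    # Pairwise absolute reward differences, shape (B, B)
    r_diff = np.abs(rewards[:, None] - rewards[None, :])
    # Pairwise distances between next-step representations, shape (B, B)
    d_next = np.linalg.norm(phi_next[:, None, :] - phi_next[None, :, :], axis=-1)
    # Mean squared error between current distances and the bisimulation target
    target = r_diff + gamma * d_next
    return float(np.mean((d - target) ** 2))
```

In practice a term like this would be added as an auxiliary loss on the encoder alongside the usual offline RL objective; the sketch only shows the loss computation itself.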