reproducibilityindex.ai

Bellman Residual Orthogonalization for Offline Reinforcement Learning

Authors: Andrea Zanette, Martin J Wainwright

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We propose and analyze a reinforcement learning principle that approximates the Bellman equations by enforcing their validity only along a user-defined space of test functions. Focusing on applications to model-free offline RL with function approximation, we exploit this principle to derive confidence intervals for off-policy evaluation, as well as to optimize over policies within a prescribed policy class. We prove an oracle inequality on our policy optimization procedure in terms of a trade-off between the value and uncertainty of an arbitrary comparator policy. ... We examine in depth the implementation of our methods with linear function approximation, and provide theoretical guarantees with polynomial-time implementations even when Bellman closure does not hold. Also, the authors' checklist under 'If you ran experiments...' has '[N/A]' for points 3a-3d.
Researcher Affiliation	Academia	Andrea Zanette, Department of Computer Sciences and Electrical Engineering, University of California, Berkeley, zanette@berkeley.edu; Martin J. Wainwright, Department of Electrical Engineering and Computer Sciences, Department of Mathematics, Massachusetts Institute of Technology, and Department of Computer Sciences and Electrical Engineering, Department of Statistics, University of California, Berkeley, wainwrigwork@gmail.com
Pseudocode	No	The paper describes the steps for its 'actor-critic method' in Section 5.2, listing them as numbered points (e.g., 'At each iteration t = 1, ..., T, ...', 'Using the finite test function class (18)...', 'Using the action-value vector w_t...'), but this is presented as descriptive text and not in a structured 'Algorithm' block or 'Pseudocode' format.
Open Source Code	No	In the 'Ethics Review' section, under 'If you ran experiments...', the points 3a ('Did you include the code, data, and instructions needed to reproduce the main experimental results...') and 4c ('Did you include any new assets either in the supplemental material or as a URL?') are marked as '[N/A]'. There is no explicit statement or link in the paper providing open-source code for the methodology.
Open Datasets	No	The paper introduces 'Assumption 1 (I.i.d. dataset)' to describe the data generation mechanism: 'An i.i.d. dataset is a collection $\mathcal{D} = \{(s_i, a_i, r_i, s_i^+)\}_{i=1}^n$ such that for each $i = 1, \ldots, n$ we have $(s_i, a_i, o_i) \sim \mu$ and conditioned on $(s_i, a_i, o_i)$, we observe a noisy reward $r_i = r(s_i, a_i) + \eta_i$ with $\mathbb{E}[\eta_i \mid \mathcal{F}_i] = 0$, $\lvert r_i \rvert \le 1$ and the next state $s_i^+ \sim P(s_i, a_i)$.' However, it does not specify a named public dataset or provide access information (link, citation) for a publicly available or open dataset.
Dataset Splits	No	The paper is theoretical and does not describe experimental setups with data splits. The 'Ethics Review' section explicitly states '[N/A]' for questions regarding running experiments and training details (3b).
Hardware Specification	No	The paper does not provide any specific hardware details. In the 'Ethics Review' section, under 'If you ran experiments...', point 3d ('Did you include the total amount of compute and the type of resources used...') is marked '[N/A]', indicating no experiments with specific hardware were conducted or reported.
Software Dependencies	No	The paper is theoretical and does not describe any specific software dependencies with version numbers. In the 'Ethics Review' section, under 'If you ran experiments...', points 3a and 3b are marked '[N/A]', which cover code and training details.
Experiment Setup	No	The paper is theoretical and does not provide details about an experimental setup, hyperparameters, or system-level training settings. In the 'Ethics Review' section, under 'If you ran experiments...', point 3b ('Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)?') is marked '[N/A]'.
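
The data-generating mechanism quoted in the Open Datasets row (Assumption 1) can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not code from the paper: the two-state toy MDP, the names `sample_iid_dataset`, `MU`, `REWARD`, and `TRANSITION`, and the uniform zero-mean noise are all hypothetical choices made for illustration, and the identifier o_i from the quoted assumption is omitted because the quoted text does not define it.

```python
import random

def sample_iid_dataset(n, mu, reward, transition, noise_scale=0.1, seed=0):
    """Draw n i.i.d. tuples (s, a, r, s_next) in the spirit of Assumption 1.

    mu:         dict (state, action) -> probability (hypothetical sampling distribution)
    reward:     dict (state, action) -> mean reward r(s, a)
    transition: dict (state, action) -> dict next_state -> probability
    """
    # Require |r(s, a)| + noise_scale <= 1 so every observed reward stays in [-1, 1].
    for pair, mean in reward.items():
        if abs(mean) + noise_scale > 1.0:
            raise ValueError(f"reward for {pair} could leave [-1, 1]")

    rng = random.Random(seed)
    pairs = list(mu)
    weights = [mu[p] for p in pairs]
    dataset = []
    for _ in range(n):
        # (s_i, a_i) ~ mu, drawn independently for each i
        s, a = rng.choices(pairs, weights=weights, k=1)[0]
        # Noisy reward: mean reward plus symmetric (hence zero-mean) uniform noise
        r = reward[(s, a)] + rng.uniform(-noise_scale, noise_scale)
        # Next state s_i^+ ~ P(s_i, a_i)
        nxt = transition[(s, a)]
        s_next = rng.choices(list(nxt), weights=list(nxt.values()), k=1)[0]
        dataset.append((s, a, r, s_next))
    return dataset

# Hypothetical two-state, two-action toy problem (not from the paper).
MU = {("s0", "left"): 0.25, ("s0", "right"): 0.25,
      ("s1", "left"): 0.25, ("s1", "right"): 0.25}
REWARD = {("s0", "left"): 0.0, ("s0", "right"): 0.5,
          ("s1", "left"): -0.5, ("s1", "right"): 0.8}
TRANSITION = {("s0", "left"): {"s0": 0.9, "s1": 0.1},
              ("s0", "right"): {"s0": 0.2, "s1": 0.8},
              ("s1", "left"): {"s0": 0.7, "s1": 0.3},
              ("s1", "right"): {"s0": 0.1, "s1": 0.9}}

data = sample_iid_dataset(1000, MU, REWARD, TRANSITION)
```

Because the paper is theoretical and names no public dataset, a synthetic generator of this kind is one way a reader might produce data that satisfies the stated assumption; the fixed seed only makes the draw repeatable.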