Finite-Sample Analysis of Off-Policy TD-Learning via Generalized Bellman Operators
Authors: Zaiwei Chen, Siva Theja Maguluri, Sanjay Shakkottai, Karthikeyan Shanmugam
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we derive finite-sample bounds for any general off-policy TD-like stochastic approximation algorithm that solves for the fixed-point of this generalized Bellman operator. Our key step is to show that the generalized Bellman operator is simultaneously a contraction mapping with respect to a weighted ℓp-norm for each p in [1, ∞), with a common contraction factor. ... Did you run experiments? [N/A] |
| Researcher Affiliation | Collaboration | Zaiwei Chen (Georgia Institute of Technology); Siva Theja Maguluri (Georgia Institute of Technology); Sanjay Shakkottai (The University of Texas at Austin); Karthikeyan Shanmugam (IBM Research NY) |
| Pseudocode | Yes | Algorithm 1 A Generic Algorithm for Multi-Step Off-Policy TD-Learning |
| Open Source Code | No | The paper states in its checklist: 'Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A]' and does not provide any links or statements about open-sourcing code. |
| Open Datasets | No | The paper is theoretical and does not describe experiments involving datasets. The checklist includes: 'If you are using existing assets, did you cite the creators? [N/A]' |
| Dataset Splits | No | The paper is theoretical and does not describe experimental validation or dataset splits. The checklist includes: 'Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [N/A]' |
| Hardware Specification | No | The paper is theoretical and does not describe any hardware used for experiments. The checklist states: 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [N/A]' |
| Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies with version numbers for experiments. The checklist states: 'Did you include the code, data, and instructions needed to reproduce the main experimental results? [N/A]' |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup or hyperparameters. The checklist states: 'Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [N/A]' |
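The Research Type row turns on one property: a Bellman-type operator that contracts in a weighted ℓp-norm with the same factor for every p. Below is a minimal numerical sketch of that property under simplifying assumptions. It uses the standard on-policy policy-evaluation operator (T V)(s) = R(s) + γ Σ_t P(s, t) V(t), not the paper's generalized off-policy operator. It weights the norm by the stationary distribution μ of P, and it assumes the 3-state chain, rewards, and γ = 0.9 shown. All function names are hypothetical. Under these assumptions, Jensen's inequality together with μP = μ gives ‖P x‖_{μ,p} ≤ ‖x‖_{μ,p}, so T contracts with factor γ for every p ≥ 1. The code checks this empirically on random pairs of value functions.

```python
import random


def stationary_distribution(P, iters=1000):
    """Approximate mu with mu P = mu by power iteration (assumes an ergodic chain)."""
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iters):
        mu = [sum(mu[s] * P[s][t] for s in range(n)) for t in range(n)]
    return mu


def bellman(V, R, P, gamma):
    """Policy-evaluation Bellman operator: (T V)(s) = R(s) + gamma * sum_t P(s, t) V(t)."""
    n = len(V)
    return [R[s] + gamma * sum(P[s][t] * V[t] for t in range(n)) for s in range(n)]


def weighted_lp_norm(x, mu, p):
    """Weighted l_p norm: (sum_s mu(s) |x(s)|^p)^(1/p)."""
    return sum(m * abs(v) ** p for m, v in zip(mu, x)) ** (1.0 / p)


def contraction_ratio(P, R, gamma, p, trials=200, seed=0):
    """Largest observed ||T V - T W||_{mu,p} / ||V - W||_{mu,p} over random pairs."""
    rng = random.Random(seed)
    n = len(P)
    mu = stationary_distribution(P)
    worst = 0.0
    for _ in range(trials):
        V = [rng.uniform(-10.0, 10.0) for _ in range(n)]
        W = [rng.uniform(-10.0, 10.0) for _ in range(n)]
        diff_in = [v - w for v, w in zip(V, W)]
        TV, TW = bellman(V, R, P, gamma), bellman(W, R, P, gamma)
        diff_out = [a - b for a, b in zip(TV, TW)]
        ratio = weighted_lp_norm(diff_out, mu, p) / weighted_lp_norm(diff_in, mu, p)
        worst = max(worst, ratio)
    return worst


if __name__ == "__main__":
    # Hypothetical 3-state chain with strictly positive transitions (hence ergodic).
    P = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3], [0.4, 0.4, 0.2]]
    R = [1.0, 0.0, -1.0]
    gamma = 0.9
    for p in (1, 2, 4, 8):
        # Each observed ratio should stay at or below gamma, the common factor.
        print(p, contraction_ratio(P, R, gamma, p))
```

The choice of μ matters here. With uniform weights instead of the stationary distribution, the ℓp contraction factor can exceed γ for p < ∞. This is one reason results of this kind are stated for suitably *weighted* norms.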