A Fine-grained Analysis of Fitted Q-evaluation: Beyond Parametric Models
Authors: Jiayi Wang, Zhengling Qi, Raymond K. W. Wong
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a simulation study to illustrate the behavior of the error $\lvert \hat{\nu}(\pi) - \nu(\pi) \rvert$ with respect to $n$ and $T$. The goal here is to provide empirical evidence of our theoretical results, and so we use a relatively simple simulation setup for the purpose of clear demonstration. |
| Researcher Affiliation | Academia | (1) Department of Mathematical Sciences, University of Texas at Dallas, Richardson, USA; (2) School of Business, The George Washington University, Washington, D.C., USA; (3) Department of Statistics, Texas A&M University, College Station, USA. |
| Pseudocode | No | The paper describes the FQE method and provides mathematical formulations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statements about making its source code publicly available, nor does it provide a link to a code repository. |
| Open Datasets | No | The paper uses synthetic data generated via a specified model rather than a publicly available dataset. It describes: 'The state variable is a one-dimensional continuous variable and the action is a binary variable, i.e., $A_t = \{0, 1\}$ for all $t$. The initial state follows the uniform distribution within $[-2, 2]$. The transition dynamics are given by $S_{i,t+1} = (2A_{i,t} - 1) f(S_{i,t})$...' |
| Dataset Splits | No | The paper describes the use of leave-one-out cross-validation to decide the number of basis functions (K) but does not specify any explicit training, validation, or test dataset splits for the simulated data. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, or memory) used to run the simulation studies. |
| Software Dependencies | No | The paper mentions using 'cubic B-spline' for constructing basis functions but does not specify any software packages or their version numbers used for implementation or analysis. |
| Experiment Setup | Yes | We conduct a simulation study to illustrate the behavior of the error $\lvert \hat{\nu}(\pi) - \nu(\pi) \rvert$ with respect to $n$ and $T$. ... We evaluate values with $n = 200, 400, \ldots, 2000$, and $T = 20, 40, \ldots, 200$. We use cubic B-spline to construct basis functions at every step $t$. The knots are placed at evenly distributed percentiles of samples. ... we fix $K = 3n^{1/5}$. For the second approach, we use leave-one-out cross-validation to decide $K$ at every step. |
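
The basis construction quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration and not the authors' code, since the paper releases none. It assumes that K = 3n^(1/5) is rounded to the nearest integer, that a clamped knot vector is used, and that percentiles are computed by linear interpolation; the function names (`num_basis`, `knot_vector`, `bspline_basis`) are hypothetical. The states are drawn only from the stated initial distribution, uniform on [-2, 2], because the transition function f is elided in the quoted text.

```python
import math
import random

def num_basis(n, c=3.0, power=0.2):
    """K = c * n**power; rounding and the floor of 4 are assumptions."""
    return max(4, round(c * n ** power))

def percentile(sorted_x, q):
    """Linear-interpolation percentile of pre-sorted data, q in [0, 1]."""
    pos = q * (len(sorted_x) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (pos - lo) * (sorted_x[hi] - sorted_x[lo])

def knot_vector(samples, K, degree=3):
    """Clamped knots with interior knots at evenly spaced sample percentiles."""
    xs = sorted(samples)
    n_interior = K - degree - 1
    interior = [percentile(xs, j / (n_interior + 1))
                for j in range(1, n_interior + 1)]
    return [xs[0]] * (degree + 1) + interior + [xs[-1]] * (degree + 1)

def bspline_basis(x, knots, degree=3):
    """Evaluate all B-spline basis functions at x (Cox-de Boor recursion)."""
    m = len(knots) - 1
    # Degree 0: indicators of the half-open knot spans.
    B = [1.0 if knots[i] <= x < knots[i + 1] else 0.0 for i in range(m)]
    if x == knots[-1]:
        # Close the last non-empty span on the right endpoint.
        last = max(i for i in range(m) if knots[i] < knots[i + 1])
        B[last] = 1.0
    for d in range(1, degree + 1):
        new = []
        for i in range(m - d):
            left = right = 0.0
            # Terms with a zero-width denominator are treated as 0.
            if knots[i + d] > knots[i]:
                left = (x - knots[i]) / (knots[i + d] - knots[i]) * B[i]
            if knots[i + d + 1] > knots[i + 1]:
                right = ((knots[i + d + 1] - x)
                         / (knots[i + d + 1] - knots[i + 1]) * B[i + 1])
            new.append(left + right)
        B = new
    return B

random.seed(0)
states = [random.uniform(-2, 2) for _ in range(200)]  # initial states only
K = num_basis(len(states))
knots = knot_vector(states, K)
row = bspline_basis(states[0], knots)  # one row of the n-by-K design matrix
```

Stacking `bspline_basis(s, knots)` over all sampled states gives the design matrix for one regression step. The leave-one-out cross-validation variant mentioned in the table would repeat this for several candidate values of K and keep the one with the lowest held-out error.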