A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation
Authors: Pan Xu, Quanquan Gu
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we present a finite-time analysis of a neural Q-learning algorithm, where the data are generated from a Markov decision process, and the action-value function is approximated by a deep ReLU neural network. We prove that neural Q-learning finds the optimal policy with O(1/√T) convergence rate if the neural function approximator is sufficiently overparameterized, where T is the number of iterations. |
| Researcher Affiliation | Academia | Pan Xu, Quanquan Gu. Department of Computer Science, University of California, Los Angeles. Correspondence to: Quanquan Gu <qgu@cs.ucla.edu>. |
| Pseudocode | Yes | Algorithm 1: Neural Q-Learning with Gaussian Initialization (a hedged sketch follows this table). |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | No | The paper is theoretical and focuses on the analysis of an algorithm in which "data are generated from a Markov decision process," but it does not specify or provide access information for a particular public dataset used for training or evaluation. |
| Dataset Splits | No | This is a theoretical paper and does not describe empirical experiments with data splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for running experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical, providing an algorithm (Algorithm 1) and its convergence analysis, but it does not detail a specific experimental setup with concrete hyperparameter values or training configurations for empirical reproduction. |
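
To make the Pseudocode row above concrete, here is a minimal, hypothetical sketch in the spirit of the paper's Algorithm 1 (Neural Q-Learning with Gaussian Initialization). The environment interface (`env.reset`, `env.step`), the two-layer ReLU network, the greedy behavior policy, and all hyperparameters (width, step size, discount, projection radius) are illustrative assumptions, not details taken from the paper, which analyzes deep ReLU networks and proves the O(1/√T) rate in the overparameterized regime.

```python
# Hypothetical sketch of neural Q-learning with Gaussian initialization.
# All dimensions and hyperparameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, NUM_ACTIONS = 4, 2          # assumed MDP dimensions
WIDTH = 256                            # network width m (the analysis needs m large)
GAMMA, ETA, RADIUS = 0.99, 1e-3, 10.0  # discount, step size, projection radius

def init_params():
    # Gaussian-initialized first layer with fixed random +/-1 output weights,
    # a common setup in overparameterized analyses (an assumption here).
    W = rng.normal(0.0, 1.0 / np.sqrt(WIDTH), size=(WIDTH, STATE_DIM + NUM_ACTIONS))
    v = rng.choice([-1.0, 1.0], size=WIDTH)
    return W, v

def features(s, a):
    # Concatenate the state with a one-hot encoding of the action.
    x = np.zeros(STATE_DIM + NUM_ACTIONS)
    x[:STATE_DIM] = s
    x[STATE_DIM + a] = 1.0
    return x

def q_value(W, v, s, a):
    # Two-layer ReLU network Q(s, a; W) with 1/sqrt(m) output scaling.
    return v @ np.maximum(W @ features(s, a), 0.0) / np.sqrt(WIDTH)

def q_grad(W, v, s, a):
    # Gradient of Q with respect to W, using the ReLU activation pattern.
    x = features(s, a)
    active = (W @ x > 0.0).astype(float)
    return np.outer(v * active, x) / np.sqrt(WIDTH)

def project(W, W0, radius):
    # Project onto the Frobenius-norm ball B(W0, radius) around the Gaussian
    # initialization; constrained updates of this kind keep iterates near
    # initialization, which the finite-time analysis relies on.
    diff = W - W0
    norm = np.linalg.norm(diff)
    return W0 + diff * min(1.0, radius / norm) if norm > 0 else W

def neural_q_learning(env, num_steps):
    W, v = init_params()
    W0 = W.copy()
    s = env.reset()
    for _ in range(num_steps):
        # Greedy action selection is a stand-in for the data-generating policy.
        a = int(np.argmax([q_value(W, v, s, b) for b in range(NUM_ACTIONS)]))
        s_next, r, done = env.step(a)   # assumed environment interface
        target = r if done else r + GAMMA * max(
            q_value(W, v, s_next, b) for b in range(NUM_ACTIONS))
        delta = q_value(W, v, s, a) - target  # TD error
        W = project(W - ETA * delta * q_grad(W, v, s, a), W0, RADIUS)
        s = env.reset() if done else s_next
    return W, v
```

The projection step is the structural choice worth noting: keeping the weights inside a ball around their Gaussian initialization mirrors the constrained updates used in finite-time analyses of overparameterized TD and Q-learning, where the linearization of the network around initialization must remain valid throughout training.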