A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation

Authors: Pan Xu, Quanquan Gu

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we present a finite-time analysis of a neural Q-learning algorithm, where the data are generated from a Markov decision process and the action-value function is approximated by a deep ReLU neural network. We prove that neural Q-learning finds the optimal policy with an O(1/√T) convergence rate if the neural function approximator is sufficiently overparameterized, where T is the number of iterations.
Researcher Affiliation | Academia | Pan Xu and Quanquan Gu, Department of Computer Science, University of California, Los Angeles. Correspondence to: Quanquan Gu <qgu@cs.ucla.edu>.
Pseudocode | Yes | Algorithm 1, Neural Q-Learning with Gaussian Initialization (a hedged sketch of one such update appears after this table).
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository.
Open Datasets | No | The paper is theoretical and focuses on the analysis of an algorithm where "data are generated from a Markov decision process"; it does not specify or provide access information for a particular public dataset for training or evaluation.
Dataset Splits | No | This is a theoretical paper and does not describe empirical experiments with data splits for training, validation, or testing.
Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for running experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical, providing an algorithm (Algorithm 1) and its convergence analysis; it does not detail a specific experimental setup with concrete hyperparameter values or training configurations for empirical reproduction.
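
The pseudocode row above refers to Algorithm 1 (Neural Q-Learning with Gaussian Initialization). Below is a minimal sketch of one projected neural Q-learning update, assuming a two-layer ReLU network with Gaussian-initialized hidden weights and fixed ±1 output weights as a stand-in for the paper's deep ReLU approximator. The dimensions, step size, projection radius, and random features are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# Sketch of one projected neural Q-learning (semi-gradient TD) step with a
# Gaussian-initialized two-layer ReLU network. All hyperparameters below are
# illustrative assumptions, not the paper's settings.

rng = np.random.default_rng(0)
d, m = 8, 512                          # feature dimension, network width
gamma, eta, omega = 0.99, 0.05, 10.0   # discount, step size, projection radius

# Gaussian initialization of the hidden layer; fixed +/-1 output weights.
W0 = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, d))
b = rng.choice([-1.0, 1.0], size=m)
W = W0.copy()

def q_value(W, x):
    """Two-layer ReLU network value: (1/sqrt(m)) * sum_r b_r * relu(w_r . x)."""
    h = np.maximum(W @ x, 0.0)
    return (b @ h) / np.sqrt(m)

def grad_q(W, x):
    """Gradient of q_value with respect to W (one row per hidden unit)."""
    active = (W @ x > 0.0).astype(float)          # ReLU activation pattern
    return (b * active)[:, None] * x[None, :] / np.sqrt(m)

def td_step(W, x, reward, x_next_candidates):
    """One Q-learning update followed by projection onto a ball around W0."""
    target = reward + gamma * max(q_value(W, xn) for xn in x_next_candidates)
    delta = q_value(W, x) - target                # TD error
    W = W - eta * delta * grad_q(W, x)            # semi-gradient step
    diff = W - W0
    norm = np.linalg.norm(diff)
    if norm > omega:                              # project back to the ball
        W = W0 + diff * (omega / norm)
    return W

# Illustrative usage: random vectors stand in for the feature map phi(s, a).
x = rng.normal(size=d)
x_next = [rng.normal(size=d) for _ in range(4)]   # one feature per next action
W = td_step(W, x, reward=1.0, x_next_candidates=x_next)
```

The projection back into a fixed-radius ball around the initialization mirrors the constrained update used in the paper's analysis, where the iterates are kept close to the Gaussian initialization so that the overparameterized network stays in a nearly linear regime.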