Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation
Authors: Pan Xu, Quanquan Gu
ICML 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we present a finite-time analysis of a neural Q-learning algorithm, where the data are generated from a Markov decision process, and the action-value function is approximated by a deep ReLU neural network. We prove that neural Q-learning finds the optimal policy with O(1/√T) convergence rate if the neural function approximator is sufficiently overparameterized, where T is the number of iterations. |
| Researcher Affiliation | Academia | Pan Xu¹, Quanquan Gu¹. ¹Department of Computer Science, University of California, Los Angeles. Correspondence to: Quanquan Gu <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Neural Q-Learning with Gaussian Initialization |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | No | The paper is theoretical and focuses on analysis of an algorithm where "data are generated from a Markov decision process," but it does not specify or provide access information for a particular public dataset for training or evaluation. |
| Dataset Splits | No | This is a theoretical paper and does not describe empirical experiments with data splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for running experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical, providing an algorithm (Algorithm 1) and its convergence analysis, but it does not detail a specific experimental setup with concrete hyperparameter values or training configurations for empirical reproduction. |
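
The table refers to Algorithm 1 (Neural Q-Learning with Gaussian Initialization) without reproducing it. For readers who want a concrete picture of the setting, the following is a minimal sketch of Q-learning with a Gaussian-initialized ReLU network. It is not the authors' Algorithm 1. Everything specific here is an assumption made for illustration: the two-layer network (the paper analyzes a deep one), the fixed ±1 output weights, the toy two-state MDP, the uniform behaviour policy, the projection onto a ball around the initial weights, and all hyperparameter values.

```python
import math
import random


def relu_net(W, b, x):
    """Two-layer ReLU network: f(x) = (1/sqrt(m)) * sum_r b[r] * relu(W[r] . x)."""
    total = 0.0
    for w_row, b_r in zip(W, b):
        pre = sum(w * xi for w, xi in zip(w_row, x))
        total += b_r * max(pre, 0.0)
    return total / math.sqrt(len(W))


def relu_net_grad(W, b, x):
    """Gradient of relu_net with respect to W (same shape as W)."""
    scale = 1.0 / math.sqrt(len(W))
    grad = []
    for w_row, b_r in zip(W, b):
        pre = sum(w * xi for w, xi in zip(w_row, x))
        active = 1.0 if pre > 0.0 else 0.0
        grad.append([scale * b_r * active * xi for xi in x])
    return grad


def frobenius_distance(W, W0):
    """Frobenius norm of W - W0."""
    return math.sqrt(sum((w - w0) ** 2
                         for row, row0 in zip(W, W0)
                         for w, w0 in zip(row, row0)))


def project(W, W0, radius):
    """Project W onto the Frobenius ball of the given radius centred at W0."""
    dist = frobenius_distance(W, W0)
    if dist <= radius:
        return W
    c = radius / dist
    return [[w0 + c * (w - w0) for w, w0 in zip(row, row0)]
            for row, row0 in zip(W, W0)]


def feature(s, a, n_states, n_actions):
    """One-hot encoding of a state-action pair (hypothetical feature map)."""
    x = [0.0] * (n_states * n_actions)
    x[s * n_actions + a] = 1.0
    return x


def neural_q_learning(steps=2000, m=64, gamma=0.9, lr=0.05, radius=10.0, seed=0):
    """Semi-gradient Q-learning on a toy MDP; all defaults are illustrative."""
    rng = random.Random(seed)
    n_states, n_actions = 2, 2
    d = n_states * n_actions
    # Gaussian initialization of the hidden layer; output weights fixed to +/-1.
    W0 = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    b = [1.0 if r % 2 == 0 else -1.0 for r in range(m)]
    W = [row[:] for row in W0]
    s = 0
    for _ in range(steps):
        a = rng.randrange(n_actions)  # uniform behaviour policy
        # Toy dynamics: action 1 switches state, action 0 stays put.
        s_next = 1 - s if a == 1 else s
        reward = 1.0 if s_next == 1 else 0.0
        x = feature(s, a, n_states, n_actions)
        q_sa = relu_net(W, b, x)
        q_next = max(relu_net(W, b, feature(s_next, a2, n_states, n_actions))
                     for a2 in range(n_actions))
        # TD error; the bootstrapped target is treated as a constant.
        td_error = q_sa - (reward + gamma * q_next)
        grad = relu_net_grad(W, b, x)
        W = [[w - lr * td_error * g for w, g in zip(w_row, g_row)]
             for w_row, g_row in zip(W, grad)]
        W = project(W, W0, radius)
        s = s_next
    return W, W0, b
```

Calling `neural_q_learning()` returns the trained weights, the initial weights, and the fixed output signs; `relu_net(W, b, feature(s, a, 2, 2))` then evaluates the learned action-value estimate for a pair `(s, a)`. The sketch makes no claim about matching the convergence rate proved in the paper.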