Zap Q-Learning With Nonlinear Function Approximation

Authors: Shuhang Chen, Adithya M. Devraj, Fan Lu, Ana Bušić, Sean P. Meyn

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Zap Q-learning with neural network function approximation emerges as a special case, and is tested on examples from Open AI Gym. Based on multiple experiments with a range of neural network sizes, it is found that the new algorithms converge quickly and are robust to choice of function approximation architecture.
Researcher Affiliation | Academia | Shuhang Chen, University of Florida, shuhangchen@ufl.edu; Adithya M. Devraj, Stanford University, adevraj@stanford.edu; Fan Lu, University of Florida, fan.lu@ufl.edu; Ana Bušić, INRIA / École Normale Supérieure / PSL Research University, ana.busic@inria.fr; Sean P. Meyn, University of Florida, meyn@ece.ufl.edu
Pseudocode | No | While the paper describes algorithm steps with equations (e.g., the Zap SA algorithm via equations 14a-14b and 21a-21d), it does not include a formally structured pseudocode or algorithm block. (A hedged schematic of a Zap-style recursion is sketched after this table.)
Open Source Code | No | The paper does not provide a specific repository link or an explicit statement about the open-sourcing of the code for the methodology described.
Open Datasets | Yes | The Zap Q-learning algorithm was tested on three examples from Open AI gym: Mountain Car, Acrobot, and Cartpole [1]. [1] Open AI Gym website. github.com/openai/gym/wiki. (An environment-instantiation sketch follows this table.)
Dataset Splits | No | The paper does not provide explicit details about the dataset splits (e.g., specific percentages or sample counts for training, validation, and test sets).
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9'). It only generally refers to neural networks and activation functions.
Experiment Setup | Yes | Common in each experiment: a feedforward neural network that is fully connected, using the Leaky ReLU activation function. Two minor modifications of the algorithm were used in these experiments to reduce complexity: 1. Periodic gain update. An integer Nd > 1 was fixed...We observed that Nd = 50 worked well for all experiments...2. Periodic eligibility update...N = 2000. ... Experiments were performed with both decreasing step-size (defined in (22)), and constant step-size. ... parameters randomly initialized by the Kaiming uniform method [20, 36]. (A hedged configuration sketch follows this table.)
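
As a reading aid for the pseudocode entry above, the following is a schematic sketch of a generic Zap-style matrix-gain stochastic-approximation step. It is not the paper's equations (14a)-(14b) verbatim: the function f, the Jacobian samples, the pseudo-inverse safeguard, and the step-size schedule are all assumptions made for illustration.

```python
import numpy as np

def zap_sa_step(theta, A_hat, f_val, A_sample, alpha, gamma):
    """One schematic Zap-SA step: a running estimate of the mean Jacobian is
    tracked on a faster timescale, and its (pseudo-)inverse is used as a
    matrix gain for the parameter update."""
    A_hat = A_hat + gamma * (A_sample - A_hat)   # matrix-gain (Jacobian) estimate
    G = -np.linalg.pinv(A_hat)                   # pseudo-inverse guards against singularity
    theta = theta + alpha * (G @ f_val)          # Newton-Raphson-like parameter update
    return theta, A_hat
```

In the Zap family, the gain step-size gamma is chosen to decay more slowly than alpha, so the matrix-gain estimate evolves on a faster timescale than the parameters.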
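
For the open-datasets entry: a minimal check that the three cited Gym environments can be instantiated. The environment ID strings and the use of the gym Python package are assumptions; the paper does not state exact version strings.

```python
import gym

# Standard Gym IDs for the three tasks named in the paper (assumed, not quoted).
for env_id in ["MountainCar-v0", "Acrobot-v1", "CartPole-v1"]:
    env = gym.make(env_id)
    print(env_id, "observation space:", env.observation_space,
          "action space:", env.action_space)
    env.close()
```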
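
For the experiment-setup entry: a minimal sketch of the reported function-approximation architecture, i.e. a fully connected feedforward network with Leaky ReLU activations and Kaiming-uniform initialization. The hidden-layer sizes, the use of PyTorch, and the bias initialization are assumptions; the paper varies the network size across experiments.

```python
import torch.nn as nn

def make_q_network(state_dim: int, num_actions: int, hidden: int = 64) -> nn.Sequential:
    """Fully connected feedforward Q-network with Leaky ReLU activations."""
    net = nn.Sequential(
        nn.Linear(state_dim, hidden),
        nn.LeakyReLU(),
        nn.Linear(hidden, hidden),
        nn.LeakyReLU(),
        nn.Linear(hidden, num_actions),
    )
    # Kaiming (He) uniform initialization, as cited in the experiment setup.
    for layer in net:
        if isinstance(layer, nn.Linear):
            nn.init.kaiming_uniform_(layer.weight, nonlinearity="leaky_relu")
            nn.init.zeros_(layer.bias)
    return net

# Periodic-update intervals reported in the paper.
GAIN_UPDATE_PERIOD = 50           # Nd = 50: refresh the matrix gain every 50 steps
ELIGIBILITY_UPDATE_PERIOD = 2000  # N = 2000: refresh the eligibility term every 2000 steps
```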