Zap Q-Learning With Nonlinear Function Approximation

Authors: Shuhang Chen, Adithya M. Devraj, Fan Lu, Ana Bušić, Sean P. Meyn

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Zap Q-learning with neural network function approximation emerges as a special case, and is tested on examples from Open AI Gym. Based on multiple experiments with a range of neural network sizes, it is found that the new algorithms converge quickly and are robust to choice of function approximation architecture.
Researcher Affiliation | Academia | Shuhang Chen, University of Florida, shuhangchen@ufl.edu; Adithya M. Devraj, Stanford University, adevraj@stanford.edu; Fan Lu, University of Florida, fan.lu@ufl.edu; Ana Bušić, INRIA / École Normale Supérieure / PSL Research University, ana.busic@inria.fr; Sean P. Meyn, University of Florida, meyn@ece.ufl.edu
Pseudocode | No | While the paper describes algorithm steps with equations (e.g., the Zap SA algorithm via equations 14a-14b and 21a-21d), it does not include a formally structured pseudocode or algorithm block. (A hedged schematic of a Zap-style recursion is sketched after this table.)
Open Source Code | No | The paper does not provide a specific repository link or an explicit statement about the open-sourcing of the code for the methodology described.
Open Datasets | Yes | The Zap Q-learning algorithm was tested on three examples from Open AI gym: Mountain Car, Acrobot, and Cartpole [1]. [1] Open AI Gym website. github.com/openai/gym/wiki. (An environment-instantiation sketch follows this table.)
Dataset Splits | No | The paper does not provide explicit details about the dataset splits (e.g., specific percentages or sample counts for training, validation, and test sets).
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9'). It only generally refers to neural networks and activation functions.
Experiment Setup | Yes | Common in each experiment: a feedforward neural network that is fully connected, using the Leaky ReLU activation function. Two minor modifications of the algorithm were used in these experiments to reduce complexity: 1. Periodic gain update. An integer Nd > 1 was fixed...We observed that Nd = 50 worked well for all experiments...2. Periodic eligibility update...N = 2000. ... Experiments were performed with both decreasing step-size (defined in (22)), and constant step-size. ... parameters randomly initialized by the Kaiming uniform method [20, 36]. (A hedged configuration sketch follows this table.)
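
As a reading aid for the pseudocode entry above, the following is a schematic sketch of a generic Zap-style matrix-gain stochastic-approximation step. It is not the paper's equations (14a)-(14b) verbatim: the function f, the Jacobian samples, the pseudo-inverse safeguard, and the step-size schedule are all assumptions made for illustration.

```python
import numpy as np

def zap_sa_step(theta, A_hat, f_val, A_sample, alpha, gamma):
    """One schematic Zap-SA step: a running estimate of the mean Jacobian is
    tracked on a faster timescale, and its (pseudo-)inverse is used as a
    matrix gain for the parameter update."""
    A_hat = A_hat + gamma * (A_sample - A_hat)   # matrix-gain (Jacobian) estimate
    G = -np.linalg.pinv(A_hat)                   # pseudo-inverse guards against singularity
    theta = theta + alpha * (G @ f_val)          # Newton-Raphson-like parameter update
    return theta, A_hat
```

In the Zap family, the gain step-size gamma is chosen to decay more slowly than alpha, so the matrix-gain estimate evolves on a faster timescale than the parameters.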
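
For the open-datasets entry: a minimal check that the three cited Gym environments can be instantiated. The environment ID strings and the use of the gym Python package are assumptions; the paper does not state exact version strings.

```python
import gym

# Standard Gym IDs for the three tasks named in the paper (assumed, not quoted).
for env_id in ["MountainCar-v0", "Acrobot-v1", "CartPole-v1"]:
    env = gym.make(env_id)
    print(env_id, "observation space:", env.observation_space,
          "action space:", env.action_space)
    env.close()
```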
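
For the experiment-setup entry: a minimal sketch of the reported function-approximation architecture, i.e. a fully connected feedforward network with Leaky ReLU activations and Kaiming-uniform initialization. The hidden-layer sizes, the use of PyTorch, and the bias initialization are assumptions; the paper varies the network size across experiments.

```python
import torch.nn as nn

def make_q_network(state_dim: int, num_actions: int, hidden: int = 64) -> nn.Sequential:
    """Fully connected feedforward Q-network with Leaky ReLU activations."""
    net = nn.Sequential(
        nn.Linear(state_dim, hidden),
        nn.LeakyReLU(),
        nn.Linear(hidden, hidden),
        nn.LeakyReLU(),
        nn.Linear(hidden, num_actions),
    )
    # Kaiming (He) uniform initialization, as cited in the experiment setup.
    for layer in net:
        if isinstance(layer, nn.Linear):
            nn.init.kaiming_uniform_(layer.weight, nonlinearity="leaky_relu")
            nn.init.zeros_(layer.bias)
    return net

# Periodic-update intervals reported in the paper.
GAIN_UPDATE_PERIOD = 50           # Nd = 50: refresh the matrix gain every 50 steps
ELIGIBILITY_UPDATE_PERIOD = 2000  # N = 2000: refresh the eligibility term every 2000 steps
```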