Zap Q-Learning With Nonlinear Function Approximation
Authors: Shuhang Chen, Adithya M. Devraj, Fan Lu, Ana Bušić, Sean Meyn
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Zap Q-learning with neural network function approximation emerges as a special case and is tested on examples from OpenAI Gym. Based on multiple experiments with a range of neural network sizes, the new algorithms are found to converge quickly and to be robust to the choice of function approximation architecture. |
| Researcher Affiliation | Academia | Shuhang Chen, University of Florida, shuhangchen@ufl.edu; Adithya M. Devraj, Stanford University, adevraj@stanford.edu; Fan Lu, University of Florida, fan.lu@ufl.edu; Ana Bušić, INRIA / École Normale Supérieure, PSL Research University, ana.busic@inria.fr; Sean P. Meyn, University of Florida, meyn@ece.ufl.edu |
| Pseudocode | No | While the paper describes the algorithm steps with equations (e.g., the Zap SA algorithm via equations (14a)-(14b) and (21a)-(21d)), it does not include a formally structured pseudocode or algorithm block. A hedged sketch of the generic Zap SA recursion, as known from prior Zap Q-learning work, is given after this table. |
| Open Source Code | No | The paper does not provide a specific repository link or an explicit statement about the open-sourcing of the code for the methodology described. |
| Open Datasets | Yes | The Zap Q-learning algorithm was tested on three examples from OpenAI Gym: Mountain Car, Acrobot, and CartPole [1] (see the environment sketch after this table). [1] OpenAI Gym website. github.com/openai/gym/wiki. |
| Dataset Splits | No | The paper does not provide explicit details about the dataset splits (e.g., specific percentages or sample counts for training, validation, and test sets). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9'). It only generally refers to neural networks and activation functions. |
| Experiment Setup | Yes | Common to each experiment: a fully connected feedforward neural network with the Leaky ReLU activation function (a hedged architecture sketch follows this table). Two minor modifications of the algorithm were used to reduce complexity: (1) periodic gain update, where an integer Nd > 1 was fixed... Nd = 50 was observed to work well for all experiments... and (2) periodic eligibility update... N = 2000. Experiments were performed with both decreasing step-size (defined in (22)) and constant step-size, with parameters randomly initialized by the Kaiming uniform method [20, 36]. |
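
The paper's equations (14a)-(14b) are not reproduced on this page, so the block below is only a sketch of the generic two-timescale Zap stochastic approximation (Zap SA) recursion known from earlier Zap Q-learning work (Devraj and Meyn, 2017); the exact form, notation, and sign conventions in the paper may differ.

```latex
% Assumed generic Zap SA recursion; not a transcription of the paper's (14a)-(14b).
% f(\theta, \Phi) is the noisy mean-flow vector field,
% A_{n+1} = \partial_\theta f(\theta_n, \Phi_{n+1}) is its sampled Jacobian,
% and \gamma_n decays more slowly than \alpha_n (two-timescale matrix-gain estimate).
\begin{aligned}
  \widehat{A}_{n+1} &= \widehat{A}_n + \gamma_{n+1}\bigl(A_{n+1} - \widehat{A}_n\bigr), \\
  \theta_{n+1}      &= \theta_n - \alpha_{n+1}\,\widehat{A}_{n+1}^{-1}\, f(\theta_n, \Phi_{n+1}).
\end{aligned}
```

Under this reading, the periodic gain update noted in the "Experiment Setup" row presumably amounts to refreshing (and inverting) the matrix estimate only every Nd steps rather than at every iteration.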
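
For reference, the three control tasks cited in the "Open Datasets" row are available under standard Gym environment IDs. The IDs and the classic 4-tuple `step` interface below are assumptions, since the paper does not state which Gym versions were used; a minimal sketch:

```python
# Minimal sketch: instantiating the three OpenAI Gym tasks named in the paper.
# Environment IDs and the 4-tuple step API are assumptions about the setup.
import gym

ENV_IDS = ["MountainCar-v0", "Acrobot-v1", "CartPole-v1"]

for env_id in ENV_IDS:
    env = gym.make(env_id)
    obs = env.reset()
    # Take one random action to confirm the observation/action interfaces.
    obs, reward, done, info = env.step(env.action_space.sample())
    print(env_id, env.observation_space.shape, env.action_space.n)
    env.close()
```

All three tasks have low-dimensional continuous observations and a small discrete action set, which is consistent with the paper's use of fully connected feedforward networks across examples.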
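
The "Experiment Setup" row describes the function approximation only at a high level. The sketch below assumes a PyTorch implementation: a fully connected feedforward network with Leaky ReLU activations and Kaiming uniform initialization, as stated in the row; the layer widths, depth, and per-action output head are illustrative choices, not the authors' configuration.

```python
# Hedged sketch of the reported architecture: fully connected feedforward network,
# Leaky ReLU activations, Kaiming uniform initialization. Widths/depth are placeholders.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.LeakyReLU(),
            nn.Linear(hidden, hidden),
            nn.LeakyReLU(),
            nn.Linear(hidden, num_actions),  # one Q-value per discrete action
        )
        # Kaiming uniform initialization, as reported in the "Experiment Setup" row.
        for m in self.net:
            if isinstance(m, nn.Linear):
                nn.init.kaiming_uniform_(m.weight, nonlinearity="leaky_relu")
                nn.init.zeros_(m.bias)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

# Example with CartPole dimensions (4-dimensional observation, 2 discrete actions).
q = QNetwork(obs_dim=4, num_actions=2)
print(q(torch.zeros(1, 4)).shape)  # torch.Size([1, 2])
```

The periodic gain update (Nd = 50) and periodic eligibility update (N = 2000) mentioned in the row are complexity-reduction details of the Zap recursion itself and are not part of this architecture sketch.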