Zap Q-Learning

Authors: Adithya M. Devraj, Sean Meyn

NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Numerical experiments confirm the quick convergence, even in such non-ideal cases." and "Results from numerical experiments are surveyed here to illustrate the performance of the Zap Q-learning algorithm." (Section 3)
Researcher Affiliation | Academia | Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32608. adithyamdevraj@ufl.edu, meyn@ece.ufl.edu
Pseudocode | Yes | "Algorithm 1: Zap Q(λ)-learning"
Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository or mention code availability in supplementary materials.
Open Datasets | No | The paper mentions a "simple path-finding problem" and a "Finance model" taken from [27, 7]. While it refers to previous work, it does not provide concrete access information (a specific link, DOI, repository name, or an explicit statement of public availability) for these problem settings within this paper.
Dataset Splits | No | The paper describes numerical experiments but does not provide dataset split information (exact percentages, sample counts, citations to predefined splits, or a detailed splitting methodology) for training, validation, and test sets.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiments.
Experiment Setup | Yes | The paper states specific parameter values, e.g.: "A special case is considered in the analysis here: the basis is chosen as in Watkins algorithm, λ = 0, and αn = 1/n"; "αn = n^(-1), γn = n^(-ρ), n ≥ 1, for some fixed ρ ∈ (1/2, 1)"; "Experiments using β = 0.8 and g = 70 resulted in values..."; and "with gain g = 70, and the Zap algorithm, γn = αn^0.85". A minimal sketch using these step-size schedules follows the table.
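
The Experiment Setup row above cites the paper's Algorithm 1 (Zap Q(λ)-learning) specialized to λ = 0 with a Watkins (tabular) basis and the step-size schedules αn = 1/n and γn = αn^0.85. The following is a minimal Python sketch of that two-time-scale recursion under those quoted values; it is not the authors' implementation. The `env` interface (`reset()` and `step(state, action)` returning `(next_state, reward)`), the uniformly random exploratory policy, the use of a pseudo-inverse for the matrix gain, and the `num_steps`/`seed` arguments are assumptions made for illustration only.

```python
import numpy as np

def zap_q_learning(env, num_states, num_actions, beta=0.8, num_steps=10_000, seed=0):
    """Sketch of the Zap Q-learning recursion (lambda = 0, tabular Watkins basis)."""
    rng = np.random.default_rng(seed)
    d = num_states * num_actions
    theta = np.zeros(d)                # Q(x, u) = theta[idx(x, u)]
    A_hat = np.zeros((d, d))           # running matrix-gain estimate

    def idx(x, u):
        return x * num_actions + u

    x = env.reset()                    # hypothetical env interface (assumption)
    for n in range(1, num_steps + 1):
        alpha = 1.0 / n                # alpha_n = 1/n (parameter step size)
        gamma = alpha ** 0.85          # gamma_n = alpha_n^0.85 (matrix-gain step size)

        u = int(rng.integers(num_actions))          # uniform exploratory policy (assumption)
        x_next, r = env.step(x, u)

        psi = np.zeros(d); psi[idx(x, u)] = 1.0     # basis vector of the visited pair
        u_star = int(np.argmax([theta[idx(x_next, a)] for a in range(num_actions)]))
        psi_next = np.zeros(d); psi_next[idx(x_next, u_star)] = 1.0

        # Temporal-difference term and the linearization matrix A_{n+1}
        td = r + beta * theta[idx(x_next, u_star)] - theta[idx(x, u)]
        A_n = np.outer(psi, beta * psi_next - psi)

        # Two-time-scale Zap update: the matrix gain is tracked on the faster
        # time scale, and the parameters take a Newton-Raphson-like step on the
        # slower scale. The pseudo-inverse guards against early singularity.
        A_hat += gamma * (A_n - A_hat)
        theta -= alpha * np.linalg.pinv(A_hat) @ (psi * td)

        x = x_next

    return theta.reshape(num_states, num_actions)
```

Reproducing the paper's path-finding or finance experiments would additionally require matching each model's basis, exploratory policy, and the reported gains (e.g., β = 0.8, g = 70); the sketch above only mirrors the two-time-scale structure described in Algorithm 1.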