Natural Temporal Difference Learning
Authors: William Dabney, Philip Thomas
AAAI 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conclude with empirical comparisons on three canonical domains (mountain car, cartpole balancing, and acrobot) and one novel challenging domain (playing Tic-tac-toe using handwritten letters as input). |
| Researcher Affiliation | Academia | William Dabney and Philip S. Thomas, School of Computer Science, University of Massachusetts Amherst, 140 Governors Dr., Amherst, MA 01003, {wdabney,pthomas}@cs.umass.edu |
| Pseudocode | Yes | Algorithm 1 Natural Residual Gradient; Algorithm 2 Natural Linear-Time Residual Gradient; Algorithm 3 Natural Sarsa(λ); Algorithm 4 Natural TDC (a hedged Sarsa(λ) sketch appears below the table) |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | We used an ϵ-greedy policy for all TD-learning algorithms. To evaluate the performance of natural TDC we generate experience from a fixed policy in the acrobot domain... For mountain car, cart-pole balancing, and acrobot we used linear function approximation with a third-order Fourier basis (Konidaris et al. 2012). On visual Tic-tac-toe we used a fully-connected feed-forward artificial neural network with one hidden layer of 20 nodes. This allows us to show the benefits of natural gradients when the value function parameterization is non-linear and more complex. (See the Fourier-basis sketch below the table.) |
| Dataset Splits | No | No specific dataset split information for validation was provided. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | We optimized the algorithm parameters for all experiments using a randomized search as suggested by Bergstra and Bengio (2012). We used an ϵ-greedy policy for all TD-learning algorithms. For mountain car, cart-pole balancing, and acrobot we used linear function approximation with a third-order Fourier basis (Konidaris et al. 2012). On visual Tic-tac-toe we used a fully-connected feed-forward artificial neural network with one hidden layer of 20 nodes. (A hedged random-search sketch follows below.) |
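
The Pseudocode row names four natural TD algorithms but does not reproduce them. As a rough illustration of the family, here is a minimal Python sketch of Sarsa(λ) with linear function approximation and an ε-greedy policy (matching the Experiment Setup row), where the TD update is preconditioned by a running Sherman-Morrison estimate of the inverse metric G⁻¹ with G ≈ E[φφᵀ]. This follows the spirit of natural TD methods, not the paper's Algorithm 3 verbatim; the `env` interface, the metric step size `beta`, and all default parameters are assumptions.

```python
import numpy as np

def natural_sarsa_sketch(env, features, n_features, n_actions,
                         alpha=0.005, beta=0.01, gamma=1.0, lam=0.9,
                         epsilon=0.05, episodes=200, seed=0):
    """Hedged sketch: Sarsa(lambda) with linear function approximation,
    an epsilon-greedy policy, and a natural-gradient-style preconditioner.
    `env` is a hypothetical interface: reset() -> s, step(a) -> (s', r, done)."""
    rng = np.random.default_rng(seed)
    dim = n_features * n_actions
    theta = np.zeros(dim)      # value-function weights
    G_inv = np.eye(dim)        # running inverse-metric estimate

    def phi(s, a):
        # One-hot stacking of state features per action.
        x = np.zeros(dim)
        x[a * n_features:(a + 1) * n_features] = features(s)
        return x

    def act(s):
        # Epsilon-greedy over the approximate action values.
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax([theta @ phi(s, a) for a in range(n_actions)]))

    for _ in range(episodes):
        s = env.reset()
        a = act(s)
        e = np.zeros(dim)                          # eligibility trace
        done = False
        while not done:
            s2, r, done = env.step(a)
            x = phi(s, a)
            e = gamma * lam * e + x
            # Sherman-Morrison update of G_inv for G <- (1-beta)G + beta*x*x^T.
            A_inv = G_inv / (1.0 - beta)
            Ax = A_inv @ x
            G_inv = A_inv - beta * np.outer(Ax, Ax) / (1.0 + beta * (x @ Ax))
            a2 = act(s2) if not done else 0
            q2 = 0.0 if done else theta @ phi(s2, a2)
            delta = r + gamma * q2 - theta @ x     # TD error
            theta += alpha * delta * (G_inv @ e)   # preconditioned update
            s, a = s2, a2
    return theta
```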
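
The third-order Fourier basis cited in the Open Datasets and Experiment Setup rows (Konidaris et al.) is a standard construction: scale the state to [0, 1]^d, then take cosines of π times every integer coefficient vector in {0, …, n}^d. A minimal sketch; the mountain car state bounds in the usage line are the conventional ones, not values quoted from the paper:

```python
import itertools
import numpy as np

def fourier_basis(order, low, high):
    """Order-n Fourier basis: returns a function mapping a raw state to
    (order+1)^d cosine features, with the state scaled to [0, 1]^d."""
    low, high = np.asarray(low, float), np.asarray(high, float)
    coeffs = np.array(list(itertools.product(range(order + 1),
                                             repeat=len(low))))
    def features(state):
        s = (np.asarray(state, float) - low) / (high - low)
        return np.cos(np.pi * (coeffs @ s))
    return features

# Usage, e.g. for mountain car (position, velocity), 4^2 = 16 features:
phi = fourier_basis(3, low=[-1.2, -0.07], high=[0.6, 0.07])
```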
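
The Experiment Setup row reports a randomized hyperparameter search in the style of Bergstra and Bengio (2012): draw independent random parameter settings and keep the best-performing one. A minimal sketch; the sampled parameters, their ranges, and the `evaluate` callback are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hyperparameters():
    """One random-search draw (illustrative ranges, not the paper's)."""
    return {
        "alpha": 10 ** rng.uniform(-4, 0),   # log-uniform step size
        "lam": rng.uniform(0.0, 1.0),        # eligibility-trace decay
        "epsilon": rng.uniform(0.0, 0.2),    # exploration rate
    }

def random_search(evaluate, n_trials=50):
    """Evaluate independent draws and return the best setting.
    `evaluate` is a hypothetical callback that runs the learner with the
    given parameters and returns its mean return."""
    trials = [sample_hyperparameters() for _ in range(n_trials)]
    scores = [evaluate(**t) for t in trials]
    return trials[int(np.argmax(scores))]
```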