Reinforcement Learning with Parameterized Actions

Authors: Warwick Masson, Pravesh Ranchod, George Konidaris

AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We introduce the Q-PAMDP algorithm for learning in these domains, show that it converges to a local optimum, and compare it to direct policy search in the goal-scoring and Platform domains.
Researcher Affiliation | Academia | Warwick Masson and Pravesh Ranchod, School of Computer Science and Applied Mathematics, University of the Witwatersrand, Johannesburg, South Africa (warwick.masson@students.wits.ac.za, pravesh.ranchod@wits.ac.za); George Konidaris, Department of Computer Science, Duke University, Durham, North Carolina 27708 (gdk@cs.duke.edu)
Pseudocode | Yes | Algorithm 1 Q-PAMDP(k) (a minimal sketch of this alternation is given after the table)
Open Source Code | No | No explicit statement or link regarding the public availability of source code for the described methodology was found.
Open Datasets | No | The paper describes experiments in the 'goal-scoring' and 'Platform' domains, which appear to be simulation environments set up by the authors rather than pre-existing public datasets with explicit access information. It references 'Kitano et al. 1997' for the robot soccer problem, but this is a problem description, not a dataset citation with access details.
Dataset Splits | No | The paper does not provide specific train/validation/test dataset splits, percentages, or sample counts. It mentions 'averaged over 20 runs' for evaluation, but not data partitioning.
Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, or memory) used for running the experiments were provided in the paper.
Software Dependencies | No | The paper mentions algorithms like 'gradient-descent Sarsa(λ)' and 'eNAC' but does not provide specific software or library names with version numbers (e.g., Python 3.x, PyTorch 1.x) that are required to reproduce the experiments.
Experiment Setup | Yes | At each step we perform one eNAC update based on 50 episodes and then refit Qω using 50 gradient-descent Sarsa(λ) episodes.
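
The paper reports Algorithm 1 (Q-PAMDP(k)) only as pseudocode and releases no code. Below is a minimal Python sketch of the alternation it describes, assuming the schedule quoted in the Experiment Setup row: k policy-search (eNAC) updates of the continuous-parameter policy, followed by a Sarsa(λ) refit of the action-value weights. The names q_pamdp, enac_update, sarsa_refit, and n_outer_iters are illustrative placeholders, not the authors' implementation; the actual eNAC and Sarsa(λ) procedures would be supplied as the two callables.

```python
from typing import Any, Callable, Tuple

def q_pamdp(
    k: int,
    theta: Any,                               # parameter-selection policy weights
    omega: Any,                               # action-value (Q) weights
    enac_update: Callable[[Any, Any], Any],   # one eNAC step (e.g. over 50 episodes)
    sarsa_refit: Callable[[Any, Any], Any],   # Sarsa(lambda) refit of Q_omega
    n_outer_iters: int = 100,
) -> Tuple[Any, Any]:
    """Sketch of the Q-PAMDP(k) alternation: k parameter-policy updates,
    then a refit of the discrete-action value function, repeated."""
    for _ in range(n_outer_iters):
        # k policy-search updates of the continuous action parameters,
        # holding the action-value weights fixed.
        for _ in range(k):
            theta = enac_update(theta, omega)
        # Refit Q_omega with gradient-descent Sarsa(lambda),
        # holding the parameter policy fixed.
        omega = sarsa_refit(theta, omega)
    return theta, omega
```

With k = 1, and the two callables performing one eNAC update over 50 episodes and a 50-episode Sarsa(λ) refit respectively, a single outer iteration would match the per-step schedule quoted in the Experiment Setup row.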