Reinforcement Learning with Parameterized Actions
Authors: Warwick Masson, Pravesh Ranchod, George Konidaris
AAAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce the Q-PAMDP algorithm for learning in these domains, show that it converges to a local optimum, and compare it to direct policy search in the goal-scoring and Platform domains. |
| Researcher Affiliation | Academia | Warwick Masson and Pravesh Ranchod, School of Computer Science and Applied Mathematics, University of the Witwatersrand, Johannesburg, South Africa (warwick.masson@students.wits.ac.za, pravesh.ranchod@wits.ac.za); George Konidaris, Department of Computer Science, Duke University, Durham, North Carolina 27708 (gdk@cs.duke.edu) |
| Pseudocode | Yes | Algorithm 1 Q-PAMDP(k) |
| Open Source Code | No | No explicit statement or link regarding the public availability of source code for the described methodology was found. |
| Open Datasets | No | The paper describes experiments in the 'goal-scoring' and 'Platform' domains, which appear to be simulation environments set up by the authors rather than pre-existing public datasets with explicit access information. It references 'Kitano et al. 1997' for the robot soccer problem, but this is a problem description, not a dataset citation with access details. |
| Dataset Splits | No | The paper does not provide specific train/validation/test dataset splits, percentages, or sample counts. It mentions 'averaged over 20 runs' for evaluation, but not data partitioning. |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, or memory) used for running the experiments were provided in the paper. |
| Software Dependencies | No | The paper mentions algorithms like 'gradient-descent Sarsa(λ)' and 'eNAC' but does not provide specific software or library names with version numbers (e.g., Python 3.x, PyTorch 1.x) that are required to reproduce the experiments. |
| Experiment Setup | Yes | At each step we perform one eNAC update based on 50 episodes and then refit Qω using 50 gradient descent Sarsa(λ) episodes. |
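For illustration, the Q-PAMDP(k) alternation quoted in the Experiment Setup row (k parameter-policy updates, e.g. via eNAC, interleaved with refitting Qω via Sarsa(λ)) can be sketched as below. This is a hypothetical skeleton, not the authors' released code: the function names `fit_q` and `update_policy` and the iteration counts are illustrative stand-ins.

```python
def q_pamdp(k, n_iterations, fit_q, update_policy):
    """Sketch of the Q-PAMDP(k) outer loop: per iteration, perform k
    updates of the continuous-parameter policy, then refit the
    discrete-action value function Q. The callables are placeholders
    for, e.g., one eNAC step over 50 episodes (update_policy) and 50
    gradient-descent Sarsa(lambda) episodes (fit_q)."""
    for _ in range(n_iterations):
        for _ in range(k):
            update_policy()  # parameter-policy search step (e.g. eNAC)
        fit_q()              # re-estimate Q for the discrete actions

# Toy usage: count how often each phase runs under Q-PAMDP(1).
counts = {"policy": 0, "q": 0}
q_pamdp(
    k=1,
    n_iterations=3,
    fit_q=lambda: counts.__setitem__("q", counts["q"] + 1),
    update_policy=lambda: counts.__setitem__("policy", counts["policy"] + 1),
)
```

With k = 1 this matches the quoted setup: each outer step performs one policy update followed by one Q refit.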