Adaptive Skills Adaptive Partitions (ASAP)

Authors: Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experiments were performed on four different continuous domains: the Two Rooms (2R) domain (Figure 1b), the Flipped 2R domain (Figure 1c), the Three Rooms (3R) domain (Figure 1d), and the RoboCup domains (Figure 1e), which include a one-on-one scenario between a striker and a goalkeeper (R1), a two-on-one scenario of a striker against a goalkeeper and a defender (R2), and a striker against two defenders and a goalkeeper (R3) (see supplementary material).
Researcher Affiliation | Collaboration | Daniel J. Mankowitz, Timothy A. Mann, and Shie Mannor, The Technion, Israel Institute of Technology, Haifa, Israel (danielm@tx.technion.ac.il, mann.timothy@acm.org, shie@ee.technion.ac.il). Timothy Mann now works at Google DeepMind.
Pseudocode | Yes | Algorithm 1: ASAP
Open Source Code | No | The paper refers to a third-party open-source project (RoboCup HFO) in footnote 3, but does not provide access to the authors' own implementation code for the ASAP framework.
Open Datasets | No | The paper describes custom simulation environments (Two Rooms, Flipped 2R, Three Rooms, RoboCup) rather than using a pre-existing publicly available dataset, and no concrete access information for a dataset is provided.
Dataset Splits | No | The paper describes training an agent in simulation environments for a certain number of episodes, but does not refer to explicit training, validation, or test dataset splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies | No | The paper mentions software components such as the 'RoboCup 2D soccer simulation domain' and 'Actor-Critic Policy Gradient (AC-PG)', but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | For each experiment we implement ASAP using Actor-Critic Policy Gradient (AC-PG) as the learning algorithm. In both domains, the agent (red ball) needs to reach the goal location (blue square) in the shortest amount of time. The agent receives constant negative rewards and, upon reaching the goal, receives a large positive reward. The state space is a 4-tuple consisting of the continuous x_agent, y_agent location of the agent and the x_goal, y_goal location of the center of the goal. The agent can move in each of the four cardinal directions. For each experiment involving the two-room domains, a single hyperplane is learned (resulting in two SPs) with a linear feature vector representation ψ_{x,m} = [1, x_agent, y_agent].
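
As a reading aid for the experiment-setup row above, the sketch below shows how a single hyperplane over the linear feature vector ψ_{x,m} = [1, x_agent, y_agent] could split the continuous state space into two skill partitions (SPs), each with its own policy over the four cardinal actions. The sigmoid gating, the softmax intra-skill policies, and all parameter shapes are assumptions made for illustration; the paper specifies only the feature vector and that AC-PG is the learning algorithm, so this is not the authors' implementation.

```python
import numpy as np

# Hedged sketch: one hyperplane over psi_{x,m} = [1, x_agent, y_agent]
# induces two Skill Partitions (SPs); each SP has its own policy over
# the four cardinal actions. Gating, policies, and shapes are assumed.

ACTIONS = ["north", "south", "east", "west"]  # four cardinal directions

def partition_features(state):
    """Partition features psi_{x,m} = [1, x_agent, y_agent]."""
    x_agent, y_agent, _x_goal, _y_goal = state
    return np.array([1.0, x_agent, y_agent])

def skill_partition_probs(state, beta):
    """Probabilities of the two SPs induced by a single hyperplane beta."""
    z = float(beta @ partition_features(state))
    p1 = 1.0 / (1.0 + np.exp(-z))  # sigmoid of signed distance to the hyperplane
    return np.array([1.0 - p1, p1])

def skill_action_probs(state, theta_m):
    """Softmax intra-skill policy over the four cardinal actions (assumed form)."""
    phi = np.concatenate(([1.0], state))  # simple state features (assumed)
    logits = theta_m @ phi                # one weight row per action
    logits -= logits.max()                # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def act(state, beta, thetas, rng):
    """Sample a skill partition, then an action from that skill's policy."""
    m = rng.choice(2, p=skill_partition_probs(state, beta))
    a = rng.choice(len(ACTIONS), p=skill_action_probs(state, thetas[m]))
    return m, ACTIONS[a]

# Example: agent at (0.2, 0.8), goal centre at (0.9, 0.1), untrained parameters.
rng = np.random.default_rng(0)
state = np.array([0.2, 0.8, 0.9, 0.1])
beta = np.zeros(3)                              # hyperplane parameters
thetas = [np.zeros((4, 5)) for _ in range(2)]   # one weight matrix per skill
print(act(state, beta, thetas, rng))
```

In this sketch the signed distance to the hyperplane shifts probability mass between the two SPs; at a high level, ASAP's Algorithm 1 learns the hyperplane parameters and the per-skill policy weights jointly with a policy-gradient method, here stated as a general reading of the paper rather than its exact update rules.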