Adaptive Skills Adaptive Partitions (ASAP)

Authors: Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experiments were performed on four different continuous domains: the Two Rooms (2R) domain (Figure 1b), the Flipped 2R domain (Figure 1c), the Three Rooms (3R) domain (Figure 1d), and the RoboCup domains (Figure 1e), which include a one-on-one scenario between a striker and a goalkeeper (R1), a two-on-one scenario of a striker against a goalkeeper and a defender (R2), and a striker against two defenders and a goalkeeper (R3) (see supplementary material).
Researcher Affiliation | Collaboration | Daniel J. Mankowitz, Timothy A. Mann, and Shie Mannor, The Technion, Israel Institute of Technology, Haifa, Israel (danielm@tx.technion.ac.il, mann.timothy@acm.org, shie@ee.technion.ac.il). Timothy Mann now works at Google DeepMind.
Pseudocode | Yes | Algorithm 1: ASAP
Open Source Code | No | The paper refers to a third-party open-source project (RoboCup HFO) in footnote 3, but does not provide access to the authors' own implementation code for the ASAP framework.
Open Datasets | No | The paper describes custom simulation environments (Two Rooms, Flipped 2R, Three Rooms, RoboCup) rather than using a pre-existing publicly available dataset, and no concrete access information for a dataset is provided.
Dataset Splits | No | The paper describes training an agent in simulation environments for a certain number of episodes, but does not refer to explicit training, validation, or test dataset splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies | No | The paper mentions software components such as the 'RoboCup 2D soccer simulation domain' and 'Actor-Critic Policy Gradient (AC-PG)', but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | For each experiment we implement ASAP using Actor-Critic Policy Gradient (AC-PG) as the learning algorithm. In both domains, the agent (red ball) needs to reach the goal location (blue square) in the shortest amount of time. The agent receives constant negative rewards and, upon reaching the goal, receives a large positive reward. The state space is a 4-tuple consisting of the continuous x_agent, y_agent location of the agent and the x_goal, y_goal location of the center of the goal. The agent can move in each of the four cardinal directions. For each experiment involving the two-room domains, a single hyperplane is learned (resulting in two SPs) with a linear feature vector representation ψ_{x,m} = [1, x_agent, y_agent].
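
As a reading aid for the experiment-setup row above, the sketch below shows how a single hyperplane over the linear feature vector ψ_{x,m} = [1, x_agent, y_agent] could split the continuous state space into two skill partitions (SPs), each with its own policy over the four cardinal actions. The sigmoid gating, the softmax intra-skill policies, and all parameter shapes are assumptions made for illustration; the paper specifies only the feature vector and that AC-PG is the learning algorithm, so this is not the authors' implementation.

```python
import numpy as np

# Hedged sketch: one hyperplane over psi_{x,m} = [1, x_agent, y_agent]
# induces two Skill Partitions (SPs); each SP has its own policy over
# the four cardinal actions. Gating, policies, and shapes are assumed.

ACTIONS = ["north", "south", "east", "west"]  # four cardinal directions

def partition_features(state):
    """Partition features psi_{x,m} = [1, x_agent, y_agent]."""
    x_agent, y_agent, _x_goal, _y_goal = state
    return np.array([1.0, x_agent, y_agent])

def skill_partition_probs(state, beta):
    """Probabilities of the two SPs induced by a single hyperplane beta."""
    z = float(beta @ partition_features(state))
    p1 = 1.0 / (1.0 + np.exp(-z))  # sigmoid of signed distance to the hyperplane
    return np.array([1.0 - p1, p1])

def skill_action_probs(state, theta_m):
    """Softmax intra-skill policy over the four cardinal actions (assumed form)."""
    phi = np.concatenate(([1.0], state))  # simple state features (assumed)
    logits = theta_m @ phi                # one weight row per action
    logits -= logits.max()                # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def act(state, beta, thetas, rng):
    """Sample a skill partition, then an action from that skill's policy."""
    m = rng.choice(2, p=skill_partition_probs(state, beta))
    a = rng.choice(len(ACTIONS), p=skill_action_probs(state, thetas[m]))
    return m, ACTIONS[a]

# Example: agent at (0.2, 0.8), goal centre at (0.9, 0.1), untrained parameters.
rng = np.random.default_rng(0)
state = np.array([0.2, 0.8, 0.9, 0.1])
beta = np.zeros(3)                              # hyperplane parameters
thetas = [np.zeros((4, 5)) for _ in range(2)]   # one weight matrix per skill
print(act(state, beta, thetas, rng))
```

In this sketch the signed distance to the hyperplane shifts probability mass between the two SPs; at a high level, ASAP's Algorithm 1 learns the hyperplane parameters and the per-skill policy weights jointly with a policy-gradient method, here stated as a general reading of the paper rather than its exact update rules.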