Adaptive Skills Adaptive Partitions (ASAP)
Authors: Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments have been performed on four different continuous domains: the Two Rooms (2R) domain (Figure 1b), the Flipped 2R domain (Figure 1c), the Three Rooms (3R) domain (Figure 1d) and RoboCup domains (Figure 1e) that include a one-on-one scenario between a striker and a goalkeeper (R1), a two-on-one scenario of a striker against a goalkeeper and a defender (R2), and a striker against two defenders and a goalkeeper (R3) (see supplementary material). |
| Researcher Affiliation | Collaboration | Daniel J. Mankowitz, Timothy A. Mann and Shie Mannor, The Technion - Israel Institute of Technology, Haifa, Israel. danielm@tx.technion.ac.il, mann.timothy@acm.org, shie@ee.technion.ac.il. Timothy Mann now works at Google DeepMind. |
| Pseudocode | Yes | Algorithm 1 ASAP |
| Open Source Code | No | The paper refers to a third-party open-source project (RoboCup HFO) in footnote 3, but does not provide access to the authors' own implementation code for the ASAP framework. |
| Open Datasets | No | The paper describes custom simulation environments (Two Rooms, Flipped 2R, Three Rooms, Robo Cup) rather than using a pre-existing publicly available dataset, and no concrete access information for a dataset is provided. |
| Dataset Splits | No | The paper describes training an agent in simulation environments for a certain number of episodes, but does not refer to explicit training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components such as the RoboCup 2D soccer simulation domain and Actor-Critic Policy Gradient (AC-PG), but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | For each experiment we implement ASAP using Actor-Critic Policy Gradient (AC-PG) as the learning algorithm. In both domains, the agent (red ball) needs to reach the goal location (blue square) in the shortest amount of time. The agent receives a constant negative reward per step and, upon reaching the goal, receives a large positive reward. The state space is a 4-tuple consisting of the continuous x_agent, y_agent location of the agent and the x_goal, y_goal location of the center of the goal. The agent can move in each of the four cardinal directions. For each experiment involving the two-room domains, a single hyperplane is learned (resulting in two SPs) with a linear feature vector representation ψ_{x,m} = [1, x_agent, y_agent] (see the sketch after this table). |
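
The experiment-setup row gives enough concrete detail to sketch the data flow: a 4-tuple state, four cardinal actions, and a single learned hyperplane over the feature vector ψ_{x,m} = [1, x_agent, y_agent] that splits the state space into two skill partitions (SPs), each with its own skill policy. The sketch below is only an illustration of that structure under stated assumptions, not the authors' implementation: the sigmoid gating, the linear-softmax skill policies, and all parameter shapes are assumptions.

```python
import numpy as np

def partition_features(x_agent, y_agent):
    """Linear feature vector psi_{x,m} = [1, x_agent, y_agent] for one hyperplane."""
    return np.array([1.0, x_agent, y_agent])

def skill_partition_probs(beta, psi):
    """Probabilities of the two skill partitions (SPs) induced by a single
    hyperplane with parameters beta. A sigmoid gating is assumed here
    purely for illustration."""
    p1 = 1.0 / (1.0 + np.exp(-beta @ psi))
    return np.array([p1, 1.0 - p1])

def skill_action_probs(theta_m, state_features):
    """Softmax policy over the four cardinal actions for the skill assigned
    to partition m (hypothetical linear-softmax parameterisation)."""
    logits = theta_m @ state_features      # shape: (4,)
    logits -= logits.max()                 # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()

# Example: 4-tuple state (x_agent, y_agent, x_goal, y_goal) as in the paper,
# one hyperplane -> two SPs, four cardinal actions.
state = np.array([0.2, 0.7, 0.9, 0.5])
beta = np.zeros(3)                         # hyperplane parameters (to be learned)
theta = np.zeros((2, 4, 4))                # one softmax skill policy per SP (assumed shape)

psi = partition_features(state[0], state[1])
sp_probs = skill_partition_probs(beta, psi)
m = int(np.argmax(sp_probs))               # pick the more likely partition
action_probs = skill_action_probs(theta[m], state)
print(sp_probs, action_probs)
```

In the paper both the hyperplane parameters and the per-SP skill policies are learned with AC-PG; here they are simply initialised to zero so the example runs and shows how a state is routed to a partition and then to an action distribution.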