Learning to Prune Dominated Action Sequences in Online Black-Box Planning
Authors: Yuu Jinnai, Alex Fukunaga
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply our pruning method to Iterated Width and breadth-first search in domain-independent black-box planning for Atari 2600 games in the Arcade Learning Environment (ALE); adding our pruning method significantly improves upon the baseline algorithms. We evaluate DASA and DASP applied to p-IW(1), IW(1), and breadth-first search (BrFS) on 53 games in the ALE, and show that on all three search methods, DASA improved the performance compared to the baseline search method as well as the baseline method using a hand-coded (human-generated), game-specific, restricted action set. |
| Researcher Affiliation | Academia | Yuu Jinnai, Alex Fukunaga Department of General Systems Studies Graduate School of Arts and Sciences The University of Tokyo |
| Pseudocode | Yes | Algorithm 1 (Find minimal action sequence set): 1. Initialize AL_min to the set of all action sequences which generate one or more non-duplicate nodes. 2. Let G = (V, E) be a hypergraph where v_i ∈ V represents an action sequence a_i with no non-duplicate search nodes, and hyperedge e(v_0, v_1, ..., v_n) ∈ E if there exist one or more duplicate search nodes generated by all of a_0, a_1, ..., a_n but not by any other action sequences. 3. Add the minimal vertex cover of G to AL_min. |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., specific repository link, explicit code release statement, or code in supplementary materials) for the described methodology. |
| Open Datasets | Yes | We evaluate our proposed dominated-action sequence detection strategies on a set of 53 single-player Atari 2600 games in the Arcade Learning Environment (ALE) (Bellemare et al. 2013). |
| Dataset Splits | No | The paper discusses a 'simulation frame budget' and 'planning episodes' in the context of online black-box planning, but does not provide specific training/validation/test dataset splits in the traditional sense of partitioning a static dataset. |
| Hardware Specification | No | The paper mentions that 'in ALIEN, p-IW(1) with DASA2 used >99% of the time for running simulator,' indicating high computational usage, but does not specify any particular hardware details such as GPU models, CPU models, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper mentions the 'Arcade Learning Environment (ALE)' and various algorithms but does not provide specific software dependencies or their version numbers, such as programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | A maximum budget of 10,000 simulated frames is applied. Following previous work, all algorithms select an action every 5 frames (Bellemare et al. 2013; Lipovetzky, Ramirez, and Geffner 2015). During the first 12 planning episodes (i.e., k = 12), DASP performs no pruning and uses all available actions (5 × 12 = 60 in-game frames = 1 second). For DASA, we used a sigmoid function s(x) = 1 / (1 + e^(-5(x - 0.5))), a minimal chance of applying an action sequence ε = 0.04, and a discount factor for the p value α = 0.95. |
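The pseudocode quoted above (Algorithm 1) can be sketched in Python. This is a hypothetical reconstruction, not the authors' code: the data layout (`generators_per_duplicate`, `all_sequences`) and the use of a greedy approximation for the minimal vertex cover are assumptions, since the paper's pseudocode leaves the cover computation unspecified.

```python
# Hypothetical sketch of Algorithm 1 (find minimal action sequence set).
# Data layout and the greedy vertex-cover approximation are assumptions.

def find_minimal_action_sequence_set(generators_per_duplicate, all_sequences):
    """
    generators_per_duplicate: iterable of frozensets; each set holds the
        action sequences that all generate the same duplicate search node.
    all_sequences: set of action sequences that generate at least one
        non-duplicate node (kept unconditionally, per step 1).
    Returns AL_min: the sequences to keep; all others may be pruned.
    """
    # Step 1: keep every sequence with at least one non-duplicate node.
    al_min = set(all_sequences)

    # Step 2: one hyperedge per duplicate node, over the remaining sequences.
    edges = [e - al_min for e in generators_per_duplicate]
    edges = [e for e in edges if e]  # drop edges already covered by step 1

    # Step 3: greedy vertex cover of the hypergraph, approximating the
    # minimal vertex cover called for in the paper's pseudocode.
    while edges:
        counts = {}
        for e in edges:
            for v in e:
                counts[v] = counts.get(v, 0) + 1
        best = max(counts, key=counts.get)  # vertex in most uncovered edges
        al_min.add(best)
        edges = [e for e in edges if best not in e]
    return al_min
```

Keeping a vertex cover guarantees that every duplicate search node is still generated by at least one retained action sequence, so pruning the rest loses no states.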
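The DASA hyperparameters reported above (sigmoid s(x), ε = 0.04, α = 0.95) can be made concrete with a short sketch. The exact way DASA combines these quantities is not reproduced in this table, so the combination below (apply a sequence with probability 1 - s(p), floored at ε, and update p by exponential discounting) is an illustrative assumption.

```python
import math

SIGMOID_SLOPE = 5.0  # from s(x) = 1 / (1 + e^(-5(x - 0.5)))
EPSILON = 0.04       # minimal chance of applying an action sequence
ALPHA = 0.95         # discount factor for the dominance estimate p

def s(x):
    """Sigmoid reported in the experiment setup."""
    return 1.0 / (1.0 + math.exp(-SIGMOID_SLOPE * (x - 0.5)))

def application_probability(p):
    """Hypothetical: sequences with a high dominance estimate p are
    applied with low probability, but never below EPSILON."""
    return max(EPSILON, 1.0 - s(p))

def update_estimate(p, dominated_now):
    """Exponentially discounted update of p with factor ALPHA (assumed)."""
    return ALPHA * p + (1.0 - ALPHA) * (1.0 if dominated_now else 0.0)
```

With these values, s(0.5) = 0.5, so a sequence whose dominance estimate sits at the sigmoid's midpoint is applied about half the time, while the ε floor keeps even apparently dominated sequences occasionally sampled.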