Learning to Prune Dominated Action Sequences in Online Black-Box Planning

Authors: Yuu Jinnai, Alex Fukunaga

AAAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply our pruning method to Iterated Width and breadth-first search in domain-independent black-box planning for Atari 2600 games in the Arcade Learning Environment (ALE); adding our pruning method significantly improves upon the baseline algorithms. We evaluate DASA and DASP applied to p-IW(1), IW(1) and breadth-first search (BrFS) on 53 games in the ALE, and show that on all three search methods, DASA improved the performance compared to the baseline search method as well as the baseline method using a hand-coded (human-generated), game-specific, restricted action set.
Researcher Affiliation | Academia | Yuu Jinnai, Alex Fukunaga; Department of General Systems Studies, Graduate School of Arts and Sciences, The University of Tokyo
Pseudocode | Yes | Algorithm 1. [Find minimal action sequence set] 1. Initialize $A^L_{\min}$ to the set of all action sequences which generate one or more non-duplicate nodes. 2. Let $G = (V, E)$ be a hypergraph where $v_i \in V$ represents an action sequence $a_i$ with no non-duplicate search nodes, and hyperedge $e(v_0, v_1, \ldots, v_n) \in E$ if there exist one or more duplicate search nodes generated by all of $a_0, a_1, \ldots, a_n$ but not by any other action sequences. 3. Add the minimal vertex cover of $G$ to $A^L_{\min}$.
Open Source Code | No | The paper does not provide any concrete access information (e.g., specific repository link, explicit code release statement, or code in supplementary materials) for the described methodology.
Open Datasets | Yes | We evaluate our proposed dominated-action sequence detection strategies on a set of 53 single-player Atari 2600 games in the Arcade Learning Environment (ALE) (Bellemare et al. 2013).
Dataset Splits | No | The paper discusses a 'simulation frame budget' and 'planning episodes' in the context of online black-box planning, but does not provide training/validation/test splits, since there is no static dataset to partition.
Hardware Specification | No | The paper mentions that 'in ALIEN, p-IW(1) with DASA2 used >99% of the time for running simulator,' indicating that simulation dominates the runtime, but does not specify hardware details such as GPU models, CPU models, or memory used for the experiments.
Software Dependencies | No | The paper mentions the 'Arcade Learning Environment (ALE)' and various algorithms but does not list software dependencies or version numbers, such as programming languages, libraries, or frameworks.
Experiment Setup | Yes | A maximum budget of 10,000 simulated frames is applied. Following previous work, all algorithms select an action every 5 frames (Bellemare et al. 2013; Lipovetzky, Ramirez, and Geffner 2015). During the first 12 planning episodes (i.e., k = 12), DASP performs no pruning, and uses all available actions (5 × 12 = 60 in-game frames = 1 second). For DASA, we used a sigmoid function $s(x) = \frac{1}{1 + e^{-5(x - 0.5)}}$, minimal chance of applying action sequence $\epsilon = 0.04$, discount factor for p value $\alpha = 0.95$.
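
The three steps of Algorithm 1 quoted in the Pseudocode row can be illustrated with a small Python sketch. This is not the authors' implementation. The function name `minimal_action_sequence_set`, the input layout (a dict from action sequence to the set of states it generated), and the brute-force vertex-cover search are all assumptions made for illustration. Minimum vertex cover on a hypergraph is NP-hard in general, so the exhaustive search here only suits small action sets. The sketch also reads "non-duplicate node" as a state generated by exactly one action sequence.

```python
from itertools import combinations


def minimal_action_sequence_set(generated):
    """Illustrative sketch of Algorithm 1 (hypothetical helper, not the paper's code).

    generated: dict mapping an action sequence (tuple of action names)
               to the set of hashable states that sequence produced.
    Returns a smallest set of sequences that still reaches every state.
    """
    # Invert the mapping: for each state, which sequences generate it?
    producers = {}
    for seq, states in generated.items():
        for state in states:
            producers.setdefault(state, set()).add(seq)

    # Step 1: keep every sequence that generates at least one state
    # that no other sequence generates (a non-duplicate node).
    a_min = {next(iter(p)) for p in producers.values() if len(p) == 1}

    # Step 2: each remaining duplicate state becomes a hyperedge over the
    # sequences that generate it. States already reachable through a
    # sequence in a_min are covered, so they add no edge (an assumption
    # about how the quoted definition is meant to be read).
    edges = {frozenset(p) for p in producers.values() if not (p & a_min)}
    vertices = sorted(set().union(*edges)) if edges else []

    # Step 3: brute-force the smallest vertex cover (hitting set) and add it.
    for size in range(len(vertices) + 1):
        for cover in combinations(vertices, size):
            chosen = set(cover)
            if all(edge & chosen for edge in edges):
                return a_min | chosen
    return a_min  # not reached: the full vertex set always covers every edge


example = {
    ("UP",): {"a", "d"},     # "d" is unique to UP, so UP is kept in step 1
    ("NOOP",): {"a"},        # dominated by UP
    ("LEFT",): {"b", "c"},   # covers both remaining duplicate states
    ("RIGHT",): {"b"},
    ("FIRE",): {"c"},
}
print(sorted(minimal_action_sequence_set(example)))  # [('LEFT',), ('UP',)]
```

In the toy input, NOOP, RIGHT, and FIRE are pruned because every state they reach is also reached by UP or LEFT.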