ASAP-UCT: Abstraction of State-Action Pairs in UCT

Authors: Ankit Anand, Aditya Grover, Mausam, Parag Singla

IJCAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental evaluation on several benchmark domains shows up to 26% improvement in the quality of policies obtained over existing algorithms. Our experiments aim to study the comparative performance of various abstraction approaches in UCT.
Researcher Affiliation | Academia | Indian Institute of Technology Delhi, New Delhi, India
Pseudocode | Yes | Algorithm 1: Computing Abstract Search Tree; Algorithm 2: Abstraction of States; Algorithm 3: Abstraction of State-Action Pairs; Algorithm 4: ASAP-UCT Algorithm
Open Source Code | Yes | We implement and release ASAP-UCT, an algorithm that exploits SAP abstractions in a UCT framework. Available at https://github.com/dair-iitd/asap-uct
Open Datasets | Yes | We experiment on three diverse domains: Sailing Wind [Kocsis and Szepesvári, 2006; Bonet and Geffner, 2012], Game of Life [Sanner and Yoon, 2011], and Navigation [Sanner and Yoon, 2011]. Our empirical results are reported on two IPPC-2011 instances, of dimensions 3×3 and 4×4 (#states: 2^9, 2^16).
Dataset Splits | No | The paper describes running experiments for a certain number of trials and using an execution horizon, but it does not specify explicit train/validation/test splits of the kind used in supervised learning.
Hardware Specification | Yes | All our experiments are performed on a quad-core Intel Core i5 processor.
Software Dependencies | No | The paper mentions specific algorithms and borrowing base code for UCT, but it does not list software dependencies with version numbers (e.g., programming language or library versions).
Experiment Setup | Yes | The exploration constant K for the UCB equation is set as the negative of the magnitude of the current Q value at the node (following [Bonet and Geffner, 2012]). We used l = 1 in our experiments. We use 100 as the execution horizon for the Sailing Wind and Navigation domains and 40 for the Game of Life domain.
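The Pseudocode row refers to abstraction of states and of state-action pairs (SAPs). As a rough illustration of that idea, and not a transcription of the paper's Algorithms 2 and 3, the Python sketch below groups SAPs that share the same immediate reward and the same distribution over abstract successor states, then groups states whose sets of abstract SAPs coincide. The data layout (`saps`, `state_saps`), the function names, and the rounding tolerance are all hypothetical. Only one tree level is processed here; a full abstraction would presumably repeat this level by level.

```python
# Hypothetical layout: each SAP id maps to
# (immediate reward, {abstract successor state id: transition probability}).

def group_saps(saps, ndigits=9):
    """Assign an abstract-SAP class id to every state-action pair."""
    classes, assignment = {}, {}
    for sap_id, (reward, dist) in saps.items():
        # Order-independent, hashable signature; rounding absorbs float noise
        # (an assumption of this sketch, not something the source specifies).
        signature = (round(reward, ndigits),
                     frozenset((s, round(p, ndigits)) for s, p in dist.items()))
        # setdefault hands out the next unused id for a new signature.
        assignment[sap_id] = classes.setdefault(signature, len(classes))
    return assignment


def group_states(state_saps, sap_class):
    """Assign an abstract-state class id: equal sets of abstract SAPs => same class."""
    classes, assignment = {}, {}
    for state, sap_ids in state_saps.items():
        signature = frozenset(sap_class[sap] for sap in sap_ids)
        assignment[state] = classes.setdefault(signature, len(classes))
    return assignment


# Toy example: s1/a and s2/b have the same reward and successor distribution.
saps = {
    ("s1", "a"): (1.0, {0: 0.5, 1: 0.5}),
    ("s2", "b"): (1.0, {1: 0.5, 0: 0.5}),
    ("s3", "a"): (0.0, {0: 1.0}),
}
state_saps = {"s1": [("s1", "a")], "s2": [("s2", "b")], "s3": [("s3", "a")]}

sap_class = group_saps(saps)
state_class = group_states(state_saps, sap_class)
print(sap_class)    # {('s1', 'a'): 0, ('s2', 'b'): 0, ('s3', 'a'): 1}
print(state_class)  # {'s1': 0, 's2': 0, 's3': 1}
```

Because equivalence is decided by hashing a signature, each level is grouped in a single pass over its SAPs and states.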
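The state counts in the Open Datasets row follow from a simple count if, as assumed here, each grid cell is a binary variable (alive or dead, as in Game of Life): an n×m grid then has 2^(n·m) configurations, i.e. 2^9 = 512 for 3×3 and 2^16 = 65,536 for 4×4. The helper name below is hypothetical.

```python
def num_binary_grid_states(rows, cols):
    """Count configurations of a rows x cols grid whose cells are each on or off."""
    return 2 ** (rows * cols)


for rows, cols in [(3, 3), (4, 4)]:
    print(f"{rows}x{cols}: {num_binary_grid_states(rows, cols)} states")
# 3x3: 512 states
# 4x4: 65536 states
```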
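The Experiment Setup row sets the UCB exploration constant from the node's current Q value, K = -|Q(node)|. The source does not spell out the sign convention, so the sketch below assumes a cost-minimization setting: each action is scored as Q(node, a) + K·sqrt(ln N / n_a) and the lowest score wins, so a negative K makes rarely tried actions more attractive. The function name and the `children` layout are hypothetical.

```python
import math


def select_action(q_node, children):
    """Pick an action with a UCB-style rule, assuming costs are minimized.

    q_node:   current Q estimate at the node.
    children: dict mapping action -> (q_value, visit_count).
    """
    # Exploration constant derived from the node's own Q value: K = -|Q(node)|.
    k = -abs(q_node)
    total_visits = sum(n for _, n in children.values())
    best_action, best_score = None, math.inf
    for action, (q, n) in children.items():
        if n == 0:
            return action  # try every action at least once
        # With k <= 0 the bonus term lowers the score of rarely visited
        # actions, which favors them under argmin.
        score = q + k * math.sqrt(math.log(total_visits) / n)
        if score < best_score:
            best_action, best_score = action, score
    return best_action


# "right" has a slightly worse cost estimate but only one visit, so it is explored.
print(select_action(10.2, {"left": (10.0, 5), "right": (10.5, 1)}))  # right
```

Tying K to the magnitude of Q scales exploration to each domain's cost range, which is one plausible reason to avoid a hand-tuned constant; the source itself only cites [Bonet and Geffner, 2012] for the choice.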