ASAP-UCT: Abstraction of State-Action Pairs in UCT
Authors: Ankit Anand, Aditya Grover, Mausam, Parag Singla
IJCAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental evaluation on several benchmark domains shows up to 26% improvement in the quality of policies obtained over existing algorithms. Our experiments aim to study the comparative performance of various abstraction approaches in UCT. |
| Researcher Affiliation | Academia | Indian Institute of Technology, Delhi New Delhi, India |
| Pseudocode | Yes | Algorithm 1 Computing Abstract Search Tree, Algorithm 2 Abstraction of States, Algorithm 3 Abstraction of State-Action Pairs, Algorithm 4 ASAP-UCT Algorithm |
| Open Source Code | Yes | We implement and release ASAP-UCT, an algorithm that exploits SAP abstractions in a UCT framework. (Footnote 1: Available at https://github.com/dair-iitd/asap-uct) |
| Open Datasets | Yes | We experiment on three diverse domains, Sailing Wind [Kocsis and Szepesvári, 2006; Bonet and Geffner, 2012], Game of Life [Sanner and Yoon, 2011], and Navigation [Sanner and Yoon, 2011]. Our empirical results are reported on two IPPC-2011 instances, of dimensions 3×3 and 4×4 (#states: 2^9, 2^16). |
| Dataset Splits | No | The paper describes running experiments for a certain number of 'trials' and using 'execution horizon', but it does not specify explicit train/validation/test dataset splits as typically done for supervised learning tasks. |
| Hardware Specification | Yes | All our experiments are performed on a Quad-Core Intel i-5 processor. |
| Software Dependencies | No | The paper mentions using specific algorithms and borrowing base code for UCT, but it does not provide a list of software dependencies with specific version numbers (e.g., programming language versions, library versions). |
| Experiment Setup | Yes | The exploration constant K for the UCB equation is set as the negative of the magnitude of current Q value at the node (following [Bonet and Geffner, 2012]). We used l = 1 in our experiments. We use 100 as the execution horizon for Sailing Wind and Navigation domains and 40 for Game of Life domain. |
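
The experiment setup above can be illustrated with a minimal sketch. It is not taken from the released code. It assumes a cost-minimising UCB rule in which the score of an action is `Q(s, a) + K * sqrt(ln N / n_a)` and the lowest score wins, so a negative `K` favours rarely tried actions. The names `ucb_select` and `EXECUTION_HORIZON` and the exact form of the bonus term are hypothetical; only the choice of `K` and the horizon values come from the table.

```python
import math

# Execution horizons quoted in the table (the dictionary name is hypothetical).
EXECUTION_HORIZON = {"Sailing Wind": 100, "Navigation": 100, "Game of Life": 40}


def ucb_select(q_values, counts, node_q):
    """Return the index of the action to try next.

    Assumed convention: Q values are costs, so the lowest score wins.
    K is the negative of the magnitude of the node's current Q value.
    """
    # Try every action at least once before applying the UCB rule.
    for action, n in enumerate(counts):
        if n == 0:
            return action

    total = sum(counts)
    k = -abs(node_q)  # exploration constant as described in the table
    scores = [
        q + k * math.sqrt(math.log(total) / n)
        for q, n in zip(q_values, counts)
    ]
    return min(range(len(scores)), key=scores.__getitem__)
```

In this sketch, a node whose current Q value is zero has `K = 0` and selection becomes purely greedy; a larger magnitude widens the bonus in proportion to the scale of the costs.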