Learning Predictive State Representations via Monte-Carlo Tree Search
Authors: Yunlong Liu, Hexing Zhu, Yifeng Zeng, Zongxiong Dai
IJCAI 2016

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on several domains including one extremely large domain and the experimental results show the effectiveness of our approach. |
| Researcher Affiliation | Academia | ¹Department of Automation, Xiamen University, China; ²School of Computing, Teesside University, UK |
| Pseudocode | Yes | Algorithm 1 shows our proposed algorithm in detail in pseudocode, where s(v) denotes the corresponding state of node v, which is the set of actions (tests) starting from the root node to node v. Algorithm 1: The discovery using MCTS algorithm |
| Open Source Code | No | The paper does not provide a link to open-source code for their method or explicitly state that their code is being released. |
| Open Datasets | Yes | We evaluated the proposed technique in three domains of different size, namely Cheese Maze, Hallway2 [Cassandra, 1999] and Poc Man [Silver and Veness, 2010; Hamilton et al., 2014]. |
| Dataset Splits | No | No explicit validation dataset split information (e.g., percentages, counts, cross-validation setup) was found. The paper mentions training and testing sequence lengths but not a distinct validation split. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used. |
| Experiment Setup | Yes | To accelerate the search process, and because the order of actions in the action sequence has no effect on the final result for our method, the search was restricted as follows. For Cheese Maze, the number of legal actions at each node was set to 10 and the candidate actions were limited to the possible tests of length 1 and 2. For Hallway2 and Poc Man, the number of legal actions at each node was set to 20 and the candidate actions were limited to the possible tests of length 1. The exploration constant c was set to 0.001, and a state is considered terminal when the search reaches a certain depth. |
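
The settings in the Experiment Setup row can be pictured with a minimal sketch of generic UCT-style child selection. This is not the paper's Algorithm 1: the UCB1 formula is the standard textbook form and is assumed here, and the names `SETUP`, `Node`, `ucb1`, `select_child`, `is_terminal`, and the `max_depth` parameter are all hypothetical. The quoted text says only that search stops at "a certain depth", so no depth value is filled in.

```python
import math
from dataclasses import dataclass, field

# Values quoted in the Experiment Setup row; the dictionary layout is hypothetical.
SETUP = {
    "Cheese Maze": {"legal_actions": 10, "test_lengths": (1, 2)},
    "Hallway2": {"legal_actions": 20, "test_lengths": (1,)},
    "Poc Man": {"legal_actions": 20, "test_lengths": (1,)},
}
EXPLORATION_C = 0.001  # exploration constant c reported in the paper


@dataclass
class Node:
    """Hypothetical search-tree node holding visit statistics."""
    depth: int = 0
    visits: int = 0
    total_reward: float = 0.0
    children: list = field(default_factory=list)


def is_terminal(node, max_depth):
    """Depth cutoff: the caller supplies max_depth, which the quoted text does not specify."""
    return node.depth >= max_depth


def ucb1(parent, child, c=EXPLORATION_C):
    """Standard UCB1 score (assumed form): mean reward plus a c-weighted exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are tried first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(2.0 * math.log(parent.visits) / child.visits)
    return exploit + explore


def select_child(parent, c=EXPLORATION_C):
    """Pick the child with the highest UCB1 score."""
    return max(parent.children, key=lambda ch: ucb1(parent, ch, c))
```

Under this assumed formula, a constant as small as 0.001 makes selection nearly greedy with respect to the mean reward once every child has been visited, since the exploration bonus rarely outweighs a difference in means.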