Learning Predictive State Representations via Monte-Carlo Tree Search

Authors: Yunlong Liu, Hexing Zhu, Yifeng Zeng, Zongxiong Dai

IJCAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on several domains including one extremely large domain and the experimental results show the effectiveness of our approach.
Researcher Affiliation | Academia | 1 Department of Automation, Xiamen University, China; 2 School of Computing, Teesside University, UK
Pseudocode | Yes | Algorithm 1 shows our proposed algorithm in detail in pseudocode, where s(v) denotes the corresponding state of node v, which is the set of actions (tests) starting from the root node to node v. Algorithm 1: The discovery using MCTS algorithm
Open Source Code | No | The paper does not provide a link to open-source code for their method or explicitly state that their code is being released.
Open Datasets | Yes | We evaluated the proposed technique in three domains of different size, namely Cheese Maze, Hallway2 [Cassandra, 1999] and PocMan [Silver and Veness, 2010; Hamilton et al., 2014].
Dataset Splits | No | No explicit validation dataset split information (e.g., percentages, counts, cross-validation setup) was found. The paper mentions training and testing sequence lengths but not a distinct validation split.
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used.
Experiment Setup | Yes | To accelerate the search process and as the order of actions in the action sequence for our method has no effect on the final result, for Cheese Maze, the number of legal actions at each node was set to 10, the candidate actions were limited to the possible length 1 and 2 tests; for Hallway2 and PocMan, the number of legal actions at each node was set to 20, the candidate actions were limited to the possible length 1 tests. The exploration constant c was set to 0.001 and a state is considered to be terminal when the search reaches a certain depth.
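
The setup above names two search parameters: an exploration constant c = 0.001 and a depth-based terminal test. The following is a minimal Python sketch of how such parameters typically enter an MCTS selection step. It assumes the standard UCT rule (mean reward plus c * sqrt(ln N_parent / N_child)), which this summary does not confirm as the paper's exact formula. The names Node, uct_score, select_child, is_terminal and max_depth are hypothetical, and the depth limit itself is not reported here.

```python
import math
from dataclasses import dataclass, field

EXPLORATION_C = 0.001  # exploration constant c reported in the experiment setup


@dataclass
class Node:
    """Hypothetical search-tree node; `tests` plays the role of s(v)."""
    tests: tuple = ()            # tests chosen on the path from the root to this node
    visits: int = 0              # number of times the node was visited
    total_reward: float = 0.0    # sum of rewards backed up through the node
    children: dict = field(default_factory=dict)  # test -> child Node


def is_terminal(node: Node, max_depth: int) -> bool:
    """Depth-based terminal test; max_depth is an assumed parameter."""
    return len(node.tests) >= max_depth


def uct_score(parent: Node, child: Node, c: float = EXPLORATION_C) -> float:
    """Standard UCT value (assumed form): mean reward plus exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are tried first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    return exploit + explore


def select_child(parent: Node, c: float = EXPLORATION_C) -> Node:
    """Return the child with the highest UCT score."""
    return max(parent.children.values(), key=lambda ch: uct_score(parent, ch, c))
```

Under this assumed rule, and if rewards are on the order of 1, a constant as small as 0.001 makes the exploration bonus nearly negligible once every child has been visited, so selection is close to greedy on the mean reward.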