Factored MCTS for Large Scale Stochastic Planning

Authors: Hao Cui, Roni Khardon, Alan Fern, Prasad Tadepalli

AAAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | An extensive experimental evaluation demonstrates that the new algorithms provide significant improvement over the state of the art when solving large problems in a number of challenge benchmark domains. First, we present an experimental study on current challenge problems and moderately larger problems exposing the above mentioned phenomenon. We ran experiments on the Tufts UIT research cluster (each node includes Intel Xeon X5675 @ 3GHz CPU, and 24GB memory).
Researcher Affiliation | Academia | Hao Cui and Roni Khardon, Department of Computer Science, Tufts University, Medford, MA 02155, USA; Alan Fern and Prasad Tadepalli, School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97331, USA
Pseudocode | No | The paper describes the algorithms (ARollout, AMCTS) in detail with textual explanations, but it does not provide any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement about releasing its own source code or a link to a repository containing the implementation of the described methodology. It mentions using third-party software like RDDL and PROST, but these are not the authors' code for their proposed methods.
Open Datasets | Yes | Four domains are from IPPC2011, the elevators domain (where people randomly arrive at each floor and go to either top or bottom floor), the sysadmin domain (where failures of computers depend on their neighbors and one can reboot a number of computers at each time step), the crossing traffic domain (where a robot tries to get to the other side of a river with randomly appearing flowing obstacles), and the traffic domain (where one controls traffic lights to enable traffic flow). The IPPC provided 10 instances for each domain. We similarly generated 20 instances for one-dir-elevators domain. Footnote 1: http://concurrent-value-iteration.googlecode.com/svnhistory/r133/trunk/rddl/elevators_mdp.rddl
Dataset Splits | No | The paper refers to problem "instances" for evaluation but does not specify any train/validation/test dataset splits, as it is not based on a traditional supervised learning dataset where such splits are common. It is about algorithms for stochastic planning problems.
Hardware Specification | Yes | We ran experiments on the Tufts UIT research cluster (each node includes Intel Xeon X5675 @ 3GHz CPU, and 24GB memory).
Software Dependencies | No | The paper mentions using "RDDL software" and the "PROST system" but does not provide specific version numbers for these or any other software dependencies, which would be necessary for reproducibility.
Experiment Setup | Yes | For PROST we use the IPPC2011 setting except that the allocated time per step is explicitly set. For our algorithms, as mentioned above, the number of samples per action a for the estimate of Q^π(s, a) is determined dynamically. The parameter n that controls the number of action samples when aggregating over actions is set to min{10, 0.6 |A|}. The simulation depth in ARollout and the depth of the MCTS tree are set to half of the horizon used in evaluation time. We use ϵ = 0.5 for the ϵ-greedy action choice in all algorithms.
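The three fixed settings quoted in the Experiment Setup row (n = min{10, 0.6 |A|}, depth equal to half the horizon, and ϵ = 0.5) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names are hypothetical, and the rounding of 0.6 |A| (rounded up here, with a floor of 1) and of the half horizon (integer division) are assumptions, since the quoted text does not specify them.

```python
import random


def num_action_samples(num_actions: int) -> int:
    """n = min{10, 0.6 |A|}.

    Assumption: 0.6 |A| is rounded up, so n is always at least 1.
    Integer arithmetic ((3 * |A| + 4) // 5) avoids floating-point rounding.
    """
    return min(10, (3 * num_actions + 4) // 5)


def search_depth(horizon: int) -> int:
    """Simulation / tree depth set to half of the evaluation horizon.

    Assumption: odd horizons are rounded down, with a minimum depth of 1.
    """
    return max(1, horizon // 2)


def epsilon_greedy(q_values: dict, epsilon: float = 0.5, rng=random):
    """Pick a uniformly random action with probability epsilon,
    otherwise the action with the highest estimated Q-value."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)


if __name__ == "__main__":
    # Hypothetical values, chosen only to exercise the helpers.
    q_estimates = {"noop": 0.1, "reboot_c1": 0.7, "reboot_c2": 0.4}
    print(num_action_samples(len(q_estimates)))  # samples for a 3-action problem
    print(search_depth(40))                      # depth for an example horizon of 40
    print(epsilon_greedy(q_estimates))           # one epsilon-greedy draw
```

Passing a seeded `random.Random` instance as `rng` makes the ϵ-greedy draws repeatable, which helps when comparing runs across problem instances.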