reproducibilityindex.ai

What type of inference is planning?

Authors: Miguel Lázaro-Gredilla, Li Yang Ku, Kevin P. Murphy, Dileep George

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We validate these results empirically on synthetic MDPs and tasks posed in the International Planning Competition.
Researcher Affiliation	Industry	Miguel Lázaro-Gredilla, Li Yang Ku, Kevin P. Murphy, Dileep George; Google DeepMind; {lazarogredilla, liyangku, kpmurphy, dileepgeorge}@google.com
Pseudocode	No	The paper provides mathematical derivations for message updates but does not include a distinct pseudocode or algorithm block.
Open Source Code	Yes	Code at https://github.com/google-deepmind/what_type_of_inference_is_planning.
Open Datasets	Yes	We use the 6 different domains from IPPC2011, each with 10 instances (factored MDPs)...
Dataset Splits	No	No explicit training, validation, or test dataset splits are provided. The paper evaluates on synthetic MDPs and standard competition instances, but does not describe data splitting for training/validation in a supervised learning context.
Hardware Specification	No	The paper mentions 'CPU machines in the cloud' and specifies the number of 'virtual cores' for experiments (e.g., '32 virtual cores', '2 virtual cores'), but does not provide specific CPU models, memory details, or cloud instance types.
Software Dependencies	Yes	The Variational Inference Linear Programming (VI LP) approach uses the GLOP solver in Google's OR-Tools (Perron and Furnon, 2024) to solve the linear programming (LP) problem derived from each task instance with the target of maximizing the expected accumulated reward.
Experiment Setup	Yes	For all inference approaches, we run with a look ahead horizon of both 4 and 9. ... The maximum number of iterations is set to 100 and the convergence threshold is set to 0.1 for the EM algorithm. ... The search depth is set to 9 or 4 based on the look ahead horizon. The number of gradient updates is set to 500 following the experimental setting in Wu and Khardon, 2022. The allowed time is set to 50000 per iteration... For each time step, VBP messages are propagated concurrently for a maximum of 150K iterations with 0.1 damping. The ϵ value is annealed every 300 iterations from a value of 1 to 0.01 based on the formula described in Appendix D.
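
The settings quoted in the Experiment Setup row can be gathered into a single configuration object, which may help when trying to reproduce the runs. The sketch below is illustrative only: the class name, field names, and helper method are hypothetical and do not come from the paper or its repository. The unit of the allowed time is not stated in the row, so it is stored as a bare number, and the ϵ annealing formula from Appendix D is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSetup:
    """Hypothetical container for the values quoted in the Experiment Setup row."""

    lookahead_horizons: tuple = (4, 9)       # all inference approaches run with both
    em_max_iterations: int = 100             # EM algorithm iteration cap
    em_convergence_threshold: float = 0.1    # EM algorithm convergence threshold
    gradient_updates: int = 500              # following Wu and Khardon, 2022
    allowed_time_per_iteration: int = 50000  # unit not stated in the row
    vbp_max_iterations: int = 150_000        # per time step
    vbp_damping: float = 0.1
    epsilon_start: float = 1.0
    epsilon_end: float = 0.01
    epsilon_anneal_every: int = 300          # iterations between annealing steps

    def max_annealing_steps(self) -> int:
        # Upper bound on annealing steps if VBP uses its full iteration budget.
        return self.vbp_max_iterations // self.epsilon_anneal_every


setup = ExperimentSetup()
print(setup.max_annealing_steps())  # 150000 // 300 = 500
```

With the quoted values, a VBP run that uses its full iteration budget would pass through at most 500 annealing steps per time step. How ϵ moves between 1 and 0.01 across those steps depends on the formula in Appendix D of the paper.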