What type of inference is planning?

Authors: Miguel Lázaro-Gredilla, Li Yang Ku, Kevin P. Murphy, Dileep George

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We validate these results empirically on synthetic MDPs and tasks posed in the International Planning Competition. |
| Researcher Affiliation | Industry | Miguel Lázaro-Gredilla, Li Yang Ku, Kevin P. Murphy, Dileep George. Google DeepMind. {lazarogredilla, liyangku, kpmurphy, dileepgeorge}@google.com |
| Pseudocode | No | The paper provides mathematical derivations for message updates but does not include a distinct pseudocode or algorithm block. |
| Open Source Code | Yes | Code at https://github.com/google-deepmind/what_type_of_inference_is_planning. |
| Open Datasets | Yes | We use the 6 different domains from IPPC2011, each with 10 instances (factored MDPs)... |
| Dataset Splits | No | No explicit training, validation, or test dataset splits are provided. The paper evaluates on synthetic MDPs and standard competition instances, but does not describe data splitting for training/validation in a supervised learning context. |
| Hardware Specification | No | The paper mentions 'CPU machines in the cloud' and specifies the number of 'virtual cores' for experiments (e.g., '32 virtual cores', '2 virtual cores'), but does not provide specific CPU models, memory details, or cloud instance types. |
| Software Dependencies | Yes | The Variational Inference Linear Programming (VI LP) approach uses the GLOP solver in Google's OR-Tools (Perron and Furnon, 2024) to solve the linear programming (LP) problem derived from each task instance with the target of maximizing the expected accumulated reward. |
| Experiment Setup | Yes | For all inference approaches, we run with a look ahead horizon of both 4 and 9. ... The maximum number of iterations is set to 100 and the convergence threshold is set to 0.1 for the EM algorithm. ... The search depth is set to 9 or 4 based on the look ahead horizon. The number of gradient updates is set to 500 following the experimental setting in Wu and Khardon (2022). The allowed time is set to 50000 per iteration... For each time step, VBP messages are propagated concurrently for a maximum of 150K iterations with 0.1 damping. The ϵ value is annealed every 300 iterations from a value of 1 to 0.01 based on the formula described in Appendix D. |
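
To make the VBP settings in the Experiment Setup row concrete, the following minimal sketch shows how a stepwise ϵ schedule and a damped message update could be wired together. Only the constants (150K iterations, a step every 300 iterations, ϵ from 1 to 0.01, 0.1 damping) come from the row above. The geometric interpolation in `epsilon_at` is an assumption and is not the Appendix D formula, the damping convention in `damp` (the fraction of the old message that is kept) is also an assumption, and both function names are hypothetical.

```python
MAX_ITERS = 150_000             # from the setup: maximum message-passing iterations
ANNEAL_EVERY = 300              # from the setup: epsilon changes every 300 iterations
EPS_START, EPS_END = 1.0, 0.01  # from the setup: annealing range
DAMPING = 0.1                   # from the setup: damping factor


def epsilon_at(iteration: int) -> float:
    """Hypothetical geometric schedule; NOT the formula from Appendix D."""
    n_stages = MAX_ITERS // ANNEAL_EVERY            # 500 constant-epsilon stages
    stage = min(iteration // ANNEAL_EVERY, n_stages - 1)
    frac = stage / (n_stages - 1)                   # 0.0 at the first stage, 1.0 at the last
    return EPS_START * (EPS_END / EPS_START) ** frac


def damp(old, new, damping=DAMPING):
    """Assumed convention: keep a `damping` fraction of the old message."""
    return [damping * o + (1.0 - damping) * n for o, n in zip(old, new)]


if __name__ == "__main__":
    for it in (0, 299, 300, MAX_ITERS - 1):
        print(it, epsilon_at(it))
    print(damp([0.0, 1.0], [1.0, 0.0]))
```

Under these assumptions ϵ stays constant within each block of 300 iterations and reaches 0.01 in the final block. The released code should be consulted for the schedule and damping convention actually used.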