What type of inference is planning?
Authors: Miguel Lázaro-Gredilla, Li Yang Ku, Kevin P. Murphy, Dileep George
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate these results empirically on synthetic MDPs and tasks posed in the International Planning Competition. |
| Researcher Affiliation | Industry | Miguel Lázaro-Gredilla, Li Yang Ku, Kevin P. Murphy, Dileep George — Google DeepMind, {lazarogredilla, liyangku, kpmurphy, dileepgeorge}@google.com |
| Pseudocode | No | The paper provides mathematical derivations for message updates but does not include a distinct pseudocode or algorithm block. |
| Open Source Code | Yes | Code at https://github.com/google-deepmind/what_type_of_inference_is_planning. |
| Open Datasets | Yes | We use the 6 different domains from IPPC2011, each with 10 instances (factored MDPs)... |
| Dataset Splits | No | No explicit training, validation, or test dataset splits are provided. The paper evaluates on synthetic MDPs and standard competition instances, but does not describe data splitting for training/validation in a supervised learning context. |
| Hardware Specification | No | The paper mentions 'CPU machines in the cloud' and specifies the number of 'virtual cores' for experiments (e.g., '32 virtual cores', '2 virtual cores'), but does not provide specific CPU models, memory details, or cloud instance types. |
| Software Dependencies | Yes | The Variational Inference Linear Programming (VI LP) approach uses the GLOP solver in Google's OR-Tools (Perron and Furnon, 2024) to solve the linear programming (LP) problem derived from each task instance, with the target of maximizing the expected accumulated reward. |
| Experiment Setup | Yes | For all inference approaches, we run with a look-ahead horizon of both 4 and 9. ... The maximum number of iterations is set to 100 and the convergence threshold is set to 0.1 for the EM algorithm. ... The search depth is set to 9 or 4 based on the look-ahead horizon. The number of gradient updates is set to 500 following the experimental setting in Wu and Khardon, 2022. The allowed time is set to 50000 per iteration... For each time step, VBP messages are propagated concurrently for a maximum of 150K iterations with 0.1 damping. The ϵ value is annealed every 300 iterations from a value of 1 to 0.01 based on the formula described in Appendix D. |