PG3: Policy-Guided Planning for Generalized Policy Generation
Authors: Ryan Yang, Tom Silver, Aidan Curtis, Tomás Lozano-Pérez, Leslie Pack Kaelbling
IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results in six domains confirm that PG3 learns generalized policies more efficiently and effectively than several baselines. |
| Researcher Affiliation | Academia | Ryan Yang, Tom Silver, Aidan Curtis, Tomás Lozano-Pérez and Leslie Pack Kaelbling, Massachusetts Institute of Technology, {ryanyang, tslvr, curtisa}@mit.edu, {tlp, lpk}@csail.mit.edu |
| Pseudocode | Yes | Algorithm 1: Generalized Policy Search via GBFS; Algorithm 2: Score Function for Policy Search; Algorithm 3: Policy-Guided Planning; Algorithm 4: Single Plan Comparison. (A sketch of the GBFS search loop appears below the table.) |
| Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the work described, nor does it provide a direct link to a source-code repository for their methodology. |
| Open Datasets | No | All experimental results are over 10 random seeds, where training and test problem instances are randomly generated for each seed. The paper does not provide concrete access information (link, DOI, repository, or formal citation) for a publicly available or open dataset. |
| Dataset Splits | No | The paper mentions 'training problems' and 'held-out test problems' but does not explicitly describe a separate 'validation' dataset or its split. |
| Hardware Specification | No | The paper states 'PG3 (implemented in Python) can quickly learn policies that generalize to large test problems' but does not provide specific details about the hardware used, such as CPU/GPU models or cloud configurations. |
| Software Dependencies | No | The paper mentions 'PG3 (implemented in Python)' but does not provide specific version numbers for Python or any other key software libraries or solvers used in their experiments. |
| Experiment Setup | Yes | Our implementation of policy-guided planning is a small modification to A* search: for each search node that is popped from the queue, in addition to expanding the standard single-step successors, we roll out the policy for several time steps (maximum 50 in experiments...). All GPS methods are run for 2500 node expansions. (A sketch of this rollout modification appears below.) |
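
The paper's Algorithm 1 runs greedy best-first search (GBFS) in the space of candidate policies, ordering the frontier by a score function (Algorithm 2). Below is a minimal Python sketch of that loop, not the authors' code: `successors` and `score` are hypothetical callables standing in for the paper's policy-refinement operators and plan-comparison score.

```python
import heapq

def gbfs_policy_search(init_policy, successors, score, max_expansions=2500):
    """Greedy best-first search over candidate policies.

    A minimal sketch of the paper's Algorithm 1. `successors(policy)` and
    `score(policy)` are hypothetical stand-ins for the paper's
    policy-refinement operators and plan-comparison score; lower scores are
    treated as better here, a convention of this sketch only. Policies are
    assumed hashable so they can be deduplicated.
    """
    tie = 0  # tie-breaker so heapq never compares policies directly
    frontier = [(score(init_policy), tie, init_policy)]
    best_score, best_policy = frontier[0][0], init_policy
    visited = {init_policy}
    for _ in range(max_expansions):  # 2500 expansions in the paper's runs
        if not frontier:
            break
        _, _, policy = heapq.heappop(frontier)
        for child in successors(policy):
            if child in visited:
                continue
            visited.add(child)
            child_score = score(child)
            if child_score < best_score:
                best_score, best_policy = child_score, child
            tie += 1
            heapq.heappush(frontier, (child_score, tie, child))
    return best_policy
```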
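
The experiment-setup quote describes policy-guided planning as A* with one extra expansion step: after popping a node, the learned policy is rolled out for up to 50 steps and every visited state is enqueued alongside the one-step successors. The sketch below illustrates that modification under stated assumptions; `successors`, `policy`, and `step` are hypothetical domain hooks, not the paper's API.

```python
import heapq
import itertools

def policy_guided_astar(init, goal_test, successors, heuristic, policy, step,
                        max_rollout=50):
    """A* search with an added policy-rollout expansion.

    A hedged sketch of the rollout modification quoted above, not the
    authors' implementation. `successors(s)` yields (action, state) pairs,
    `policy(s)` returns an action or None, and `step(s, a)` applies an
    action; all three are hypothetical hooks. States are assumed hashable,
    and unit action costs are assumed.
    """
    tie = itertools.count()
    frontier = [(heuristic(init), next(tie), 0, init, [])]
    seen = {init}
    while frontier:
        _, _, g, state, plan = heapq.heappop(frontier)
        if goal_test(state):
            return plan
        # Standard single-step successors.
        children = [(nxt, g + 1, plan + [a]) for a, nxt in successors(state)]
        # Extra step: roll out the policy for up to max_rollout steps from
        # the popped node, enqueueing every state along the trajectory.
        s, rg, rplan = state, g, plan
        for _ in range(max_rollout):
            a = policy(s)
            if a is None:
                break
            s, rg, rplan = step(s, a), rg + 1, rplan + [a]
            children.append((s, rg, rplan))
        for child, cg, cplan in children:
            if child not in seen:
                seen.add(child)
                heapq.heappush(
                    frontier,
                    (cg + heuristic(child), next(tie), cg, child, cplan))
    return None  # goal unreachable within the explored space
```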