PG3: Policy-Guided Planning for Generalized Policy Generation

Authors: Ryan Yang, Tom Silver, Aidan Curtis, Tomas Lozano-Perez, Leslie Kaelbling

IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results in six domains confirm that PG3 learns generalized policies more efficiently and effectively than several baselines.
Researcher Affiliation | Academia | Ryan Yang, Tom Silver, Aidan Curtis, Tomas Lozano-Perez and Leslie Kaelbling, Massachusetts Institute of Technology, {ryanyang, tslvr, curtisa}@mit.edu, {tlp, lpk}@csail.mit.edu
Pseudocode | Yes | Algorithm 1: Generalized Policy Search via GBFS; Algorithm 2: Score Function for Policy Search; Algorithm 3: Policy-Guided Planning; Algorithm 4: Single Plan Comparison
Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the work described, nor does it provide a direct link to a source-code repository for their methodology.
Open Datasets | No | All experimental results are over 10 random seeds, where training and test problem instances are randomly generated for each seed. The paper does not provide concrete access information (link, DOI, repository, or formal citation) for a publicly available or open dataset.
Dataset Splits | No | The paper mentions 'training problems' and 'held-out test problems' but does not explicitly describe a separate 'validation' dataset or its split.
Hardware Specification | No | The paper states 'PG3 (implemented in Python) can quickly learn policies that generalize to large test problems' but does not provide specific details about the hardware used, such as CPU/GPU models or cloud configurations.
Software Dependencies | No | The paper mentions 'PG3 (implemented in Python)' but does not provide specific version numbers for Python or any other key software libraries or solvers used in their experiments.
Experiment Setup | Yes | Our implementation of policy-guided planning is a small modification to A* search: for each search node that is popped from the queue, in addition to expanding the standard single-step successors, we roll out the policy for several time steps (maximum 50 in experiments...). All GPS methods are run for 2500 node expansions. (A hedged sketch of this rollout-augmented search is given below the table.)
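
The experiment-setup description lends itself to a small illustration. Below is a minimal Python sketch of what a rollout-augmented A* search of the kind quoted above could look like. It is not the authors' code: the helper interfaces successors, step, policy, heuristic, and is_goal are hypothetical placeholders, unit action costs are assumed, and the expansion budget is an illustrative parameter (the paper's 2500-expansion figure refers to the outer generalized policy search, not this inner planner). Only the 50-step rollout cap is taken from the paper.

```python
import heapq
import itertools

MAX_ROLLOUT = 50  # maximum policy rollout length reported in the paper


def policy_guided_search(init_state, is_goal, successors, step, policy,
                         heuristic, max_expansions=1000):
    """A*-style search where every expansion also rolls out a candidate policy.

    Assumed interfaces (not the authors' API): successors(s) -> iterable of
    actions, step(s, a) -> next state, policy(s) -> action or None,
    heuristic(s) -> estimated cost-to-go, is_goal(s) -> bool. States must be
    hashable; unit action costs are assumed.
    """
    tie = itertools.count()  # tie-breaker so the heap never compares states
    frontier = [(heuristic(init_state), next(tie), init_state, [])]
    best_g = {init_state: 0}

    def push(state, plan):
        g = len(plan)
        if g < best_g.get(state, float("inf")):
            best_g[state] = g
            heapq.heappush(frontier,
                           (g + heuristic(state), next(tie), state, plan))

    for _ in range(max_expansions):
        if not frontier:
            break
        _, _, state, plan = heapq.heappop(frontier)
        if is_goal(state):
            return plan

        # Standard single-step successors, exactly as in plain A*.
        for action in successors(state):
            push(step(state, action), plan + [action])

        # Policy rollout: follow the candidate policy for up to MAX_ROLLOUT
        # steps from the popped node, enqueueing every intermediate state.
        roll_state, roll_plan = state, plan
        for _ in range(MAX_ROLLOUT):
            action = policy(roll_state)
            if action is None:
                break
            roll_state = step(roll_state, action)
            roll_plan = roll_plan + [action]
            push(roll_state, roll_plan)

    return None  # no plan found within the expansion budget
```

One design choice made in this sketch is to enqueue every intermediate rollout state rather than only the rollout endpoint, so that a partially correct candidate policy can still shorten the search even when it eventually goes astray.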