Synthesizing Programmatic Policy for Generalization within Task Domain

Authors: Tianyi Wu, Liwei Shen, Zhen Dong, Xin Peng, Wenyun Zhao

IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on benchmarks adapted from PDDLGym for task planning and PyBullet for robotic manipulation. Experimental results showcase the effectiveness of our approach across diverse benchmarks. Moreover, the learned policy demonstrates the ability to generalize to tasks that were not seen during training.
Researcher Affiliation | Academia | Tianyi Wu, Liwei Shen, Zhen Dong, Xin Peng and Wenyun Zhao; Fudan University; {tywu18, shenliwei, zhendong, pengxin, wyzhao}@fudan.edu.cn
Pseudocode | Yes | Algorithm 1: Algorithm for training programmatic policy
Open Source Code | Yes | Code and benchmarks: https://github.com/V0idwu/meta-prl-code
Open Datasets | Yes | One group comprises three benchmarks (Hanoi, Stacking and Hiking) adapted from [Silver and Chitnis, 2020], where the action space is discrete; these primarily serve to evaluate the approach in the context of task planning. The other group comprises four tasks (Panda Reach, Panda Push, Panda Slide and Panda Stack) developed within the PyBullet environment [Gallouédec et al., 2021]. See the environment-setup sketch after this table.
Dataset Splits | No | The paper does not explicitly describe training/validation/test dataset splits (e.g., an 80/10/10 split, per-split sample counts, or references to predefined validation splits).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper names the software components and algorithms it uses (e.g., Proximal Policy Optimization [Schulman et al., 2017], Reptile [Nichol and Schulman, 2018]), but does not provide version numbers for these or for other ancillary software dependencies required for replication.
Experiment Setup | Yes | Algorithm 1 (training the programmatic policy) takes as input a distribution over tasks p(T_H), a learning rate α, a meta learning rate β, a DSL E, and a depth d. The threshold for the maximum number of agent-environment interactions is set to 500, and results are averaged over 10 random seeds. Sketches of the benchmark setup and of a Reptile-style training loop follow below.
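
The manipulation benchmarks above are distributed through the panda-gym package on top of PyBullet. The snippet below is a minimal sketch of instantiating one of them and rolling out a policy under the paper's 500-interaction cap. The environment IDs ("PandaReach-v3", etc.) and the gymnasium-based API are assumptions about the packaged benchmark versions, not details taken from the paper, which adapts these benchmarks; the task-planning benchmarks (Hanoi, Stacking, Hiking) are omitted here because their registered IDs depend on the authors' PDDLGym adaptation.

```python
# Minimal sketch, assuming panda-gym v3-style environment IDs and the
# gymnasium API; the paper's adapted benchmarks may differ.
import gymnasium as gym
import panda_gym  # importing registers the Panda manipulation tasks

# Robotic manipulation group evaluated in the paper.
manipulation_tasks = ["PandaReach-v3", "PandaPush-v3", "PandaSlide-v3", "PandaStack-v3"]

def rollout(env, policy, max_interactions=500):
    """Roll out a policy until termination or the interaction cap (500 in the paper)."""
    obs, info = env.reset()
    total_reward, steps = 0.0, 0
    while steps < max_interactions:
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        steps += 1
        if terminated or truncated:
            break
    return total_reward

env = gym.make(manipulation_tasks[0])
print(rollout(env, lambda obs: env.action_space.sample()))  # random policy as a placeholder
```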
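
Algorithm 1 itself is not reproduced in the excerpts above, but its listed inputs, together with the paper's cited use of Proximal Policy Optimization for policy updates and Reptile for meta-learning, suggest a Reptile-style outer loop over the task distribution. The sketch below is written under that assumption only; sample_task and ppo_update are hypothetical placeholders, and the synthesis of program structure from the DSL E up to depth d is not shown.

```python
# Minimal sketch of a Reptile-style outer loop around per-task PPO updates,
# following the inputs listed for Algorithm 1 (p(T_H), alpha, beta, DSL E, depth d).
# sample_task and ppo_update are hypothetical placeholders, not the paper's API.
import copy

def reptile_train(policy_params, sample_task, ppo_update,
                  alpha=1e-3, beta=0.1, meta_iterations=1000,
                  inner_steps=10, max_interactions=500):
    """Meta-train programmatic-policy parameters (a dict of arrays) over p(T_H)."""
    for _ in range(meta_iterations):
        task = sample_task()                    # draw a task from the distribution p(T_H)
        adapted = copy.deepcopy(policy_params)  # inner loop starts from the meta-parameters
        for _ in range(inner_steps):
            # PPO inner update on this task, with episodes capped at 500
            # agent-environment interactions as in the paper's setup.
            adapted = ppo_update(adapted, task, lr=alpha,
                                 max_interactions=max_interactions)
        # Reptile meta-update: move meta-parameters toward the adapted ones.
        for key in policy_params:
            policy_params[key] += beta * (adapted[key] - policy_params[key])
    return policy_params
```

As stated in the experiment-setup row, a full run would repeat this training over 10 random seeds and average the reported results.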