Refactoring Policy for Compositional Generalizability using Self-Supervised Object Proposals

Authors: Tongzhou Mu, Jiayuan Gu, Zhiwei Jia, Hao Tang, Hao Su

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we evaluate our approach on four difficult tasks that require compositional generalizability, and achieve superior performance compared to baselines.
Researcher Affiliation | Academia | 1 University of California, San Diego; 2 Shanghai Jiao Tong University
Pseudocode | No | The paper describes its methods in text but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | Project website: https://jiayuan-gu.github.io/policy-refactorization.
Open Datasets | Yes | We start from evaluating the basic units, the SPACE object detector and the object-centric GNN, on Multi-MNIST. After the units are verified, we evaluate the effectiveness of our framework for two types of compositional generalizability: w.r.t. the change of object quantity (Falling Digit), and w.r.t. the change of background (Big Fish). Finally, we show that there exist environments, e.g., Pacman, in which a generalizable student policy does not have to be an object-centric GNN. and The training set consists of 60000 images and each image has 1 to 3 MNIST digits, while the test set consists of 10000 images with 4 MNIST digits. and [8] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR, 2009. and [19] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
Dataset Splits | No | The paper describes training and test sets for its experiments (e.g., 'The training set consists of 60000 images... while the test set consists of 10000 images...'), but does not explicitly mention a validation set or a specific split for one.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory, or specific cloud instance types used for running experiments.
Software Dependencies | No | The paper mentions algorithms and architectures (e.g., 'DQN [24]', 'PointNet [26]'), but does not list specific software dependencies with version numbers, such as Python or deep learning frameworks.
Experiment Setup | Yes | In this task, we train all the baselines in a supervised learning manner... The node input is a patch cropped from the image according to the bounding box of the corresponding object and resized to 16×16. Then we use a CNN to encode node features, and apply a global-add-pooling to readout a global feature over all the nodes, followed by an MLP to predict the summation. And the policy GNN is implemented as PointNet [26]. ... For our framework, we first train a teacher policy by DQN [24] in the training environment... The architecture of the teacher policy is RelationNet [39]. ... we use a complete graph as the object-centric graph. The node input includes the bounding box position and a patch cropped from the image according to the bounding box, which is resized to 16×16. The policy GNN is implemented as EdgeConv [36]. ... we use PPO [30] to train a CNN-based policy network. ... trained by PPO for 200M frames.
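The Multi-MNIST setup quoted above (a shared CNN encoding each 16×16 object patch, a global add-pooling readout over all nodes, and an MLP predicting the digit sum) can be illustrated with a minimal sketch. This is not the authors' released code: the layer sizes, single-channel patches, and the `PatchSumNet` name are assumptions made purely for illustration of the described readout.

```python
# Minimal sketch (assumed details, not the paper's implementation) of the
# patch-encoder + global-add-pooling + MLP readout described in the
# Experiment Setup row, for the Multi-MNIST digit-summation task.
import torch
import torch.nn as nn


class PatchSumNet(nn.Module):
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        # Shared CNN node encoder for 1x16x16 object patches (sizes are assumptions).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # -> 16 x 8 x 8
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # -> 32 x 4 x 4
            nn.Flatten(),
            nn.Linear(32 * 4 * 4, feat_dim), nn.ReLU(),
        )
        # MLP head applied to the pooled graph-level feature.
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (num_objects, 1, 16, 16) for a single image,
        # where num_objects may vary between images.
        node_feats = self.encoder(patches)         # (num_objects, feat_dim)
        graph_feat = node_feats.sum(dim=0)         # global add-pooling readout
        return self.head(graph_feat)               # scalar digit-sum prediction


if __name__ == "__main__":
    model = PatchSumNet()
    patches = torch.randn(3, 1, 16, 16)            # e.g. 3 detected digit patches
    print(model(patches).shape)                    # torch.Size([1])
```

Because the readout is a permutation-invariant sum over per-object features, the same network accepts any number of detected objects, which is what allows training on images with 1 to 3 digits and testing on images with 4 digits.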