Object Pursuit: Building a Space of Objects via Discriminative Weight Generation
Authors: Chuanyu Pan, Yanchao Yang, Kaichun Mo, Yueqi Duan, Leonidas Guibas
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform an extensive study of the key features of the proposed framework and analyze the characteristics of the learned representations. Furthermore, we demonstrate the capability of the proposed framework in learning representations that can improve label efficiency in downstream tasks. Our code and trained models are made publicly available at: https://github.com/pptrick/Object-Pursuit. We also perform experiments on one-shot and few-shot learning, and show the potential of the learned object-centric representations in effectively reducing supervisions for object detection. |
| Researcher Affiliation | Academia | Chuanyu Pan (1,*), Yanchao Yang (2,*), Kaichun Mo (2), Yueqi Duan (1,2), Leonidas Guibas (2); 1 Tsinghua University, 2 Stanford University; pancy17@mails.tsinghua.edu.cn, {yanchaoy, kaichun, guibas}@cs.stanford.edu, duanyueqi@tsinghua.edu.cn |
| Pseudocode | Yes | The proposed object pursuit framework is also summarized in Algorithm 1. A.1 ALGORITHM: Here is the Object Pursuit algorithm we describe in the method section. Algorithm 1: Object Pursuit |
| Open Source Code | Yes | Our code and trained models are made publicly available at: https://github.com/pptrick/Object-Pursuit. |
| Open Datasets | Yes | To learn diverse objects from variant positions and viewing angles, we collect synthetic data within the iThor environment (Kolve et al., 2017), which provides a set of interactive objects and scenes, as well as accurate modeling of the physics. YouTube-VOS: We train and evaluate our framework on the YouTube-VOS dataset, which contains 65 categories. CO3D: We also test our framework on CO3D. We perform one-shot learning on the DAVIS 2016 dataset (Perazzi et al., 2016), a video object segmentation dataset in the real scene. |
| Dataset Splits | Yes | The 138 objects are divided into 52 pretraining objects, 60 train objects for the pursuit process, and 25 test unseen objects. For evaluation, we preserve a separate set of 25 objects (unseen test objects) that never appear during training, and we use 27 objects (seen test objects) from the warm-up joint training described above to check the re-identification accuracy. Under the one-shot learning scheme, we fix the hypernet and the bases, and initialize the combination coefficient µ with only one training sample (the first frame in the sequence). From µ, we can get the representation z for the training object, generate the parameters of a segmentation network using the hypernet, then evaluate the segmentation accuracy. (A hedged sketch of this one-shot fitting loop follows the table.) |
| Hardware Specification | No | I did not find any specific hardware details such as GPU models (e.g., NVIDIA A100, RTX 2080 Ti, Tesla V100), CPU models, or cloud computing instance types used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'Deeplab v3+' as the segmentation network and 'resnet18' as the backbone, along with the 'dice score' for similarity measure. However, it does not specify software versions for programming languages, libraries (e.g., PyTorch, TensorFlow), or other dependencies. (A reference dice-score sketch follows the table.) |
| Experiment Setup | Yes | The sparsity constraint α is set to 0.2 and 0.1 for Eq. 3 and Eq. 4 respectively, and β = 0.04 for all our experiments. To improve the convergence, we also warm up the hypernetwork using the pretraining objects. During pretraining, each mini-batch contains training data from one object, and we randomly choose which object to use in the next batch. In backpropagation, we update the hypernetwork ψ and representation z for each object. (A hedged sketch of this pretraining loop follows the table.) |
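The one-shot scheme quoted under Dataset Splits (freeze the hypernet and the bases, then fit only the combination coefficient µ from a single annotated frame) can be read as a small optimization loop. Below is a minimal PyTorch-style sketch; the names `hypernet`, `bases`, and `forward_with_weights`, as well as the loss choice and step count, are assumptions made for illustration and are not the authors' released code.

```python
import torch
import torch.nn.functional as F

def one_shot_fit(hypernet, bases, forward_with_weights, image, mask,
                 steps=200, lr=1e-2):
    """Fit only the combination coefficient mu on one (image, mask) pair.

    The hypernetwork and the learned object bases stay frozen, matching the
    one-shot setting described in the paper; all interfaces here are assumed.
    """
    mu = torch.zeros(bases.shape[0], requires_grad=True)  # one coefficient per base
    opt = torch.optim.Adam([mu], lr=lr)
    for _ in range(steps):
        z = mu @ bases                                 # object representation from the bases
        weights = hypernet(z)                          # generate segmentation-network parameters
        logits = forward_with_weights(weights, image)  # run the generated network on the image
        loss = F.binary_cross_entropy_with_logits(logits, mask)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return mu.detach()
```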
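The dice score named under Software Dependencies is a standard overlap measure between a predicted mask and the ground truth. For reference, here is a minimal soft dice implementation in PyTorch; the smoothing constant `eps` is an assumption, not a value taken from the paper.

```python
import torch

def dice_score(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Soft dice score between predicted probabilities and a binary mask.

    dice = 2 * |pred * target| / (|pred| + |target|); eps avoids division by zero.
    """
    pred = pred.flatten()
    target = target.flatten()
    intersection = (pred * target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
```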
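The Experiment Setup row describes a warm-up pretraining stage in which every mini-batch comes from a single, randomly chosen object and both the hypernetwork ψ and that object's representation z are updated. The sketch below illustrates such a loop under assumed interfaces (`hypernet`, a list of per-object representations `z_list`, per-object data `loaders`, and `forward_with_weights`); the L1 regularizer is a stand-in for illustration and does not reproduce the paper's Eq. 3/Eq. 4 terms governed by α and β.

```python
import random
import torch
import torch.nn.functional as F

def pretrain(hypernet, z_list, loaders, forward_with_weights,
             steps=10_000, lr=1e-4, alpha=0.2):
    """Warm up the hypernetwork on the pretraining objects.

    Each mini-batch is drawn from one randomly chosen object; both the
    hypernetwork parameters and that object's representation z receive
    gradients, mirroring the setup quoted above. Interfaces are assumed.
    """
    params = list(hypernet.parameters()) + z_list  # z_list: leaf tensors with requires_grad=True
    opt = torch.optim.Adam(params, lr=lr)
    iters = [iter(dl) for dl in loaders]
    for _ in range(steps):
        k = random.randrange(len(loaders))         # pick which object the next batch uses
        try:
            images, masks = next(iters[k])
        except StopIteration:
            iters[k] = iter(loaders[k])
            images, masks = next(iters[k])
        weights = hypernet(z_list[k])               # generate segmentation weights from z_k
        logits = forward_with_weights(weights, images)
        loss = F.binary_cross_entropy_with_logits(logits, masks)
        loss = loss + alpha * z_list[k].abs().mean()  # illustrative stand-in regularizer
        opt.zero_grad()
        loss.backward()
        opt.step()
```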