TOIST: Task Oriented Instance Segmentation Transformer with Noun-Pronoun Distillation
Authors: Pengfei Li, Beiwen Tian, Yongliang Shi, Xiaoxue Chen, Hao Zhao, Guyue Zhou, Ya-Qin Zhang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate TOIST on the large-scale task oriented dataset COCO-Tasks and achieve +10.9% higher mAP^box than the best-reported results. The proposed noun-pronoun distillation can boost mAP^box and mAP^mask by +2.8% and +3.8%. |
| Researcher Affiliation | Collaboration | AIR, Tsinghua University; Peking University; Intel Labs |
| Pseudocode | No | The paper includes a network architecture diagram (Figure 2) but does not provide any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Codes and models are publicly available at https://github.com/AIR-DISCOVER/TOIST. |
| Open Datasets | Yes | We conduct experiments on the COCO-Tasks dataset [55] which re-annotates the COCO dataset [40] with preference-aware affordance labels. (a) MIT License for the COCO-Tasks dataset. (b) Creative Commons Attribution 4.0 License for the Microsoft COCO dataset. |
| Dataset Splits | No | The paper specifies '3600 train images and 900 test images' but does not explicitly mention a validation split for the dataset. |
| Hardware Specification | Yes | We conduct all experiments on 8 NVIDIA A100 GPUs. |
| Software Dependencies | Yes | All experiments are implemented in PyTorch [50] and run on Detectron2 framework [50] with Python 3.8 and CUDA 11.3. |
| Experiment Setup | Yes | The batch size is 16. We use the AdamW optimizer with a base learning rate of 1e-4, which is decayed by a factor of 10 at epochs 40 and 50, for a total of 60 epochs. We use a warmup for 500 iterations. We train our models from scratch. The image backbone is ResNet-50 [21], which is pre-trained on ImageNet [50]. The number of object queries is set to 100 and the maximum sequence length of language tokens for DETR-like models is set to 256. The weights of losses in Eq. 3 and Eq. 9 are λ1=5, λ2=2, λ3=5, λ4=2, λ5=1, λ6=1, λ7=0.1, λ8=1. The K-means cluster number K is set to 3. A hedged sketch of this optimizer and learning-rate schedule follows the table. |
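
The reported training schedule (batch size 16, AdamW at a base learning rate of 1e-4, 10x decay at epochs 40 and 50, 500-iteration warmup, 60 epochs) can be approximated as below. This is a minimal sketch assuming standard PyTorch `AdamW` and `LambdaLR`; the function name and arguments are placeholders and are not taken from the TOIST codebase.

```python
# Hedged reconstruction of the reported optimizer and LR schedule.
# The model and iteration count are supplied by the caller; nothing here
# reproduces TOIST's actual training loop.
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

def build_optimizer_and_scheduler(model, iters_per_epoch, base_lr=1e-4,
                                  warmup_iters=500, decay_epochs=(40, 50)):
    optimizer = AdamW(model.parameters(), lr=base_lr)

    def lr_lambda(step):
        # Linear warmup for the first 500 iterations.
        if step < warmup_iters:
            return (step + 1) / warmup_iters
        # Step decay by a factor of 10 at epochs 40 and 50.
        epoch = step // iters_per_epoch
        factor = 1.0
        for boundary in decay_epochs:
            if epoch >= boundary:
                factor *= 0.1
        return factor

    # Call scheduler.step() once per iteration for 60 * iters_per_epoch steps.
    scheduler = LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```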