TOIST: Task Oriented Instance Segmentation Transformer with Noun-Pronoun Distillation

Authors: Pengfei Li, Beiwen Tian, Yongliang Shi, Xiaoxue Chen, Hao Zhao, Guyue Zhou, Ya-Qin Zhang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate TOIST on the large-scale task oriented dataset COCO-Tasks and achieve +10.9% higher mAPbox than the best-reported results. The proposed noun-pronoun distillation can boost mAPbox and mAPmask by +2.8% and +3.8%.
Researcher Affiliation | Collaboration | 1 AIR, Tsinghua University; 2 Peking University; 3 Intel Labs
Pseudocode | No | The paper includes a network architecture diagram (Figure 2) but does not provide any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Codes and models are publicly available at https://github.com/AIR-DISCOVER/TOIST.
Open Datasets | Yes | We conduct experiments on the COCO-Tasks dataset [55], which re-annotates the COCO dataset [40] with preference-aware affordance labels. (a) MIT License for the COCO-Tasks dataset. (b) Creative Commons Attribution 4.0 License for the Microsoft COCO dataset.
Dataset Splits | No | The paper specifies '3600 train images and 900 test images' but does not explicitly mention a validation split for the dataset.
Hardware Specification | Yes | We conduct all experiments on 8 NVIDIA A100 GPUs.
Software Dependencies | Yes | All experiments are implemented in PyTorch [50] and run on the Detectron2 framework [50] with Python 3.8 and CUDA 11.3.
Experiment Setup | Yes | The batch size is 16. We use the AdamW optimizer with a base learning rate of 1e-4, which is decayed by a factor of 10 at epochs 40 and 50, for a total of 60 epochs. We use a warmup for 500 iterations. We train our models from scratch. The image backbone is ResNet-50 [21], which is pre-trained on ImageNet [50]. The number of object queries is set to 100 and the maximum sequence length of language tokens for DETR-like models is set to 256. The weights of the losses in Eq. 3 and Eq. 9 are λ1=5, λ2=2, λ3=5, λ4=2, λ5=1, λ6=1, λ7=0.1, λ8=1. The K-means cluster number K is set to 3.
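The reported schedule (base learning rate 1e-4, decayed by 10× at epochs 40 and 50, with a 500-iteration warmup) can be written as a small piecewise rule. This is a minimal, self-contained sketch, not the authors' code; in particular, applying warmup per global iteration and stacking the two decay steps multiplicatively are assumptions based on standard DETR-style training.

```python
def learning_rate(epoch, iteration, base_lr=1e-4,
                  warmup_iters=500, milestones=(40, 50), gamma=0.1):
    """Hypothetical reconstruction of the reported schedule:
    linear warmup over the first 500 iterations, then base_lr
    decayed by a factor of 10 at epochs 40 and 50 (60 epochs total)."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma  # divide by 10 at each passed milestone
    if iteration < warmup_iters:
        lr *= (iteration + 1) / warmup_iters  # linear warmup ramp
    return lr
```

For example, the rule yields 1e-4 during epochs 0–39 (after warmup), 1e-5 from epoch 40, and 1e-6 from epoch 50 onward.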