TOIST: Task Oriented Instance Segmentation Transformer with Noun-Pronoun Distillation
Authors: Pengfei Li, Beiwen Tian, Yongliang Shi, Xiaoxue Chen, Hao Zhao, Guyue Zhou, Ya-Qin Zhang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate TOIST on the large-scale task oriented dataset COCO-Tasks and achieve +10.9% higher mAP^box than the best-reported results. The proposed noun-pronoun distillation can boost mAP^box and mAP^mask by +2.8% and +3.8%. |
| Researcher Affiliation | Collaboration | AIR, Tsinghua University; Peking University; Intel Labs |
| Pseudocode | No | The paper includes a network architecture diagram (Figure 2) but does not provide any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Codes and models are publicly available at https://github.com/AIR-DISCOVER/TOIST. |
| Open Datasets | Yes | We conduct experiments on the COCO-Tasks dataset [55] which re-annotates the COCO dataset [40] with preference-aware affordance labels. (a) MIT License for the COCO-Tasks dataset. (b) Creative Commons Attribution 4.0 License for the Microsoft COCO dataset. |
| Dataset Splits | No | The paper specifies '3600 train images and 900 test images' but does not explicitly mention a validation split for the dataset. |
| Hardware Specification | Yes | We conduct all experiments on 8 NVIDIA A100 GPUs. |
| Software Dependencies | Yes | All experiments are implemented in PyTorch [50] and run on Detectron2 framework [50] with Python 3.8 and CUDA 11.3. |
| Experiment Setup | Yes | The batch size is 16. We use the AdamW optimizer with a base learning rate of 1e-4, which is decayed by a factor of 10 at epochs 40 and 50, for a total of 60 epochs. We use a warmup for 500 iterations. We train our models from scratch. The image backbone is ResNet-50 [21], which is pre-trained on ImageNet [50]. The number of object queries is set to 100 and the maximum sequence length of language tokens for DETR-like models is set to 256. The weights of losses in Eq. 3 and Eq. 9 are λ1=5, λ2=2, λ3=5, λ4=2, λ5=1, λ6=1, λ7=0.1, λ8=1. The K-means cluster number K is set to 3. A hedged sketch of this optimizer and learning-rate schedule follows the table. |
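
The reported training schedule (batch size 16, AdamW at a base learning rate of 1e-4, 10x decay at epochs 40 and 50, 500-iteration warmup, 60 epochs) can be approximated as below. This is a minimal sketch assuming standard PyTorch `AdamW` and `LambdaLR`; the function name and arguments are placeholders and are not taken from the TOIST codebase.

```python
# Hedged reconstruction of the reported optimizer and LR schedule.
# The model and iteration count are supplied by the caller; nothing here
# reproduces TOIST's actual training loop.
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

def build_optimizer_and_scheduler(model, iters_per_epoch, base_lr=1e-4,
                                  warmup_iters=500, decay_epochs=(40, 50)):
    optimizer = AdamW(model.parameters(), lr=base_lr)

    def lr_lambda(step):
        # Linear warmup for the first 500 iterations.
        if step < warmup_iters:
            return (step + 1) / warmup_iters
        # Step decay by a factor of 10 at epochs 40 and 50.
        epoch = step // iters_per_epoch
        factor = 1.0
        for boundary in decay_epochs:
            if epoch >= boundary:
                factor *= 0.1
        return factor

    # Call scheduler.step() once per iteration for 60 * iters_per_epoch steps.
    scheduler = LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```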