TaskPrompter: Spatial-Channel Multi-Task Prompting for Dense Scene Understanding
Authors: Hanrong Ye, Dan Xu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on two challenging multi-task dense scene understanding benchmarks (i.e. NYUD-v2 and PASCAL-Context) show the superiority of the proposed framework, and TaskPrompter establishes significant state-of-the-art performances on multi-task dense predictions. |
| Researcher Affiliation | Academia | Hanrong Ye and Dan Xu Department of Computer Science and Engineering The Hong Kong University of Science and Technology (HKUST) Clear Water Bay, Kowloon, Hong Kong {hyeae,danxu}@cse.ust.hk |
| Pseudocode | No | The paper does not contain any clearly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format. |
| Open Source Code | Yes | Codes and models are publicly available at https://github.com/prismformore/Multi-Task-Transformer. |
| Open Datasets | Yes | Datasets We evaluate the proposed TaskPrompter mainly on the two most widely used multi-task dense visual scene understanding datasets, i.e. NYUD-v2 (Silberman et al., 2012) and PASCAL-Context (Chen et al., 2014). Details of the datasets are presented in Appendix A.3. To further examine the proposed TaskPrompter, we adapt it to tackle a joint 2D-3D multi-task scene understanding problem involving three challenging tasks, i.e. 3D object detection (3Ddet), semantic segmentation (Semseg), and monocular depth estimation (Depth), on the Cityscapes-3D dataset (Gählert et al., 2020). |
| Dataset Splits | Yes | Specifically, PASCAL-Context provides 4,998 images in the training set and 5,105 images in the testing set. NYUD-v2 provides 1,449 images in total, of which 795 are used for training and the remaining 654 for testing. The Cityscapes-3D dataset consists of 2,975 training images and 500 validation images with fine annotations. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or specific computing platforms) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer' and types of losses ('L1 Losses', 'cross-entropy losses') but does not specify version numbers for any programming languages, libraries, or frameworks used in the implementation. |
| Experiment Setup | Yes | The models for different experiments are trained for 40,000 iterations on all datasets, with a batch size of 4 if not otherwise specified. The Adam optimizer is adopted with a learning rate of 2 x 10^-5 and a weight decay rate of 1 x 10^-6. A polynomial learning rate scheduler is used during optimization. For the continuous regression tasks (i.e. Depth and Normal) we use L1 losses. For the discrete classification tasks (i.e. Semseg, Parsing, Saliency, and Boundary) we use cross-entropy losses. The learnable task prompts are randomly initialized from a normal distribution (mean=1, std=1). |
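The reported setup (40,000 iterations, polynomial learning rate decay from 2 x 10^-5, task prompts drawn from N(1, 1)) can be sketched in plain Python. Note the polynomial power (0.9) and the prompt count/dimension below are assumptions for illustration; the paper states neither.

```python
import random

BASE_LR = 2e-5       # learning rate reported in the paper
MAX_ITERS = 40_000   # total training iterations reported in the paper
POWER = 0.9          # polynomial power: assumed common default, not stated in the paper

def poly_lr(step, base_lr=BASE_LR, max_iters=MAX_ITERS, power=POWER):
    """Polynomial schedule: decays the learning rate from base_lr to 0."""
    return base_lr * (1.0 - step / max_iters) ** power

def init_task_prompts(num_prompts, dim, mean=1.0, std=1.0, seed=0):
    """Initialize learnable task prompts from N(mean, std), as reported.

    num_prompts and dim are hypothetical shapes chosen for this sketch.
    """
    rng = random.Random(seed)
    return [[rng.gauss(mean, std) for _ in range(dim)]
            for _ in range(num_prompts)]

# Learning rate at the start, midpoint, and end of training.
print(poly_lr(0))        # 2e-05 (full base learning rate)
print(poly_lr(20_000))   # roughly half-decayed
print(poly_lr(40_000))   # 0.0 (fully decayed)

prompts = init_task_prompts(num_prompts=6, dim=768)
print(len(prompts), len(prompts[0]))
```

In a real training loop the schedule would be applied per iteration (e.g. PyTorch's `torch.optim.lr_scheduler.PolynomialLR` alongside `torch.optim.Adam` with `weight_decay=1e-6`), but the pure-Python form above makes the decay curve explicit.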