Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision Tasks
Authors: Yen-Cheng Liu, Chih-Yao Ma, Junjiao Tian, Zijian He, Zsolt Kira
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results on four different dense vision tasks showed that existing methods cannot be efficiently integrated due to the hierarchical nature of Hierarchical Vision Transformers. To overcome this issue, we propose Polyhistor and Polyhistor-Lite, consisting of Decomposed Hyper Networks and Layer-wise Scaling Kernels, to share information across different tasks with a few trainable parameters. This leads to favorable performance improvements against existing parameter-efficient methods while using fewer trainable parameters. Specifically, Polyhistor achieves competitive accuracy compared to the state-of-the-art while only using 10% of their trainable parameters. (An illustrative sketch of this adapter-generation idea appears after the table.) |
| Researcher Affiliation | Collaboration | Yen-Cheng Liu (Georgia Tech, ycliu@gatech.edu); Chih-Yao Ma (Meta, cyma@meta.com); Junjiao Tian (Georgia Tech, jtian73@gatech.edu); Zijian He (Meta, zijian@meta.com); Zsolt Kira (Georgia Tech, zkira@gatech.edu) |
| Pseudocode | No | The paper does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | We will publicly release our code to facilitate future research. |
| Open Datasets | Yes | Dataset. We follow prior works [30, 31] on multi-task learning for dense prediction tasks and consider PASCAL-Context [32] to construct our multi-task efficient adaptation for per-pixel benchmark. |
| Dataset Splits | No | The paper mentions training data and evaluation metrics, but it does not explicitly state the specific train/validation/test dataset splits (e.g., percentages, absolute counts, or references to predefined splits) used for the experiments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU models, or cloud computing instances. |
| Software Dependencies | No | The paper does not list specific software dependencies with their version numbers required to reproduce the experiments. |
| Experiment Setup | Yes | Training. To train our model, we use the commonly-used losses for each task. Specifically, we use the standard per-pixel cross-entropy for semantic segmentation and human part segmentation, L1 loss for surface normals estimation, and balanced cross-entropy for saliency detection. For a fair comparison, we experiment on a unified codebase implementation with the same loss functions and training iterations for all baselines and our method. (...) VPT [9] inserts tunable embeddings in the first input layer (VPT-shallow) and all layers (VPT-deep), and we select the best hyper-parameter (i.e., 50 embeddings per layer) for all results. (...) LoRA [10] applies low-rank decomposition to the attention layers, and we select rank r = 4 and an adapter output scale of 4, which performs best. (Sketches of the per-task losses and the LoRA configuration follow the table.) |
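For intuition about how the components named in the abstract might fit together, the following is a minimal PyTorch sketch of a task-conditioned hypernetwork that generates low-rank adapter weights and applies a learnable per-layer scale. It is written under our own assumptions; the class and parameter names (`TaskHyperAdapter`, `task_emb`, `layer_scale`, `to_down`, `to_up`) are hypothetical and do not reproduce the authors' actual Decomposed Hyper Networks or Layer-wise Scaling Kernels.

```python
# Illustrative sketch only (not the authors' implementation): a task-conditioned
# hypernetwork that generates rank-r adapter weights from a small task embedding,
# plus a learnable per-layer scaling factor. All names are hypothetical.
import torch
import torch.nn as nn


class TaskHyperAdapter(nn.Module):
    def __init__(self, d_model: int, rank: int, emb_dim: int,
                 num_tasks: int, num_layers: int):
        super().__init__()
        self.task_emb = nn.Embedding(num_tasks, emb_dim)               # shared across layers
        self.layer_scale = nn.Parameter(torch.ones(num_layers, 1, 1))  # layer-wise scale
        # Hypernetwork heads mapping a task embedding to adapter weights.
        self.to_down = nn.Linear(emb_dim, d_model * rank)
        self.to_up = nn.Linear(emb_dim, rank * d_model)
        self.d_model, self.rank = d_model, rank

    def forward(self, x: torch.Tensor, task_id: int, layer_id: int) -> torch.Tensor:
        # x: [batch, tokens, d_model] features from a (frozen) transformer layer.
        e = self.task_emb(torch.tensor(task_id, device=x.device))
        down = self.to_down(e).view(self.d_model, self.rank)
        up = self.to_up(e).view(self.rank, self.d_model)
        # Residual low-rank adapter, scaled per layer.
        return x + self.layer_scale[layer_id] * (x @ down @ up)
```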
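The training description in the Experiment Setup row can be made concrete with a small loss sketch, assuming PyTorch. The tensor shapes, the `ignore_index` of 255 for segmentation labels, and the equal task weighting are assumptions following common dense-prediction conventions; none of these specifics are stated in the excerpt.

```python
# Minimal multi-task loss sketch for the four PASCAL-Context tasks, assuming
# PyTorch. Tensor shapes, ignore_index=255, and equal task weighting are
# assumptions, not details stated in the paper excerpt.
import torch
import torch.nn.functional as F


def multitask_loss(preds: dict, targets: dict) -> torch.Tensor:
    # Per-pixel cross-entropy for semantic segmentation and human part segmentation
    # (logits: [B, C, H, W], labels: [B, H, W] with 255 as the ignore label).
    loss_semseg = F.cross_entropy(preds["semseg"], targets["semseg"], ignore_index=255)
    loss_parts = F.cross_entropy(preds["parts"], targets["parts"], ignore_index=255)
    # L1 loss for surface-normal estimation ([B, 3, H, W] targets).
    loss_normals = F.l1_loss(preds["normals"], targets["normals"])
    # Balanced binary cross-entropy for saliency detection ([B, 1, H, W] masks in {0, 1}).
    sal = targets["saliency"].float()
    pos_weight = (sal == 0).sum() / sal.sum().clamp(min=1.0)
    loss_saliency = F.binary_cross_entropy_with_logits(
        preds["saliency"], sal, pos_weight=pos_weight)
    # Equal weighting across tasks (an assumption).
    return loss_semseg + loss_parts + loss_normals + loss_saliency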
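The quoted LoRA baseline setting (rank r = 4, output scale 4) corresponds to a standard low-rank residual update on a frozen linear projection. The sketch below is illustrative; the wrapper name `LoRALinear` and the choice of which attention projections to wrap are assumptions, not the paper's code.

```python
# Sketch of the LoRA baseline setting quoted above: rank r = 4 and output
# scale 4 on attention projections. The wrapper below is illustrative
# (the class name and wrapping pattern are not from the paper's code).
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, scale: float = 4.0):
        super().__init__()
        self.base = base                                   # frozen pretrained projection
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)   # down-projection
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)  # up-projection
        nn.init.zeros_(self.lora_b.weight)                 # update starts at zero
        self.scale = scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))
```

In practice a wrapper like this would replace the attention projections of each transformer block, so only the small `lora_a`/`lora_b` matrices are trained while the pretrained weights stay frozen.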