DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets
Authors: Yash Jain, Harkirat Behl, Zsolt Kira, Vibhav Vineet
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on Universal Object-Detection Benchmark show that we outperform the existing state-of-the-art by average +10.2 AP score and improve over our non-MoE baseline by average +2.0 AP score. |
| Researcher Affiliation | Collaboration | Yash Jain¹, Harkirat Behl², Zsolt Kira¹, Vibhav Vineet² (¹Georgia Institute of Technology, ²Microsoft Research) |
| Pseudocode | No | The paper describes methods using mathematical equations but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/jinga-lala/DAMEX. |
| Open Datasets | Yes | UODB comprises 11 datasets: Pascal VOC [5], Wider Face [40], KITTI [8], LISA [26], DOTA [36], COCO [22], Watercolor, Clipart, Comic [13], Kitchen [9] and DeepLesion [38], shown in Figure 1. |
| Dataset Splits | Yes | All the reported numbers in this work are mean Average Precision (AP) scores evaluated on the available test or val set of the corresponding dataset. |
| Hardware Specification | Yes | We keep one expert per GPU and train on 8 RTX6000 GPUs with a batch size of 2 per GPU, unless mentioned otherwise. |
| Software Dependencies | No | The paper mentions using the 'TUTEL library [12]' but does not provide specific version numbers for this or any other software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | For hyper-parameters, as in DINO, we use a 6-layer Transformer encoder and a 6-layer Transformer decoder and 256 as the hidden feature dimension. We use a capacity factor f of 1.25 and an auxiliary expert-balancing loss weight of 0.1 with top-1 selection of experts. We use a learning rate of 1.4e-4 and keep other DINO-specific hyperparameters the same as in [42]. |
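
The reported setup can be collected into a single configuration sketch. The dictionary below is illustrative only: the key names are assumptions, and the values are taken from the quoted hardware and experiment-setup rows (the expert count is inferred from "one expert per GPU" on 8 GPUs, not stated explicitly).

```python
# Hypothetical configuration summarizing the quoted DAMEX/DINO training setup.
# Key names are illustrative; values follow the paper's stated hyper-parameters.
damex_config = {
    # DINO transformer settings (as in [42])
    "encoder_layers": 6,
    "decoder_layers": 6,
    "hidden_dim": 256,
    # Mixture-of-Experts settings
    "num_experts": 8,                 # assumption: one expert per GPU on 8x RTX6000
    "top_k": 1,                       # top-1 expert selection
    "capacity_factor": 1.25,          # capacity factor f
    "aux_balance_loss_weight": 0.1,   # auxiliary expert-balancing loss weight
    # Optimization
    "learning_rate": 1.4e-4,
    "batch_size_per_gpu": 2,
    "num_gpus": 8,
}
```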
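For readers unfamiliar with the MoE terms quoted above, the sketch below shows generic top-1 gating with a capacity factor and a Switch-Transformer-style auxiliary load-balancing loss. It is not DAMEX's dataset-aware router (which the paper builds on the TUTEL library); class and variable names are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Top1Gate(nn.Module):
    """Generic top-1 MoE gate with a capacity limit and an auxiliary
    load-balancing loss. Illustrative sketch, not DAMEX's implementation."""

    def __init__(self, d_model: int, num_experts: int, capacity_factor: float = 1.25):
        super().__init__()
        self.w_gate = nn.Linear(d_model, num_experts, bias=False)
        self.num_experts = num_experts
        self.capacity_factor = capacity_factor

    def forward(self, tokens: torch.Tensor):
        # tokens: (num_tokens, d_model)
        logits = self.w_gate(tokens)            # (T, E) routing logits
        probs = F.softmax(logits, dim=-1)       # routing probabilities
        expert_index = probs.argmax(dim=-1)     # top-1 expert per token

        # Auxiliary balancing loss: fraction of tokens routed to each expert
        # times the mean routing probability for that expert, summed over experts.
        T, E = probs.shape
        tokens_per_expert = F.one_hot(expert_index, E).float().mean(dim=0)
        mean_prob_per_expert = probs.mean(dim=0)
        aux_loss = E * torch.sum(tokens_per_expert * mean_prob_per_expert)

        # Capacity: each expert accepts at most this many tokens per batch;
        # overflow handling (dropping or re-routing) is omitted here.
        capacity = int(self.capacity_factor * T / E)

        return expert_index, aux_loss, capacity
```

In training, the auxiliary term would be added to the detection loss with the quoted weight of 0.1, and the capacity value (derived from f = 1.25) caps how many tokens each expert may process.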