DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations
Authors: Ximeng Sun, Ping Hu, Kate Saenko
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on standard multi-label recognition benchmarks across two challenging low-label settings demonstrate the advantages of our approach over state-of-the-art methods. |
| Researcher Affiliation | Collaboration | ¹Boston University, ²MIT-IBM Watson AI Lab, IBM Research; {sunxm, pinghu, saenko}@bu.edu |
| Pseudocode | No | The paper describes its methods using text and mathematical equations but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project page: https://cs-people.bu.edu/sunxm/DualCoOp/project.html [...] Please check our project page included in the abstract and https://github.com/sunxm2357/DualCoOp for our code. |
| Open Datasets | Yes | Datasets. We conduct experiments on MS-COCO [34], VOC2007 [16] and BigEarth [5] to evaluate multi-label recognition with partial labels. |
| Dataset Splits | Yes | MS-COCO [34] contains 80 common object categories and we use the official train2014 (82K images) and val2014 (40K images) splits for training and test. |
| Hardware Specification | Yes | Training is done with one RTX A6000. [...] We compare the computational cost between DualCoOp and SARB [46] in terms of training/testing latency and memory (see Table 3) using the same device (one NVIDIA A100 GPU). |
| Software Dependencies | No | The paper mentions models and frameworks (e.g., ResNet-101, CLIP, Transformer, SGD optimizer, ASL loss) but does not provide specific version numbers for any software libraries or dependencies. |
| Experiment Setup | Yes | For each class/label, we learn two independent context vectors with 16 context tokens (N = 16) following [71], which is the only learnable part in DualCoOp. We use the SGD optimizer with an initial learning rate of 0.002, decayed by the cosine annealing rule. We train the context vectors for 50 epochs with a batch size of 32/8/32 for MS-COCO/VOC2007/BigEarth, respectively. For the ASL loss, we choose γ+ = 1, γ− = 2, and c = 0.05 via validation. (A minimal sketch of this setup follows the table.) |
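
The Experiment Setup row amounts to a compact training recipe: learnable prompt context vectors as the only trainable parameters, SGD with cosine annealing, and an asymmetric (ASL) objective with γ+ = 1, γ− = 2, and probability margin c = 0.05. The sketch below is not the authors' code; it assumes a standard PyTorch-style ASL implementation, and the `prompt_ctx` name and shape are hypothetical stand-ins for the paired positive/negative context vectors (N = 16 tokens per class).

```python
import torch
import torch.nn as nn

class AsymmetricLoss(nn.Module):
    """Asymmetric loss for multi-label classification, with the hyperparameters
    reported in the Experiment Setup row (gamma_pos=1, gamma_neg=2, clip=0.05)."""
    def __init__(self, gamma_pos=1.0, gamma_neg=2.0, clip=0.05, eps=1e-8):
        super().__init__()
        self.gamma_pos, self.gamma_neg = gamma_pos, gamma_neg
        self.clip, self.eps = clip, eps

    def forward(self, logits, targets):
        # logits, targets: (batch, num_classes); targets are 0/1 multi-hot labels.
        p = torch.sigmoid(logits)
        # Shift negative probabilities by the margin c to down-weight easy negatives.
        p_neg = (p - self.clip).clamp(min=0)
        loss_pos = targets * (1 - p) ** self.gamma_pos * torch.log(p.clamp(min=self.eps))
        loss_neg = (1 - targets) * p_neg ** self.gamma_neg * torch.log((1 - p_neg).clamp(min=self.eps))
        return -(loss_pos + loss_neg).mean()

# Hypothetical prompt parameters: two context-vector sets (positive/negative prompts)
# per class, each with N = 16 tokens; assumed CLIP text embedding width of 512.
num_classes, n_ctx, ctx_dim = 80, 16, 512
prompt_ctx = nn.Parameter(torch.randn(num_classes, 2, n_ctx, ctx_dim) * 0.02)

# Optimizer and schedule matching the reported setup: SGD, lr 0.002, cosine
# annealing over the 50 training epochs; only prompt_ctx is optimized.
optimizer = torch.optim.SGD([prompt_ctx], lr=0.002)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)
criterion = AsymmetricLoss(gamma_pos=1.0, gamma_neg=2.0, clip=0.05)
```

This only illustrates the reported hyperparameters; the actual DualCoOp training loop (frozen CLIP encoders, class-specific region aggregation, partial-label masking) lives in the linked repository.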