Compositional Zero-Shot Learning via Fine-Grained Dense Feature Composition
Authors: Dat Huynh, Ehsan Elhamifar
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on four popular datasets of DeepFashion, AWA2, CUB, and SUN, showing that our method significantly improves the state of the art. |
| Researcher Affiliation | Academia | Dat Huynh, Northeastern University, huynh.dat@northeastern.edu; Ehsan Elhamifar, Northeastern University, eelhami@ccs.neu.edu |
| Pseudocode | Yes | Algorithm 1 Composing Dense Features |
| Open Source Code | No | The paper does not provide concrete access to source code for the described methodology. |
| Open Datasets | Yes | We conduct experiments on four popular datasets: DeepFashion [4], AWA2 [68], CUB [69], and SUN [70]. |
| Dataset Splits | Yes | We follow the data splits of [2] for Deep Fashion and of [68] for AWA2, CUB, and SUN. |
| Hardware Specification | Yes | We pre-train DAZLE on seen classes and use it to compose dense features for at most 2000 and 4000 iterations, respectively, on an NVIDIA V100 GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch' and 'RMSprop' but does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We implement our framework in PyTorch and optimize it using RMSprop [74] with the default setting, a learning rate of 0.0001, and a batch size of 50 with an equal number of samples per class. We pre-train DAZLE on seen classes and use it to compose dense features for at most 2000 and 4000 iterations, respectively... To prevent seen class bias, we add a margin of 1 to unseen class scores and −1 to seen class scores... We experiment in two settings: i) using pre-trained ImageNet features (pre-trained setting) and ii) fine-tuning the ResNet backbone on each dataset... We use the feature map of the last convolutional layer, whose size is 7 × 7 × 2048... To measure the robustness of our method, we fix the hyperparameters at T = 5, k = 5, b = 50 (T = 10, k = 10, b = 50) for the pre-trained (fine-tuned) setting on all datasets. |
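
Since the paper does not release code, the following is a minimal PyTorch sketch of the training configuration quoted in the Experiment Setup row. Only the optimizer (RMSprop with default settings, learning rate 0.0001), the batch size of 50, and the ±1 calibration margin come from the text; the model (a linear head standing in for DAZLE), the class counts, the seen-first class ordering, and the random data are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: class counts, feature dimension, and the linear
# head are illustrative assumptions, not the paper's released code.
NUM_SEEN, NUM_UNSEEN, FEAT_DIM = 100, 50, 2048
model = nn.Linear(FEAT_DIM, NUM_SEEN + NUM_UNSEEN)  # placeholder for DAZLE

# RMSprop with its default settings and the reported learning rate of 0.0001.
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-4)

# Calibration margin from the paper: +1 on unseen class scores, -1 on seen
# class scores, to counteract the bias toward seen classes. This assumes
# seen classes occupy the first NUM_SEEN output indices.
margin = torch.cat([-torch.ones(NUM_SEEN), torch.ones(NUM_UNSEEN)])

def calibrated_scores(features: torch.Tensor) -> torch.Tensor:
    """Class scores with the seen/unseen calibration margin applied."""
    return model(features) + margin

# One training step on a batch of 50 samples; the paper additionally balances
# each batch so every class contributes an equal number of samples.
features = torch.randn(50, FEAT_DIM)        # stand-in for dense features
labels = torch.randint(0, NUM_SEEN, (50,))  # training sees seen classes only
loss = nn.functional.cross_entropy(calibrated_scores(features), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```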