Intra-Modal Proxy Learning for Zero-Shot Visual Categorization with CLIP
Authors: Qi Qian, Yuanhong Xu, Juhua Hu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on extensive downstream tasks confirm the effectiveness and efficiency of our proposal. |
| Researcher Affiliation | Collaboration | 1 Alibaba Group, Bellevue, WA 98004, USA; 2 Alibaba Group, Hangzhou, China; 3 School of Engineering and Technology, University of Washington, Tacoma, WA 98402, USA |
| Pseudocode | Yes | Algorithm 1: Intra-Modal Proxy Learning (InMaP) |
| Open Source Code | No | The paper does not provide a statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | To evaluate the proposed method, we follow the common practice and conduct experiments on ImageNet [28] and 13 diverse downstream vision tasks for zero-shot transfer. |
| Dataset Splits | Yes | While the validation set of ImageNet is well-balanced with 50 examples for each class, we conduct the experiments with different γ in Eqn. 4 for demonstration. |
| Hardware Specification | Yes | All experiments are conducted on a single V100 GPU. |
| Software Dependencies | No | The paper mentions the use of CLIP and other models, but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | For the proposed method, the temperature for obtaining pseudo labels with the text proxy τ_T is set to 0.01 as obtained by CLIP. The temperature for recovering the visual proxy τ_I is set to 0.04 for all experiments. The intra-modal proxy is learned by standard projected gradient descent, where the initial learning rate is 10 and the number of iterations is 2,000 for sufficient training. The learning rate will be decayed by 2 when the norm of gradient increases. Sinkhorn distance is optimized by 20 iterations for refining pseudo labels. |
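
The Experiment Setup row above can be read as a three-step recipe: soft pseudo labels from the text proxy, Sinkhorn refinement, then projected gradient descent on a vision-space proxy. The NumPy sketch below is one possible reading of that row, not the authors' implementation (the paper releases no code). Only the hyperparameter defaults come from the table. The following are assumptions made for illustration: the function names `sinkhorn_refine` and `learn_visual_proxy`, the cross-entropy objective, the uniform class marginal in the Sinkhorn step, the projection onto the unit sphere, and reading "decayed by 2" as halving the learning rate.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def l2_normalize(x):
    """Scale each row to unit Euclidean norm."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def sinkhorn_refine(p, n_iters=20):
    """Alternate column and row rescaling of the pseudo-label matrix.

    Assumption: the target class marginal is uniform (n / k mass per class).
    """
    q = p.copy()
    n, k = q.shape
    for _ in range(n_iters):
        q = q / q.sum(axis=0, keepdims=True) * (n / k)  # balance class mass
        q = q / q.sum(axis=1, keepdims=True)            # rows sum to 1 again
    return q


def learn_visual_proxy(x, w_text, tau_t=0.01, tau_i=0.04,
                       lr=10.0, n_iters=2000, sinkhorn_iters=20):
    """Learn one proxy per class in the vision space.

    x:      (n, d) unit-norm image features
    w_text: (k, d) unit-norm text proxies, used for pseudo labels and as init
    """
    # Pseudo labels from the text proxy at temperature tau_t, then refined.
    p = sinkhorn_refine(softmax(x @ w_text.T / tau_t), sinkhorn_iters)

    w = w_text.copy()
    prev_norm = np.inf
    for _ in range(n_iters):
        probs = softmax(x @ w.T / tau_i)
        # Gradient of the mean cross-entropy between p and probs (assumed loss).
        grad = (probs - p).T @ x / (tau_i * len(x))
        g_norm = np.linalg.norm(grad)
        if g_norm > prev_norm:
            lr /= 2.0  # assumed meaning of "decayed by 2"
        prev_norm = g_norm
        # Gradient step followed by projection back onto the unit sphere.
        w = l2_normalize(w - lr * grad)
    return w, p


# Usage on synthetic features (hypothetical data, 3 classes, 8 dimensions).
rng = np.random.default_rng(0)
centers = l2_normalize(rng.normal(size=(3, 8)))
images = l2_normalize(np.repeat(centers, 20, axis=0)
                      + 0.1 * rng.normal(size=(60, 8)))
text_proxy = l2_normalize(centers + 0.3 * rng.normal(size=(3, 8)))
visual_proxy, pseudo_labels = learn_visual_proxy(images, text_proxy)
predictions = (images @ visual_proxy.T).argmax(axis=1)
```

With real CLIP features, `images` and `text_proxy` would be the L2-normalized image embeddings and class-prompt text embeddings. Because each Sinkhorn pass ends with the row rescaling, every pseudo label stays a valid probability distribution, which the gradient expression above relies on.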