Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
FineCLIP: Self-distilled Region-based CLIP for Better Fine-grained Understanding
Authors: Dong Jing, Xiaolong He, Yutian Luo, Nanyi Fei, Guoxing Yang, Wei Wei, Huiwen Zhao, Zhiwu Lu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on challenging dense prediction and image-level tasks. [...] Through extensive experimental evaluations, we show that FineCLIP surpasses previous arts on most dense prediction tasks and image-level tasks under fair comparison settings, demonstrating its effectiveness in both fine-grained understanding and semantic-aligned global representation. |
| Researcher Affiliation | Collaboration | ¹Gaoling School of Artificial Intelligence, Renmin University of China; ²Meta Brain AGI Lab, Shanghai, China; ³R&D Management Department, Honor Device Co., Ltd. |
| Pseudocode | No | The paper does not contain any blocks explicitly labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | No | We will release the code and generated textual descriptions of regions soon. |
| Open Datasets | Yes | We train FineCLIP using 8 A800 GPUs on the train2017 split of the COCO dataset [30], which includes approximately 118K human-annotated image-text pairs along with 970K region-label pairs. |
| Dataset Splits | Yes | Using the COCO val2017 split, we test FineCLIP designs on the box classification task with pooled region features and image-level retrieval tasks using global embeddings. |
| Hardware Specification | Yes | We train FineCLIP using 8 A800 GPUs on the train2017 split of the COCO dataset. |
| Software Dependencies | No | The paper lists various software components and models (e.g., BERT, ViT, AdamW, BLIP-2, YOLOv9, PyTorch) but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | We train FineCLIP for 10 epochs using the AdamW [32] optimizer with a batch size of 32 per GPU, a learning rate of 1e-5, and a weight decay of 0.1. The coefficients λ and γ in the learning objective are both set to 1. In all experiments, we freeze the language encoder L to reduce computational overheads and improve training stability. |
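The hardware, dataset, and setup rows together pin down most of the training configuration, so it can be written down as a small config object. The sketch below is illustrative, not the authors' code: the class name `FineCLIPTrainConfig`, the helper `combined_loss`, and the exact form of the objective (a base term plus λ- and γ-weighted auxiliary terms) are hypothetical. It also assumes plain data parallelism with no gradient accumulation and approximates the training set as 118,000 image-text pairs (the paper says "approximately 118K").

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineCLIPTrainConfig:
    # Values reported in the Hardware Specification / Experiment Setup rows.
    epochs: int = 10
    num_gpus: int = 8                  # 8 x A800
    batch_size_per_gpu: int = 32
    learning_rate: float = 1e-5        # AdamW
    weight_decay: float = 0.1
    lambda_coef: float = 1.0           # λ in the learning objective
    gamma_coef: float = 1.0            # γ in the learning objective
    freeze_language_encoder: bool = True
    # Assumption: ~118K COCO train2017 image-text pairs, rounded.
    num_train_pairs: int = 118_000

    @property
    def global_batch_size(self) -> int:
        # Assumes data parallelism with no gradient accumulation.
        return self.num_gpus * self.batch_size_per_gpu

    @property
    def steps_per_epoch(self) -> int:
        # The last partial batch still counts as one optimizer step.
        return math.ceil(self.num_train_pairs / self.global_batch_size)

    @property
    def total_steps(self) -> int:
        return self.epochs * self.steps_per_epoch


def combined_loss(base: float, term_a: float, term_b: float,
                  cfg: FineCLIPTrainConfig) -> float:
    # Hypothetical objective: base + λ * term_a + γ * term_b.
    return base + cfg.lambda_coef * term_a + cfg.gamma_coef * term_b


if __name__ == "__main__":
    cfg = FineCLIPTrainConfig()
    print(cfg.global_batch_size, cfg.steps_per_epoch, cfg.total_steps)
```

Running the script prints `256 461 4610`: a global batch of 256 gives roughly 461 optimizer steps per epoch and about 4.6K steps over 10 epochs, under the assumptions above. In PyTorch, the optimizer settings would map to `torch.optim.AdamW(params, lr=cfg.learning_rate, weight_decay=cfg.weight_decay)`, with `params` restricted to the non-frozen (image-side) parameters since the language encoder is frozen.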