Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
UP-DP: Unsupervised Prompt Learning for Data Pre-Selection with Vision-Language Models
Authors: Xin Li, Sima Behpour, Thang Long Doan, Wenbin He, Liang Gou, Liu Ren
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extensively compare our method with the state-of-the-art using seven benchmark datasets in different settings, achieving up to a performance gain of 20%. |
| Researcher Affiliation | Industry | Bosch Research North America, Bosch Center for Artificial Intelligence (BCAI) |
| Pseudocode | No | The paper mentions 'Algorithm 1' in the text ('The optimization is a one-stage and end-to-end process, shown in Algorithm 1.'), but an actual pseudocode or algorithm block labeled 'Algorithm 1' is not present in the provided text. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or provide a link to a code repository. |
| Open Datasets | Yes | We select seven image classification datasets that are widely used in evaluating the V-L model adaptation approach. These datasets constitute a comprehensive benchmark, covering a diverse set of vision tasks, including the classification of generic objects (Caltech101 [14]), actions (UCF101 [36]), fine-grained categories (Oxford Pets [29], FGVCAircraft [26], and Flowers102 [27]), as well as some specialized tasks such as recognizing texture (DTD [8]) and satellite imagery (EuroSAT [18]). |
| Dataset Splits | Yes | Table 8: Datasets Statistics. The detailed statistics of the 7 datasets and the hand-crafted prompts that are used for BLIP-2 zero-shot learning. |
| Hardware Specification | Yes | We optimize our model with a batch size of 256 for a total of 150 epochs on RTX 3090. |
| Software Dependencies | No | The paper mentions software components like BLIP-2, CLIP, and DINOv2 models, and optimizers such as Adam, but it does not specify any version numbers for these or other software libraries or frameworks (e.g., PyTorch, TensorFlow, Python versions). |
| Experiment Setup | Yes | Training Details. For the base model, we use the best available vision backbone in BLIP-2, which is ViT-G. Previous work [48] on prompt learning has shown that a shorter context length can lead to better and more robust performance. Therefore, we initialize the context vectors with a fixed length of 4. The two hyperparameters, τI and τC, are set to 0.5 and 1.0, respectively. Training is performed with the Adam optimizer and a learning rate of 0.0003. We optimize our model with a batch size of 256 for a total of 150 epochs on RTX 3090. |
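
The training details quoted in the Experiment Setup row can be collected into a single configuration object. The sketch below is illustrative only: the paper releases no code, so the class name `UPDPTrainingConfig` and all field names are hypothetical. Only the values come from the quoted text.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class UPDPTrainingConfig:
    """Hyperparameters quoted in the Experiment Setup row.

    The class and field names are hypothetical; only the values
    are taken from the paper's reported training details.
    """
    vision_backbone: str = "ViT-G"  # BLIP-2 vision backbone named in the paper
    context_length: int = 4         # fixed length of the learnable context vectors
    tau_i: float = 0.5              # hyperparameter tau_I as reported
    tau_c: float = 1.0              # hyperparameter tau_C as reported
    optimizer: str = "Adam"         # optimizer named in the paper
    learning_rate: float = 3e-4     # reported as 0.0003
    batch_size: int = 256
    epochs: int = 150


config = UPDPTrainingConfig()

# Print the settings as a plain dictionary, e.g. for logging a run.
for name, value in asdict(config).items():
    print(f"{name}: {value}")
```

Because the Software Dependencies row reports no framework or library versions, this sketch deliberately avoids any deep learning framework and records the reported settings only.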