SoftCLIP: Softer Cross-Modal Alignment Makes CLIP Stronger

Authors: Yuting Gao, Jinfeng Liu, Zihan Xu, Tong Wu, Enwei Zhang, Ke Li, Jie Yang, Wei Liu, Xing Sun

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the effectiveness of SoftCLIP.
Researcher Affiliation | Collaboration | Tencent Youtu Lab; Department of Automation, Shanghai Jiao Tong University
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks; Figure 2 shows the overall framework, not a detailed algorithm.
Open Source Code | No | The paper does not provide concrete access to source code (a repository link, an explicit code-release statement, or code in supplementary materials) for the methodology it describes.
Open Datasets | Yes | SoftCLIP is pre-trained on three datasets: CC3M (Sharma et al. 2018), CC12M (Changpinyo et al. 2021), and YFCC15M-V2 (Li et al. 2021b). These datasets are listed in Table 1.
Dataset Splits | No | The paper reports training for a fixed number of epochs with automatic mixed precision, but does not provide the dataset split information (exact percentages, sample counts, or citations to predefined splits) needed to reproduce the train/validation/test partitioning.
Hardware Specification | Yes | We use 8 V100 GPUs for experiments.
Software Dependencies | No | The paper mentions the AdamW optimizer and automatic mixed precision but does not provide version numbers for software libraries or dependencies (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | The input resolution of the image encoder is 224×224 and the maximum context length of the text encoder is 77. ... We train SoftCLIP with an AdamW (Loshchilov and Hutter 2017) optimizer and a cosine learning-rate scheduler with linear warm-up: the learning rate increases linearly from 0 to the peak value within the first 10% of the total steps, then decays with a cosine annealing strategy. The weight decay of AdamW is set to 0.2. ... Models are trained from scratch for either 8 or 32 epochs, i.e., 8 epochs for ablations and 32 epochs for comparisons. ... The batch size is set to 2048, while with the ViT-B/16 image encoder the batch size is 1024.
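The warm-up-plus-cosine schedule quoted in the Experiment Setup row can be sketched as a small helper; this is a minimal illustration of the described behavior, and the function name, `peak_lr`, and `warmup_frac` are hypothetical labels, not identifiers from the paper.

```python
import math

def lr_at_step(step: int, total_steps: int, peak_lr: float,
               warmup_frac: float = 0.1) -> float:
    """Linear warm-up then cosine annealing, per the reported setup."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        # Learning rate increases linearly from 0 to the peak value
        # within the first 10% of the total steps.
        return peak_lr * step / warmup_steps
    # Afterwards it decays to 0 along a cosine curve.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, with 1000 total steps and a peak of 1.0, the rate reaches its peak at step 100 (end of warm-up), is at roughly half the peak midway through the cosine phase, and returns to 0 at the final step.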