Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation

Authors: Yunheng Li, Zhong-Yu Li, Quan-Sheng Zeng, Qibin Hou, Ming-Ming Cheng

ICML 2024 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results show that our simple Cascade-CLIP achieves superior zero-shot performance on segmentation benchmarks, like COCO-Stuff, Pascal-VOC, and Pascal-Context. Our code is available at https://github.com/HVision-NKU/Cascade-CLIP. 4. Experiments 4.1. Datasets and Evaluation Metrics 4.2. Implementation Details 4.3. Comparisons with the State-of-the-art Methods 4.4. Ablation Study 4.5. Extending Cascade-CLIP to Other Methods
Researcher Affiliation	Academia	¹VCIP, School of Computer Science, Nankai University; ²Nankai International Advanced Research Institute (Shenzhen Futian). Correspondence to: Qibin Hou <EMAIL>.
Pseudocode	No	The paper describes the proposed framework and components using textual descriptions and diagrams (e.g., Figure 2, Figure 3), but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	Our code is available at https://github.com/HVision-NKU/Cascade-CLIP.
Open Datasets	Yes	To evaluate the effectiveness of our proposed method, we perform extensive experiments on three widely used benchmark datasets, including COCO-Stuff (Caesar et al., 2018), Pascal-VOC (Everingham et al., 2015), and Pascal Context (Mottaghi et al., 2014). ... COCO-Stuff is an extensive semantic segmentation dataset comprising 171 categories... It contains 117k training images and 5k validation images... PASCAL VOC consists of 11,185 training images and 1,449 validation images... PASCAL Context provides supplementary annotations for PASCAL VOC 2010, consisting of 4,998 training images and 5,005 validation images.
Dataset Splits	Yes	COCO-Stuff is an extensive semantic segmentation dataset comprising 171 categories... It contains 117k training images and 5k validation images and it is divided into 156 seen classes and 15 unseen classes. ... PASCAL VOC consists of 11,185 training images and 1,449 validation images across 20 classes. ... PASCAL Context provides supplementary annotations for PASCAL VOC 2010, consisting of 4,998 training images and 5,005 validation images.
Hardware Specification	Yes	We implement the proposed method on the open-source toolbox MMSegmentation (Contributors, 2020) and conduct all experiments using a machine with 4 NVIDIA RTX 3090 GPUs. ... All models are evaluated on a single 3090 GPU.
Software Dependencies	No	We implement the proposed method on the open-source toolbox MMSegmentation (Contributors, 2020)... While MMSegmentation is mentioned, no specific version number for it or other core software dependencies (like Python, PyTorch, or CUDA) is provided.
Experiment Setup	Yes	The batch size on each GPU is set to 4, and the input image resolution is 512 × 512. The optimizer is AdamW (Loshchilov & Hutter, 2019) with the default training schedule in the MMSeg toolbox. For a fair comparison, we use the same number of training iterations on each dataset as ZegCLIP (Zhou et al., 2023). ... The objective loss function L_pixel is defined as: L_pixel = α L_dice(Y, M) + β L_focal(Y, M)... {α, β} are two weights with the default values of {1, 100}, respectively.
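
The weighted loss quoted in the Experiment Setup row can be illustrated with a minimal sketch. It assumes the common soft Dice and binary focal formulations; the table does not give the paper's exact definitions, so the function names, the argument names (pred, target), the focal exponent gamma = 2.0, and the smoothing constants are all hypothetical. Only the weights α = 1 and β = 100 come from the quoted text.

```python
import math


def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss over flat lists of probabilities and 0/1 labels (assumed form)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def focal_loss(pred, target, gamma=2.0, eps=1e-12):
    """Mean binary focal loss; gamma=2.0 is an assumption, not taken from the source."""
    total = 0.0
    for p, t in zip(pred, target):
        # Probability the model assigns to the true label of this pixel.
        pt = p if t == 1 else 1.0 - p
        total += -((1.0 - pt) ** gamma) * math.log(max(pt, eps))
    return total / len(pred)


def pixel_loss(pred, target, alpha=1.0, beta=100.0):
    """Weighted sum alpha * dice + beta * focal, with the quoted defaults {1, 100}."""
    return alpha * dice_loss(pred, target) + beta * focal_loss(pred, target)


if __name__ == "__main__":
    # Toy example: two pixels, one foreground and one background, both predicted at 0.5.
    print(pixel_loss([0.5, 0.5], [1, 0]))
```

With β = 100, the focal term dominates the sum in this toy case, which suggests why the two weights differ by two orders of magnitude, although the table itself does not state the reasoning behind the choice.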