Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
CAE v2: Context Autoencoder with CLIP Latent Alignment
Authors: Xinyu Zhang, Jiahui Chen, Junkun Yuan, Qiang Chen, Jian Wang, Xiaodi Wang, Shumin Han, Xiaokang Chen, Jimin Pi, Kun Yao, Junyu Han, Errui Ding, Jingdong Wang
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We pretrain CAE v2 on ImageNet-1K images and evaluate on various downstream vision tasks, including image classification, semantic segmentation, object detection and instance segmentation. Experiments show that our CAE v2 achieves competitive performance and even outperforms the CLIP vision encoder, demonstrating the effectiveness of our method. Code is available at https://github.com/Atten4Vis/CAE. |
| Researcher Affiliation | Collaboration | ¹Baidu VIS, ²School of Automation Science and Electrical Engineering, Beihang University, ³College of Computer Science and Technology, Zhejiang University, ⁴Peking University |
| Pseudocode | No | The paper describes the model architecture and objective functions in text and with computational graphs (Figure 5 in Appendix), but it does not provide any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/Atten4Vis/CAE. |
| Open Datasets | Yes | We pretrain CAE v2 on ImageNet-1K images... For image classification, we conduct evaluations on ImageNet-1K (Deng et al., 2009)... For semantic segmentation, we follow BEiT (Bao et al., 2022) to use UperNet (Xiao et al., 2018) and report the mIoU on ADE20K (Zhou et al., 2017) dataset. For object detection and instance segmentation, we use COCO (Lin et al., 2014) as the evaluation dataset. |
| Dataset Splits | Yes | We pretrain CAE v2 on ImageNet-1K images... For semantic segmentation, we follow the common setting in BEiT (Bao et al., 2022) to use UperNet (Xiao et al., 2018) and report the mIoU on ADE20K (Zhou et al., 2017) dataset. For object detection and instance segmentation, we use COCO (Lin et al., 2014) as the evaluation dataset. We adopt both Mask R-CNN (He et al., 2017) and Cascade Mask R-CNN (Cai & Vasconcelos, 2018) frameworks and report AP^b and AP^m on the COCO val split. |
| Hardware Specification | No | The paper provides detailed experimental settings for pretraining, linear probing, fine-tuning, semantic segmentation, object detection, and instance segmentation (Tables 8-12). However, it does not specify any particular hardware components such as GPU models, CPU types, or memory amounts used for these experiments. |
| Software Dependencies | No | The paper mentions optimizers like AdamW (Loshchilov & Hutter, 2019) and LARS (You et al., 2017), and refers to the official pretrained CLIP model from OpenAI's GitHub. However, it does not specify version numbers for programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries/packages used in the implementation. |
| Experiment Setup | Yes | Table 8: Pretraining setting for CAE v2 on ImageNet-1K. Table 9: Linear probing setting for CAE v2 on ImageNet-1K. Table 10: Fine-tuning setting for CAE v2 on ImageNet-1K. Table 11: Semantic segmentation setting for CAE v2 on ADE20K. Table 12: Object detection and instance segmentation setting for CAE v2 on COCO. |
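The paper's central idea, per its title, is aligning the autoencoder's latent features with those of a frozen CLIP vision encoder. The sketch below illustrates one common form such an alignment objective can take: a mean per-token cosine distance between student and teacher features. This is a minimal illustrative sketch, not the paper's actual loss; the function name, the specific formulation, and the toy inputs are all assumptions made here for demonstration.

```python
import math

def cosine_align_loss(student_feats, teacher_feats):
    """Mean (1 - cosine similarity) over paired token features.

    Illustrative sketch of a CLIP latent-alignment objective: the
    student's predicted token features are pulled toward the frozen
    CLIP vision encoder's token features. Names and formulation are
    hypothetical, not taken from the paper's released code.
    """
    assert len(student_feats) == len(teacher_feats)
    total = 0.0
    for s, t in zip(student_feats, teacher_feats):
        dot = sum(a * b for a, b in zip(s, t))
        norm = (math.sqrt(sum(a * a for a in s))
                * math.sqrt(sum(b * b for b in t)))
        total += 1.0 - dot / norm  # cosine distance for this token
    return total / len(student_feats)

# Toy example: two 3-dim token features.
# First pair is identical (distance 0), second is orthogonal (distance 1).
student = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
teacher = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
loss = cosine_align_loss(student, teacher)  # → 0.5
```

In practice such a loss would be computed over batched tensors with a deep-learning framework, with the teacher's parameters frozen; the pure-Python version above only conveys the shape of the objective.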