Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
CAE v2: Context Autoencoder with CLIP Latent Alignment
Authors: Xinyu Zhang, Jiahui Chen, Junkun Yuan, Qiang Chen, Jian Wang, Xiaodi Wang, Shumin Han, Xiaokang Chen, Jimin Pi, Kun Yao, Junyu Han, Errui Ding, Jingdong Wang
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We pretrain CAE v2 on ImageNet-1K images and evaluate on various downstream vision tasks, including image classification, semantic segmentation, object detection and instance segmentation. Experiments show that our CAE v2 achieves competitive performance and even outperforms the CLIP vision encoder, demonstrating the effectiveness of our method. Code is available at https://github.com/Atten4Vis/CAE. |
| Researcher Affiliation | Collaboration | ¹Baidu VIS, ²School of Automation Science and Electrical Engineering, Beihang University, ³College of Computer Science and Technology, Zhejiang University, ⁴Peking University |
| Pseudocode | No | The paper describes the model architecture and objective functions in text and with computational graphs (Figure 5 in Appendix), but it does not provide any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/Atten4Vis/CAE. |
| Open Datasets | Yes | We pretrain CAE v2 on ImageNet-1K images... For image classification, we conduct evaluations on ImageNet-1K (Deng et al., 2009)... For semantic segmentation, we follow BEiT (Bao et al., 2022) to use UperNet (Xiao et al., 2018) and report the mIoU on ADE20K (Zhou et al., 2017) dataset. For object detection and instance segmentation, we use COCO (Lin et al., 2014) as the evaluation dataset. |
| Dataset Splits | Yes | We pretrain CAE v2 on ImageNet-1K images... For semantic segmentation, we follow the common setting in BEiT (Bao et al., 2022) to use UperNet (Xiao et al., 2018) and report the mIoU on ADE20K (Zhou et al., 2017) dataset. For object detection and instance segmentation, we use COCO (Lin et al., 2014) as the evaluation dataset. We adopt both Mask R-CNN (He et al., 2017) and Cascade Mask R-CNN (Cai & Vasconcelos, 2018) frameworks and report AP^b and AP^m on the COCO val split. |
| Hardware Specification | No | The paper provides detailed experimental settings for pretraining, linear probing, fine-tuning, semantic segmentation, object detection, and instance segmentation (Tables 8-12). However, it does not specify any particular hardware components such as GPU models, CPU types, or memory amounts used for these experiments. |
| Software Dependencies | No | The paper mentions optimizers like AdamW (Loshchilov & Hutter, 2019) and LARS (You et al., 2017), and refers to the official pretrained CLIP model from OpenAI's GitHub. However, it does not specify version numbers for programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries/packages used in the implementation. |
| Experiment Setup | Yes | Table 8: Pretraining setting for CAE v2 on ImageNet-1K. Table 9: Linear probing setting for CAE v2 on ImageNet-1K. Table 10: Fine-tuning setting for CAE v2 on ImageNet-1K. Table 11: Semantic segmentation setting for CAE v2 on ADE20K. Table 12: Object detection and instance segmentation setting for CAE v2 on COCO. |
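The paper's central idea, per its title, is aligning the autoencoder's latent features with those of a frozen CLIP vision encoder. The sketch below illustrates one common form such an alignment objective can take: a mean per-token cosine distance between student and teacher features. This is a minimal illustrative sketch, not the paper's actual loss; the function name, the specific formulation, and the toy inputs are all assumptions made here for demonstration.

```python
import math

def cosine_align_loss(student_feats, teacher_feats):
    """Mean (1 - cosine similarity) over paired token features.

    Illustrative sketch of a CLIP latent-alignment objective: the
    student's predicted token features are pulled toward the frozen
    CLIP vision encoder's token features. Names and formulation are
    hypothetical, not taken from the paper's released code.
    """
    assert len(student_feats) == len(teacher_feats)
    total = 0.0
    for s, t in zip(student_feats, teacher_feats):
        dot = sum(a * b for a, b in zip(s, t))
        norm = (math.sqrt(sum(a * a for a in s))
                * math.sqrt(sum(b * b for b in t)))
        total += 1.0 - dot / norm  # cosine distance for this token
    return total / len(student_feats)

# Toy example: two 3-dim token features.
# First pair is identical (distance 0), second is orthogonal (distance 1).
student = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
teacher = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
loss = cosine_align_loss(student, teacher)  # → 0.5
```

In practice such a loss would be computed over batched tensors with a deep-learning framework, with the teacher's parameters frozen; the pure-Python version above only conveys the shape of the objective.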