Contrastive Corpus Attribution for Explaining Representations
Authors: Chris Lin, Hugh Chen, Chanwoo Kim, Su-In Lee
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we quantitatively evaluate whether important features identified by COCOA are indeed related to the corpus and foil, using multiple datasets, encoders, and feature attribution methods (Section 4.1). Furthermore, we demonstrate the utility of COCOA by its application to understanding image data augmentations (Section 4.2) and to mixed modality object localization (Section 4.3). |
| Researcher Affiliation | Academia | Chris Lin , Hugh Chen , Chanwoo Kim, Su-In Lee Paul G. Allen School of Computer Science & Engineering University of Washington Seattle, WA 98195, USA {clin25,hughchen,chanwkim,suinlee}@cs.washington.edu |
| Pseudocode | No | The paper presents mathematical definitions and propositions but does not include any blocks explicitly labeled as “Pseudocode” or “Algorithm”. |
| Open Source Code | Yes | Code is available at https://github.com/suinleelab/cl-explainability. |
| Open Datasets | Yes | To evaluate COCOA across different representation learning models and datasets, we apply them to (i) SimCLR, a contrastive self-supervised model (Chen et al., 2020), trained on ImageNet (Russakovsky et al., 2015); (ii) SimSiam, a non-contrastive self-supervised model (Chen & He, 2021), trained on CIFAR-10 (Krizhevsky et al., 2009); and (iii) representations extracted from the penultimate layer of a ResNet18 (He et al., 2016), trained on the abnormal-vs.-normal musculoskeletal X-ray dataset MURA (Rajpurkar et al., 2017). |
| Dataset Splits | Yes | ImageNet. The ImageNet ILSVRC dataset contains 1.2 million labeled training images and 50,000 labeled validation images over 1,000 object classes (Russakovsky et al., 2015). Since the SimCLR model and its downstream linear classifier are trained and tuned with only the training set, we use the validation set as a held-out set for quantitative evaluation (Section 4.1) and understanding data augmentations (Section 4.2). MURA. The MURA (MUsculoskeletal RAdiographs) dataset contains 36,808 training radiograph images from 13,457 musculoskeletal studies and 3,197 validation images from 1,199 studies. Each study and its associated images are labeled by radiologists as either normal or abnormal. We further split the official training images into an 80% subset for our model training and a 20% subset for hyperparameter tuning. The official validation set is held out and used for quantitative evaluation. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or cloud computing instance specifications used for the experiments. It mentions model architectures like ResNet50 and ResNet18, but these are not hardware specifications. |
| Software Dependencies | No | The paper mentions using the Captum package (Kokhlikyan et al., 2020) and that SimCLR models were converted to PyTorch format, but it does not provide specific version numbers for any software libraries, frameworks, or programming languages. |
| Experiment Setup | Yes | SimSiam: The downstream linear classifier was trained with stochastic gradient descent with a learning rate of 30.0 and a batch size of 256. ResNet classifier: The classifier was trained with an Adam optimizer (Kingma & Ba, 2014), with a batch size of 256, a learning rate decay of 0.1 every 10 steps, and a weight decay of 0.001. The initial learning rate was tuned over {0.1, 0.01, 0.001, 0.0001} to identify the optimal initial learning rate of 0.001. Integrated Gradients: 50 steps for the Riemann approximation. Gradient SHAP: 50 random Gaussian noise samples having a standard deviation of 0.2. RISE: 5,000 random masks generated with a masking probability of 0.5. For CIFAR-10, each binary mask before upsampling had size 4x4. For ImageNet and MURA, each initial binary mask was 7x7. For our experiments, 100 training samples are randomly selected to be the corpus set for each class in CIFAR-10 and MURA. For ImageNet, 100 corpus samples are drawn from each of the 10 classes from ImageNette... Foil samples are randomly drawn from each training set. Because ResNets output non-negative representations from ReLU activations, we can apply Equation (C.3) and set the foil size to 1,500 based on a δ = 0.01 and ε = 0.05 in Proposition C.2. For CLIP experiments, 20,000 masks were used for RISE. |
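The reported setup uses 50 steps for the Riemann approximation in Integrated Gradients. As a minimal sketch of what that approximation computes (not the authors' code, which uses Captum), the example below implements Integrated Gradients for a toy function with a known analytic gradient and checks the completeness property; the function `f` and all names are illustrative:

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, n_steps=50):
    """Right-Riemann approximation of Integrated Gradients:
    IG_i(x) = (x_i - x'_i) * (1/m) * sum_{k=1..m} grad_i f(x' + (k/m)(x - x'))
    """
    total_grad = np.zeros_like(x)
    for k in range(1, n_steps + 1):
        alpha = k / n_steps
        total_grad += grad_fn(baseline + alpha * (x - baseline))
    return (x - baseline) * total_grad / n_steps

# Toy model f(x) = sum(x^2) with analytic gradient 2x (hypothetical example,
# standing in for the encoder's contrastive corpus similarity score).
f = lambda x: float(np.sum(x ** 2))
grad_f = lambda x: 2.0 * x

x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros_like(x)
attr = integrated_gradients(grad_f, x, baseline, n_steps=50)

# Completeness: attributions sum to f(x) - f(baseline) up to the
# discretization error of the 50-step Riemann sum (here f(x)/50 = 0.28).
print(attr.sum())  # ≈ 14.28, vs. f(x) - f(baseline) = 14.0
```

Increasing `n_steps` shrinks the gap between the attribution total and `f(x) - f(baseline)`, which is why step count is a reported hyperparameter.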
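The RISE configuration above (5,000 masks, masking probability 0.5, 7x7 initial grids for ImageNet and MURA) can be sketched as follows. This is a simplified illustration assuming 224x224 inputs (standard for ImageNet ResNets) and nearest-neighbor upsampling; the original RISE method uses bilinear upsampling with random spatial shifts, and all names here are hypothetical:

```python
import numpy as np

def generate_rise_masks(n_masks=5000, grid=7, input_size=224, p=0.5, seed=0):
    """Sample low-resolution binary grids and upsample them to the input size.

    Each grid cell is kept (1) with probability p; nearest-neighbor upsampling
    is a simplification of RISE's shifted bilinear upsampling.
    """
    rng = np.random.default_rng(seed)
    grids = (rng.random((n_masks, grid, grid)) < p).astype(np.float32)
    scale = input_size // grid  # 224 // 7 = 32 pixels per grid cell
    masks = np.repeat(np.repeat(grids, scale, axis=1), scale, axis=2)
    return masks  # shape: (n_masks, input_size, input_size)

masks = generate_rise_masks(n_masks=100)  # small count for the demo
print(masks.shape)  # (100, 224, 224)
```

With the reported 4x4 grids for CIFAR-10, `grid=4` and `input_size=32` would be the analogous settings.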