Reliability of CKA as a Similarity Measure in Deep Learning

Authors: MohammadReza Davari, Stefan Horoi, Amine Natik, Guillaume Lajoie, Guy Wolf, Eugene Belilovsky

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Sec. 4 we empirically analyze CKA's reliability, illustrating our theoretical results and subsequently presenting a general optimization procedure that allows the CKA value to be heavily manipulated to be either high or low without significant changes to the functional behaviour of the underlying ANNs. We use this to revisit previous findings (Nguyen et al., 2021; Kornblith et al., 2019). (A minimal sketch of the linear CKA computation follows this table.)
Researcher Affiliation | Academia | Mohammad Reza Davari (1,3), Stefan Horoi (2,3), Amine Natik (2,3), Guillaume Lajoie (2,3), Guy Wolf (2,3), Eugene Belilovsky (1,3); (1) Concordia University, (2) Université de Montréal, (3) Mila - Quebec AI Institute
Pseudocode | Yes | Algo. 1 shows the pseudocode of the dynamical scaling of the λ loss-balance parameter seen in Eq. 3. (A hedged sketch of such a scaling rule follows this table.)
Open Source Code | No | The paper does not provide an explicit statement about releasing the source code or a link to a code repository for the methodology described.
Open Datasets | Yes | Trained to generalize on the CIFAR10 image classification task (Krizhevsky et al., 2009).
Dataset Splits | No | The paper mentions using a "validation set accuracy" as a surrogate metric and refers to a "CIFAR10 training set" and "CIFAR10 test set", but it does not provide explicit split percentages or a detailed methodology for splitting the data into train/validation/test sets for reproduction.
Hardware Specification | No | "This work is also supported by resources from Compute Canada and Calcul Quebec." This is a general mention of computing resources and does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts).
Software Dependencies | No | The paper mentions the optimizers and activation functions used (e.g., AdamW, ReLU, Batch Normalization) but does not specify versions for core software libraries or programming languages such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | The models in Sec. 4.1, both the generalized and the memorized network, were trained for 100 epochs using the AdamW (Loshchilov & Hutter, 2017) optimizer with a learning rate (LR) of 1e-3 and a weight decay of 5e-4. The LR follows a cosine LR schedule (Loshchilov & Hutter, 2016) starting from the initial LR stated earlier. (A hedged PyTorch sketch of this setup follows this table.)
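For reference, the measure under study is linear CKA (Kornblith et al., 2019). The sketch below is a minimal implementation of that standard computation, not code from the paper; the array shapes and toy inputs are illustrative assumptions.

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between two activation matrices of shape (n_examples, n_features).

    Features are mean-centered per column; the resulting value lies in [0, 1].
    """
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    return cross / (np.linalg.norm(x.T @ x, ord="fro") * np.linalg.norm(y.T @ y, ord="fro"))

# Toy usage: representations of the same 512 inputs taken from two hypothetical layers.
rng = np.random.default_rng(0)
acts_a = rng.standard_normal((512, 64))
acts_b = acts_a @ rng.standard_normal((64, 128))  # a linear transform of acts_a
print(linear_cka(acts_a, acts_b))
```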
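The report does not reproduce Algo. 1 or Eq. 3, so the following is only a hypothetical illustration of a dynamically scaled loss-balance parameter: a combined objective of the form task loss plus λ times a CKA term, with λ rescaled according to validation accuracy (the surrogate metric mentioned in the Dataset Splits row). All names, thresholds, and the update rule are assumptions, not the paper's algorithm.

```python
def combined_loss(task_loss, cka_value, lam, push="down"):
    # Hypothetical combined objective (stand-in for Eq. 3): preserve task
    # performance while pushing CKA toward a low or high target value.
    cka_term = cka_value if push == "down" else -cka_value
    return task_loss + lam * cka_term

def update_lambda(lam, val_accuracy, acc_threshold=0.90, up=1.1, down=0.5):
    # Illustrative dynamic scaling of the balance parameter: press harder on the
    # CKA term while validation accuracy stays acceptable, and back off once
    # functional behaviour starts to degrade.
    return lam * up if val_accuracy >= acc_threshold else lam * down
```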
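The hyperparameters quoted in the Experiment Setup row map directly onto a standard PyTorch configuration. The sketch below assumes PyTorch; the model and data are placeholders (the paper's architecture and CIFAR10 pipeline are not restated here), and CosineAnnealingLR stands in for the cited cosine schedule.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model and data; the paper's architecture and data pipeline are not restated here.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
criterion = nn.CrossEntropyLoss()
train_loader = DataLoader(
    TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,))),
    batch_size=128,
)

epochs = 100
optimizer = optim.AdamW(model.parameters(), lr=1e-3, weight_decay=5e-4)  # LR 1e-3, weight decay 5e-4
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)  # cosine decay from the initial LR

for epoch in range(epochs):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # step the cosine schedule once per epoch
```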