Reliability of CKA as a Similarity Measure in Deep Learning
Authors: MohammadReza Davari, Stefan Horoi, Amine Natik, Guillaume Lajoie, Guy Wolf, Eugene Belilovsky
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Sec. 4 we empirically analyze CKA's reliability, illustrating our theoretical results and subsequently presenting a general optimization procedure that allows the CKA value to be heavily manipulated to be either high or low without significant changes to the functional behaviour of the underlying ANNs. We use this to revisit previous findings (Nguyen et al., 2021; Kornblith et al., 2019). (A minimal linear-CKA sketch is given after the table.) |
| Researcher Affiliation | Academia | MohammadReza Davari (1,3), Stefan Horoi (2,3), Amine Natik (2,3), Guillaume Lajoie (2,3), Guy Wolf (2,3), Eugene Belilovsky (1,3); 1 Concordia University, 2 Université de Montréal, 3 Mila - Quebec AI Institute |
| Pseudocode | Yes | Algo. 1 shows the pseudocode of the dynamic scaling of the λ loss-balance parameter seen in Eq. 3. (A hedged sketch of such a schedule follows the table.) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing the source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | trained to generalize on the CIFAR10 image classification task (Krizhevsky et al., 2009) |
| Dataset Splits | No | The paper mentions using a "validation set accuracy" as a surrogate metric and refers to "CIFAR10 training set" and "CIFAR10 test set", but does not provide explicit split percentages or detailed methodology for splitting the data into train/validation/test sets for reproduction. |
| Hardware Specification | No | This work is also supported by resources from Compute Canada and Calcul Quebec. This is a general mention of computing resources and does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts). |
| Software Dependencies | No | The paper names the optimizer, activation function, and normalization layers used (e.g., 'AdamW', 'ReLU', 'Batch Normalization') but does not specify versions for core software libraries or programming languages such as Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | The models in Sec. 4.1, both the generalized and the memorized network, were trained for 100 epochs using the AdamW optimizer (Loshchilov & Hutter, 2017) with a learning rate (LR) of 1e-3 and a weight decay of 5e-4. The LR follows a cosine LR scheduler (Loshchilov & Hutter, 2016) starting from the initial LR stated earlier. (An optimizer/scheduler sketch follows the table.) |
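
The paper's analysis centers on linear CKA between layer activations. The sketch below follows the standard definition from Kornblith et al. (2019) for orientation only; it is not the authors' released code (none is provided), and the function name and PyTorch usage are our own.

```python
import torch

def linear_cka(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Linear CKA between two activation matrices of shape (n_examples, n_features).

    Standard formulation (Kornblith et al., 2019): mean-center the feature columns,
    then compute ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F).
    """
    x = x - x.mean(dim=0, keepdim=True)
    y = y - y.mean(dim=0, keepdim=True)
    numerator = torch.linalg.norm(y.T @ x) ** 2
    denominator = torch.linalg.norm(x.T @ x) * torch.linalg.norm(y.T @ y)
    return numerator / denominator
```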
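
Algo. 1 (the dynamic scaling of the λ loss-balance parameter in Eq. 3) is not reproduced here; the sketch below only illustrates the general idea of a combined objective whose CKA term is weighted by a λ adjusted from a surrogate metric such as validation accuracy, which the paper mentions using. The update rule, threshold, step size, and sign convention are illustrative assumptions, not the paper's exact procedure.

```python
def update_lambda(lmbda: float, val_acc: float, target_acc: float, step: float = 0.1) -> float:
    """Hypothetical dynamic scaling of the loss-balance parameter λ (not the paper's Algo. 1).

    λ is reduced when validation accuracy (the surrogate for functional behaviour)
    drops below a target, and increased otherwise, so the CKA term does not
    overwhelm the task objective.
    """
    if val_acc < target_acc:
        return max(lmbda - step, 0.0)
    return lmbda + step


def combined_loss(task_loss, cka_value, lmbda: float, push_cka_down: bool = True):
    """Combined objective in the spirit of Eq. 3: task loss plus a λ-weighted CKA term.

    `push_cka_down` selects whether CKA against the reference network is driven
    low or high; the sign convention here is an assumption.
    """
    cka_term = cka_value if push_cka_down else -cka_value
    return task_loss + lmbda * cka_term
```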
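
The training recipe quoted under Experiment Setup maps onto standard PyTorch components. The sketch below wires up the reported hyperparameters (AdamW, LR 1e-3, weight decay 5e-4, cosine schedule over 100 epochs); the model is a placeholder, since the paper's architecture and data pipeline are not specified in enough detail to reproduce here.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

EPOCHS = 100

model = torch.nn.Linear(3 * 32 * 32, 10)  # placeholder for the CIFAR10 network
optimizer = AdamW(model.parameters(), lr=1e-3, weight_decay=5e-4)
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)  # cosine decay from the initial LR

for epoch in range(EPOCHS):
    # ... one training pass over the CIFAR10 training set goes here ...
    scheduler.step()
```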