Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Deconfounded Representation Similarity for Comparison of Neural Networks
Authors: Tianyu Cui, Yogesh Kumar, Pekka Marttinen, Samuel Kaski
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that deconfounding the similarity metrics increases the resolution of detecting functionally similar neural networks across domains. Moreover, in real-world applications, deconfounding improves the consistency between CKA and domain similarity in transfer learning, and increases correlation between CKA and model out-of-distribution accuracy similarity. |
| Researcher Affiliation | Academia | Tianyu Cui, Department of Computer Science, Aalto University; Yogesh Kumar, Department of Computer Science, Aalto University; Pekka Marttinen, Department of Computer Science, Aalto University; Samuel Kaski, Department of Computer Science, Aalto University and University of Manchester |
| Pseudocode | No | The paper describes algorithms and methods in prose and mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] We provide the code in the supplemental material. |
| Open Datasets | Yes | Setup: We check if similarity measures can identify functionally similar NN representations from random NN representations. For each model block of ResNets (containing 2-3 convolutional layers), we generate two distributions of similarities: the null distribution H0 and the alternative distribution H1. The H0 contains similarities between 50 pairs of random ResNets on the CIFAR-10 test set. ... Distribution H1 contains similarities between the pretrained ImageNet NN and each of the 50 ResNets trained on CIFAR-10 from scratch with different random initializations, on the same CIFAR-10 test set as H0. |
| Dataset Splits | Yes | We compute the layer-wise CKA and dCKA between each FT model and the corresponding PT model on the test set of the target domain [22]. ... 2. Evaluate the OOD accuracy of each model on CIFAR-10-C [40], acc(f_i), and select the most accurate ResNet as the reference model, f; 3. Compute the similarity between each f_i and f, s(f_i, f), of each block on the CIFAR-10 test set (in-distribution similarity)... |
| Hardware Specification | Yes | In experiments, computing dCKA between two XLM-RoBERTa models [38] takes 0.37 ± 0.11s longer than CKA for each layer on 3000 random English sentences with a single 2080Ti GPU. |
| Software Dependencies | No | The paper mentions using specific models like ResNets, XLM-RoBERTa, EfficientNet-B0, and DistilRoBERTa, and refers to PyTorch models in a footnote, but it does not specify exact version numbers for any software libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Appendix C. |
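The excerpts above repeatedly reference CKA, the representation-similarity measure that the paper's dCKA deconfounds. For orientation only, here is a minimal sketch of linear CKA (Kornblith et al., 2019), the baseline the paper builds on; this is illustrative NumPy, not the authors' released code, and the array shapes are assumptions:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representation matrices.

    X, Y: arrays of shape (n_examples, n_features); the feature
    dimensions of the two networks may differ.
    Returns a similarity in [0, 1]; 1 means identical up to
    orthogonal transformation and isotropic scaling.
    """
    # Center each feature column so CKA is translation-invariant.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # HSIC-style numerator and normalization (Frobenius norms).
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return hsic / norm
```

The paper's contribution is to regress out confounders (e.g. input similarity structure) before computing this quantity; the sketch above is only the undeconfounded baseline that dCKA is compared against.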