Revisiting Model Stitching to Compare Neural Representations
Authors: Yamini Bansal, Preetum Nakkiran, Boaz Barak
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, we use model stitching to obtain quantitative verifications for intuitive statements such as "good networks learn similar representations", by demonstrating that good networks of the same architecture, but trained in very different ways (e.g., supervised vs. self-supervised learning), can be stitched to each other without drop in performance. |
| Researcher Affiliation | Academia | Yamini Bansal Harvard University ybansal@g.harvard.edu Preetum Nakkiran Harvard University preetum@cs.harvard.edu Boaz Barak Harvard University b@boazbarak.org |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link to its own open-source code for the methodology described. It mentions a third-party framework (VISSL), but not the authors' own implementation code. |
| Open Datasets | Yes | Unless specified otherwise, the CIFAR-10 experiments are conducted on the ResNet-18 architecture (with first layer width 64) and the ImageNet experiments are conducted on the ResNet-50 architecture [He et al., 2015]. ...a Vision Transformer [Dosovitskiy et al., 2020] pretrained on CIFAR-5m [Nakkiran et al., 2021]. |
| Dataset Splits | No | While standard datasets (CIFAR-10, ImageNet) are mentioned, the paper does not explicitly provide specific details on dataset split percentages, sample counts for splits, or clear citations to predefined splits for reproducibility. It mentions a "train set" and "test set" but not the methodology for the split or any explicit validation set usage for hyperparameter tuning. |
| Hardware Specification | No | The paper mentions "Satori compute cluster" but does not provide specific hardware details such as GPU or CPU models, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used in the experiments (e.g., Python, PyTorch, TensorFlow versions, or specific library versions). |
| Experiment Setup | Yes | The stitching layer in the convolutional networks consists of a 1×1 convolutional layer with input features equal to the number of channels in r, and output features equal to the output channels of A_l(x). We add a BatchNorm (BN) layer before and after this convolutional layer. Note that the BN layer does not change the representation capacity of the stitching layer and only aids with optimization. We use the Adam optimizer with cosine learning rate decay and an initial learning rate of 0.001. |
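
The Experiment Setup row above pins down the stitching layer (a 1×1 convolution wrapped in BatchNorm layers) and its training recipe (Adam, cosine learning-rate decay, initial learning rate 0.001). Below is a minimal PyTorch sketch of that setup; the bottom/top sub-networks, channel counts, and epoch count are illustrative placeholders, not the authors' released code.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the first l layers of one trained network and the
# remaining layers of another (in the paper these come from trained ResNets).
bottom_a = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
top_b = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10))


class StitchingLayer(nn.Module):
    """1x1 convolution with a BatchNorm layer before and after, per the quoted setup."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.stitch = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.Conv2d(in_channels, out_channels, kernel_size=1),
            nn.BatchNorm2d(out_channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.stitch(x)


stitch = StitchingLayer(64, 64)  # channel counts are placeholders

# Freeze both pretrained networks; only the stitching layer is trained.
for p in list(bottom_a.parameters()) + list(top_b.parameters()):
    p.requires_grad = False

# Adam with cosine learning-rate decay and an initial learning rate of 0.001,
# as stated in the Experiment Setup row. num_epochs is a placeholder.
num_epochs = 10
optimizer = torch.optim.Adam(stitch.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)

x = torch.randn(8, 3, 32, 32)         # dummy CIFAR-sized batch
logits = top_b(stitch(bottom_a(x)))   # stitched forward pass
```

As the quoted setup notes, the surrounding BN layers add no representational capacity beyond the affine 1×1 map itself; they are there only to ease optimization of the stitching layer.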