Distilling Representations from GAN Generator via Squeeze and Span

Authors: Yu Yang, Xiaotian Cheng, Chang Liu, Hakan Bilen, Xiangyang Ji

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments justify the efficacy of our method and reveal its great significance in self-supervised representation learning. Code is available at https://github.com/yangyu12/squeeze-and-span.
Researcher Affiliation | Academia | Yu Yang1, Xiaotian Cheng1, Chang Liu1, Hakan Bilen2, Xiangyang Ji1; 1Tsinghua University, BNRist; 2University of Edinburgh
Pseudocode | No | The paper includes diagrams to illustrate concepts (e.g., Figure 1, Figure 2, Figure 3) but does not provide any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/yangyu12/squeeze-and-span.
Open Datasets | Yes | Our methods are mainly evaluated on CIFAR10, CIFAR100, STL10, ImageNet100, and ImageNet. CIFAR10 and CIFAR100 [39] are two image datasets containing small images at 32×32 resolution with 10 and 100 classes, respectively, and both split into 50,000 images for training and 10,000 for validation. STL-10 [13], which is derived from ImageNet [14], includes images at 96×96 resolution over 10 classes. STL-10 contains 500 labeled images per class (i.e. 5K in total) with an additional 100K unlabeled images for training and 800 labeled images per class for testing. ImageNet100 [55] contains images of 100 classes, among which 126,689 images are regarded as the train split and 5,000 images are taken as the validation split. ImageNet [14] is a popular large-scale image dataset of 1000 classes, which is split into 1,281,167 images as the training set and 50,000 images as the validation set.
Dataset Splits | Yes | CIFAR10 and CIFAR100 [39] are two image datasets containing small images at 32×32 resolution with 10 and 100 classes, respectively, and both split into 50,000 images for training and 10,000 for validation... ImageNet100 [55] contains images of 100 classes, among which 126,689 images are regarded as the train split and 5,000 images are taken as the validation split. ImageNet [14] is a popular large-scale image dataset of 1000 classes, which is split into 1,281,167 images as the training set and 50,000 images as the validation set.
Hardware Specification | No | The paper mentions that "More details can be referred in the supplementary material" and "More details are available in the supplementary material" regarding implementation, but it does not specify any particular hardware components (e.g., CPU, GPU models) used for experiments in the main body.
Software Dependencies | No | The paper mentions software components like "ResNet18", "ResNet50", "SGD optimizer", and "StyleGAN2-ADA" but does not specify their version numbers or the versions of underlying libraries like PyTorch or TensorFlow.
Experiment Setup | Yes | The squeeze module uses linear layers to transform the generator features into vectors with 2048 dimensions, which are then summed up and fed into a three-layer MLP to get a 2048-d teacher representation. On CIFAR10 and CIFAR100, we use ResNet18 [30] of the CIFAR variant as the backbone. On STL10, we use ResNet18 as the backbone. On ImageNet100 and ImageNet, we use ResNet50 as the backbone. On top of the backbone network, a five-layer MLP is added for producing representation. We use the SGD optimizer with a cosine learning rate decay [43] scheduler to optimize our models. The actual learning rate is linearly scaled according to the ratio of batch size to 256, i.e. base_lr × batch_size/256 [24]. The overall loss is computed by simply combining the generated data loss and real data loss as L_total = α L_squeeze + (1 − α) L_span, where α = 0.5 denotes the proportion of synthetic data in a mini-batch of training samples.
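
For reference, the dataset splits quoted in the "Open Datasets" and "Dataset Splits" rows above correspond roughly to the standard torchvision loaders sketched below. This is a hypothetical illustration, not code from the paper's repository; the root paths ("data", "/path/to/imagenet") and the transform are placeholders, and ImageNet100 requires a custom class-list filter that is not shown.

```python
# Hypothetical sketch of loading the evaluated datasets with torchvision.
from torchvision import datasets, transforms

tf = transforms.ToTensor()  # placeholder transform

# CIFAR10 / CIFAR100: 50,000 training and 10,000 validation images each.
cifar10_train = datasets.CIFAR10("data", train=True, download=True, transform=tf)
cifar10_val = datasets.CIFAR10("data", train=False, download=True, transform=tf)
cifar100_train = datasets.CIFAR100("data", train=True, download=True, transform=tf)
cifar100_val = datasets.CIFAR100("data", train=False, download=True, transform=tf)

# STL10: 5K labeled training images, 100K unlabeled images, labeled test split.
stl10_train = datasets.STL10("data", split="train", download=True, transform=tf)
stl10_unlabeled = datasets.STL10("data", split="unlabeled", download=True, transform=tf)
stl10_test = datasets.STL10("data", split="test", download=True, transform=tf)

# ImageNet: 1,281,167 training and 50,000 validation images (archives must be
# obtained separately). ImageNet100 is a 100-class subset and needs a custom
# class-list filter, which is not shown here.
# imagenet_train = datasets.ImageNet("/path/to/imagenet", split="train", transform=tf)
# imagenet_val = datasets.ImageNet("/path/to/imagenet", split="val", transform=tf)
```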
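The "Experiment Setup" row above describes the squeeze module, the linear learning-rate scaling rule, and the combined loss. The following is a minimal, hypothetical PyTorch sketch of that description, not the authors' released implementation; the class and function names (SqueezeModule, total_loss, scaled_lr), the assumption that generator features arrive as flattened per-layer vectors, and the momentum and weight-decay values in the commented optimizer example are all illustrative assumptions.

```python
# Hypothetical PyTorch sketch of the described setup; not the authors' code.
import torch
import torch.nn as nn


class SqueezeModule(nn.Module):
    """Project each generator feature to 2048-d, sum, then apply a 3-layer MLP."""

    def __init__(self, feat_dims, dim=2048):
        super().__init__()
        # One linear projection per generator feature (assumed flattened vectors).
        self.projs = nn.ModuleList([nn.Linear(d, dim) for d in feat_dims])
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(inplace=True),
            nn.Linear(dim, dim), nn.ReLU(inplace=True),
            nn.Linear(dim, dim),
        )

    def forward(self, feats):
        # feats: list of tensors, feats[i] of shape (batch, feat_dims[i]).
        summed = sum(proj(f) for proj, f in zip(self.projs, feats))
        return self.mlp(summed)  # 2048-d teacher representation


def total_loss(loss_squeeze, loss_span, alpha=0.5):
    """L_total = alpha * L_squeeze + (1 - alpha) * L_span; alpha = synthetic ratio."""
    return alpha * loss_squeeze + (1.0 - alpha) * loss_span


def scaled_lr(base_lr, batch_size):
    """Linear learning-rate scaling: base_lr * batch_size / 256."""
    return base_lr * batch_size / 256


# Example optimizer/scheduler wiring (values illustrative, not from the paper):
# model = ...  # ResNet18/ResNet50 backbone + 5-layer MLP head
# optimizer = torch.optim.SGD(model.parameters(), lr=scaled_lr(0.05, 512),
#                             momentum=0.9, weight_decay=1e-4)
# scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)
```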