Diffused Redundancy in Pre-trained Representations

Authors: Vedant Nanda, Till Speicher, John Dickerson, Krishna Gummadi, Soheil Feizi, Adrian Weller

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on different neural architectures (including CNNs and Transformers) pretrained on both ImageNet1k and ImageNet21k and evaluate a variety of downstream tasks taken from the VTAB benchmark.
Researcher Affiliation | Academia | Vedant Nanda (University of Maryland & MPI-SWS); Till Speicher (MPI-SWS); John P. Dickerson (University of Maryland); Krishna P. Gummadi (MPI-SWS); Soheil Feizi (University of Maryland); Adrian Weller (The Alan Turing Institute & University of Cambridge)
Pseudocode | No | The paper describes methods and calculations in paragraph form and through equations (e.g., Eq. 1), but it does not contain explicit 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Our code is available at https://github.com/nvedant07/diffused-redundancy.
Open Datasets | Yes | We conduct experiments on different neural architectures (including CNNs and Transformers) pretrained on both ImageNet1k and ImageNet21k and evaluate a variety of downstream tasks taken from the VTAB benchmark. We evaluate on CIFAR10/100 [25], Oxford-IIIT-Pets [43] and Flowers [38] datasets, from the VTAB benchmark [68]. Additionally, we also report performance on harder datasets such as ImageNetv2 [44] and Places365 [69].
Dataset Splits | No | The paper lists the datasets used and the training parameters for the linear probes, but it does not explicitly detail the train/validation/test splits (e.g., percentages or exact counts) or how data was partitioned for validation. It references datasets such as CIFAR10/100, Oxford-IIIT-Pets, and Flowers, which have standard splits, but these splits are not stated explicitly in the paper's text.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory, or cloud instances) used to run the experiments.
Software Dependencies | No | The paper mentions software such as 'timm' for the implementation of architectures [59] and 'SGD' as an optimizer, but it does not specify version numbers for any key software components or libraries.
Experiment Setup | Yes | All linear probes trained on the representations of these models are trained using SGD with a learning rate of 0.1, momentum of 0.9, batch size of 256, weight decay of 1e-4. The probe is trained for 50 epochs with a learning rate scheduler that decays the learning rate by 0.1 every 10 epochs. For pre-processing, we re-size all inputs to 224x224 (size used for pre-training) and apply the usual composition of Random Horizontal Flip, Color Jitter... All inputs were mean normalized. For imagenet1k pre-trained models: mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. For imagenet21k pre-trained models: mean = [0.5, 0.5, 0.5], std = [0.5, 0.5, 0.5].
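
For readers who want to reproduce the setup quoted in the Experiment Setup row, the following is a minimal pre-processing sketch in torchvision. The resize to 224x224, the Random Horizontal Flip / Color Jitter composition, and the two normalization settings are taken from the quoted text; the Color Jitter strengths and the exact augmentation order are assumptions, since the excerpt truncates the transform list.

```python
# Sketch of the pre-processing described in the Experiment Setup row.
# Assumptions: ColorJitter strengths and augmentation order (the paper excerpt
# truncates the transform list); the other values are quoted from the paper.
import torchvision.transforms as T

IMAGENET1K_STATS = dict(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
IMAGENET21K_STATS = dict(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])

def probe_transforms(stats, train=True):
    """Input pipeline for linear probes on 224x224 inputs."""
    augment = [T.RandomHorizontalFlip(), T.ColorJitter(0.25, 0.25, 0.25)] if train else []
    return T.Compose([
        T.Resize((224, 224)),   # size used for pre-training
        *augment,               # flip + jitter on the training split only
        T.ToTensor(),
        T.Normalize(**stats),   # stats depend on the pre-training corpus
    ])

train_tf = probe_transforms(IMAGENET1K_STATS, train=True)
test_tf = probe_transforms(IMAGENET1K_STATS, train=False)
```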
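
A corresponding sketch of the linear-probe training loop with the quoted optimizer settings (SGD, lr 0.1, momentum 0.9, batch size 256, weight decay 1e-4, 50 epochs, lr decayed by 0.1 every 10 epochs) follows. The 'resnet50' backbone and the CIFAR-10 dataset are placeholders, and freezing the backbone while fitting only the linear head is an assumption about how the probes were trained; timm is used because the paper cites it for its architecture implementations.

```python
# Sketch of the linear-probe training described above. Hyperparameters are
# quoted from the paper; the backbone/dataset choices and the frozen-backbone
# setup are illustrative assumptions.
import timm
import torch
import torch.nn as nn
import torchvision.transforms as T
from torch.utils.data import DataLoader
from torchvision.datasets import CIFAR10

# Minimal pre-processing (see the previous sketch for the fuller pipeline).
tf = T.Compose([T.Resize((224, 224)), T.ToTensor(),
                T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

backbone = timm.create_model("resnet50", pretrained=True, num_classes=0)  # pooled features only
backbone.eval().requires_grad_(False)                                     # frozen representations

probe = nn.Linear(backbone.num_features, 10)   # CIFAR-10 has 10 classes
optimizer = torch.optim.SGD(probe.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
criterion = nn.CrossEntropyLoss()

train_set = CIFAR10(root="data", train=True, download=True, transform=tf)
loader = DataLoader(train_set, batch_size=256, shuffle=True, num_workers=8)

for epoch in range(50):
    for images, labels in loader:
        with torch.no_grad():
            feats = backbone(images)            # frozen pre-trained representation
        loss = criterion(probe(feats), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()                            # lr *= 0.1 every 10 epochs
```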