Diffused Redundancy in Pre-trained Representations
Authors: Vedant Nanda, Till Speicher, John Dickerson, Krishna Gummadi, Soheil Feizi, Adrian Weller
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on different neural architectures (including CNNs and Transformers) pretrained on both ImageNet1k and ImageNet21k and evaluate a variety of downstream tasks taken from the VTAB benchmark. |
| Researcher Affiliation | Academia | Vedant Nanda (University of Maryland & MPI-SWS), Till Speicher (MPI-SWS), John P. Dickerson (University of Maryland), Krishna P. Gummadi (MPI-SWS), Soheil Feizi (University of Maryland), Adrian Weller (The Alan Turing Institute & University of Cambridge) |
| Pseudocode | No | The paper describes methods and calculations in paragraph form and through equations (e.g., Eq 1), but it does not contain explicit 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/nvedant07/diffused-redundancy. |
| Open Datasets | Yes | We conduct experiments on different neural architectures (including CNNs and Transformers) pretrained on both ImageNet1k and ImageNet21k and evaluate a variety of downstream tasks taken from the VTAB benchmark. We evaluate on CIFAR10/100 [25], Oxford-IIIT-Pets [43] and Flowers [38] datasets, from the VTAB benchmark [68]. Additionally, we also report performance on harder datasets such as ImageNetv2 [44] and Places365 [69]. |
| Dataset Splits | No | The paper lists the datasets used and the training parameters for linear probes, but it does not explicitly state train/validation/test splits (e.g., percentages or exact counts) or how data was partitioned for validation. Datasets such as CIFAR10/100, Oxford-IIIT-Pets, and Flowers have standard splits, but these splits are not stated in the paper's text itself. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory, or cloud instances) used to run the experiments. |
| Software Dependencies | No | The paper mentions software like 'timm' for implementation of architectures [59] and 'SGD' as an optimizer, but it does not specify version numbers for any key software components or libraries. |
| Experiment Setup | Yes | All linear probes trained on the representations of these models are trained using SGD with a learning rate of 0.1, momentum of 0.9, batch size of 256, weight decay of 1e-4. The probe is trained for 50 epochs with a learning rate scheduler that decays the learning rate by 0.1 every 10 epochs. For pre-processing, we re-size all inputs to 224x224 (size used for pre-training) and apply the usual composition of Random Horizontal Flip, Color Jitter... All inputs were mean normalized. For imagenet1k pre-trained models: mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. For imagenet21k pre-trained models: mean = [0.5,0.5,0.5], std = [0.5,0.5,0.5]. |
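
The reported linear-probe setup translates directly into a short PyTorch recipe. The sketch below is a hypothetical reconstruction from the hyperparameters quoted in the table, not the authors' code (see https://github.com/nvedant07/diffused-redundancy for the official implementation); the function name `train_linear_probe`, the assumption of pre-extracted frozen features, and the ColorJitter strengths are illustrative choices not specified in the paper.

```python
import torch
import torch.nn as nn
from torchvision import transforms

# Pre-processing as described: resize to 224x224, Random Horizontal Flip, Color Jitter,
# and mean normalization. Means/stds are the ImageNet1k values quoted above; jitter
# strengths are assumptions, since the paper's quote elides them.
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])


def train_linear_probe(features: torch.Tensor, labels: torch.Tensor, num_classes: int,
                       epochs: int = 50, batch_size: int = 256) -> nn.Linear:
    """Train a linear probe on frozen representations with the reported hyperparameters:
    SGD, lr 0.1, momentum 0.9, weight decay 1e-4, lr decayed by 0.1 every 10 epochs."""
    probe = nn.Linear(features.shape[1], num_classes)
    optimizer = torch.optim.SGD(probe.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
    criterion = nn.CrossEntropyLoss()

    dataset = torch.utils.data.TensorDataset(features, labels)
    loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)

    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(probe(x), y)
            loss.backward()
            optimizer.step()
        scheduler.step()
    return probe
```

For ImageNet21k-pretrained backbones, the same sketch applies with the normalization swapped to mean = [0.5, 0.5, 0.5] and std = [0.5, 0.5, 0.5], as quoted in the setup above.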