Minimum-Norm Interpolation Under Covariate Shift

Authors: Neil Rohit Mallinar, Austin Zane, Spencer Frei, Bin Yu

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We follow our analysis with empirical studies that show these beneficial and malignant covariate shifts for linear interpolators on real image data, and for fully-connected neural networks in settings where the input data dimension is larger than the training sample size." (Section 4, Experiments; see the interpolator sketch below the table.)
Researcher Affiliation | Academia | "1 Department of Computer Science, University of California San Diego, CA, USA; 2 Department of Statistics, University of California Berkeley, CA, USA; 3 Department of Statistics, University of California Davis, CA, USA; 4 Department of Electrical Engineering and Computer Sciences, University of California Berkeley, CA, USA."
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statements about open-sourcing code or links to a code repository.
Open Datasets | Yes | "Hendrycks & Dietterich (2019) propose the CIFAR-10C dataset as an OOD counterpart to CIFAR-10... In Figures 9 and 11 we use a binary variant of CIFAR-10 and CIFAR-10C."
Dataset Splits | No | The paper does not explicitly state validation dataset splits with percentages or counts, or reference a standard validation split.
Hardware Specification | Yes | "This work used Delta GPU compute nodes at NCSA and HPE and Expanse GPU compute nodes at Dell and SDSC through allocation CIS220009 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296." "We train in PyTorch with a single A100 NVIDIA GPU."
Software Dependencies | No | "We train in PyTorch with a single A100 NVIDIA GPU."
Experiment Setup | Yes | "We start with a learning rate of 0.01 and decay by a stepped cosine schedule for 1,500 epochs. We take batch size of 64 and train without weight decay." "Networks are trained with stochastic gradient descent with a learning rate of 0.1 and stepped cosine decay schedule for 60 epochs." (See the training-loop sketch below.)
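The paper's central object is the minimum-norm linear interpolator in the overparameterized regime, where the input dimension exceeds the training sample size. As a reference point only, here is a minimal NumPy sketch of that estimator evaluated under a simulated covariate shift; the dimensions, noise level, and diagonal rescaling of the test covariates are illustrative assumptions, not the paper's experimental settings.

```python
# Hedged sketch (not the authors' code): the minimum-norm linear interpolator
# in the overparameterized regime (input dimension d > sample size n),
# computed with the Moore-Penrose pseudoinverse. The distributions and the
# diagonal-covariance covariate shift below are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 500                               # n training points in d dimensions, d > n

beta_star = rng.normal(size=d) / np.sqrt(d)   # ground-truth linear signal

# Training data drawn from the source distribution (isotropic Gaussian covariates).
X_train = rng.normal(size=(n, d))
y_train = X_train @ beta_star + 0.1 * rng.normal(size=n)

# Minimum-norm interpolator: beta_hat = X^+ y fits the training data exactly
# while having the smallest Euclidean norm among all interpolating solutions.
beta_hat = np.linalg.pinv(X_train) @ y_train
assert np.allclose(X_train @ beta_hat, y_train)

# Test data under a covariate shift: same signal, rescaled covariate covariance.
shift_scales = rng.uniform(0.5, 2.0, size=d)  # hypothetical anisotropic rescaling
X_test = rng.normal(size=(2000, d)) * shift_scales
y_test = X_test @ beta_star

ood_risk = np.mean((X_test @ beta_hat - y_test) ** 2)
print(f"OOD test risk under the simulated shift: {ood_risk:.4f}")
```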
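The Experiment Setup row quotes two training configurations. The sketch below shows, under stated assumptions, how the neural-network configuration (SGD, learning rate 0.1, batch size 64, no weight decay, 60 epochs) might be wired up in PyTorch; the fully-connected architecture, the synthetic data, and the use of CosineAnnealingLR in place of the paper's "stepped cosine" schedule are hypothetical choices made for illustration.

```python
# Hedged sketch (not the authors' code): a training loop consistent with the
# reported setup, with assumed architecture and data.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

d, n, epochs, lr = 1024, 512, 60, 0.1          # d > n regime; lr and epochs from the quoted NN setup
X = torch.randn(n, d)
y = torch.randn(n, 1)                          # placeholder targets

loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

model = nn.Sequential(nn.Linear(d, 512), nn.ReLU(), nn.Linear(512, 1))
opt = torch.optim.SGD(model.parameters(), lr=lr, weight_decay=0.0)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)  # stand-in for "stepped cosine"
loss_fn = nn.MSELoss()

for epoch in range(epochs):
    for xb, yb in loader:
        opt.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        opt.step()
    sched.step()                               # decay the learning rate once per epoch
```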