Minimum-Norm Interpolation Under Covariate Shift
Authors: Neil Rohit Mallinar, Austin Zane, Spencer Frei, Bin Yu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We follow our analysis with empirical studies that show these beneficial and malignant covariate shifts for linear interpolators on real image data, and for fully-connected neural networks in settings where the input data dimension is larger than the training sample size. (Section 4: Experiments) |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of California San Diego, CA, USA 2Department of Statistics, University of California Berkeley, CA, USA 3Department of Statistics, University of California Davis, CA, USA 4Department of Electrical Engineering and Computer Sciences, University of California Berkeley, CA, USA. |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about open-sourcing code or links to a code repository. |
| Open Datasets | Yes | Hendrycks & Dietterich (2019) propose the CIFAR-10C dataset as an OOD counterpart to CIFAR-10... In Figures 9 and 11 we use a binary variant of CIFAR-10 and CIFAR-10C. |
| Dataset Splits | No | The paper does not explicitly state validation dataset splits with percentages or counts, or reference a standard validation split. |
| Hardware Specification | Yes | This work used Delta GPU compute nodes at NCSA and HPE and Expanse GPU compute nodes at Dell and SDSC through allocation CIS220009 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296. We train in PyTorch with a single A100 NVIDIA GPU. |
| Software Dependencies | No | We train in PyTorch with a single A100 NVIDIA GPU. |
| Experiment Setup | Yes | We start with a learning rate of 0.01 and decay by a stepped cosine schedule for 1,500 epochs. We take batch size of 64 and train without weight decay. Networks are trained with stochastic gradient descent with a learning rate of 0.1 and stepped cosine decay schedule for 60 epochs. (A hedged PyTorch sketch of this setup appears below the table.) |
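
The hyperparameters quoted in the Experiment Setup row translate into a short PyTorch training loop. The sketch below is not the authors' code: the network architecture, the placeholder data, and the squared-error loss are assumptions, and the "stepped cosine" schedule is approximated with `CosineAnnealingLR`. Only SGD, the 0.1 learning rate, batch size 64, the absence of weight decay, and the 60-epoch budget come from the quoted text.

```python
# Hedged sketch of the fully-connected-network setup quoted above.
# Architecture, data, and loss are hypothetical; see the lead-in note.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data in the regime the paper describes (input dimension d
# larger than the sample size n); values here are random stand-ins.
n, d = 512, 2048
X, y = torch.randn(n, d), torch.randint(0, 2, (n,)).float()
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

# Hypothetical fully-connected network; width/depth are not given in the quote.
model = nn.Sequential(nn.Linear(d, 1024), nn.ReLU(), nn.Linear(1024, 1))
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

epochs = 60
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=0.0)
# The exact step boundaries of the "stepped cosine" decay are not stated,
# so a per-epoch cosine annealing schedule is used as a stand-in.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
criterion = nn.MSELoss()  # assumed loss; the quote does not name one

for epoch in range(epochs):
    for xb, yb in loader:
        xb, yb = xb.to(device), yb.to(device)
        optimizer.zero_grad()
        loss = criterion(model(xb).squeeze(-1), yb)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

For the linear-interpolator experiments mentioned in the same quote, the corresponding values would be a 0.01 starting learning rate and a 1,500-epoch budget, with the same batch size and no weight decay.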