Provably Learning Diverse Features in Multi-View Data with Midpoint Mixup
Authors: Muthu Chidambaram, Xiang Wang, Chenwei Wu, Rong Ge
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show in Section 5 that our theory extends to practice by training models on image classification benchmarks that are modified to have additional spurious features correlated with the true class labels. We find in our experiments that Midpoint Mixup outperforms ERM, and performs comparably to the previously used Mixup settings in Zhang et al. (2018). (Section 5: Experiments; a sketch of such a spurious-feature modification is given after this table.) |
| Researcher Affiliation | Academia | Muthu Chidambaram 1 Xiang Wang 1 Chenwei Wu 1 Rong Ge 1 1Department of Computer Science, Duke University. Correspondence to: Muthu Chidambaram <muthu@cs.duke.edu>. |
| Pseudocode | No | We denote our network by g : ℝ^{P×d} → ℝ^k. For each y ∈ [k], we define g_y as g_y(X) = Σ_r Σ_{p∈[P]} ReLU(⟨w_{y,r}, x^{(p)}⟩) (4.1). We will use w^{(0)}_{y,r} to refer to the weights of the network g at initialization (and w^{(t)}_{y,r} after t steps of gradient descent), and similarly g_t to refer to the model after t iterations of gradient descent. We consider the standard choice of Xavier initialization, which, in our setting, corresponds to w^{(0)}_{y,r} ∼ N(0, …). For model training, we focus on full-batch gradient descent with a fixed learning rate of η applied to J(g, X) and J_MM(g, X). Once again using the notation ∇_{w^{(t)}_{y,r}} for the gradient with respect to w_{y,r} at step t, the updates to the weights of the network g are thus of the form: w^{(t+1)}_{y,r} = w^{(t)}_{y,r} − η ∇_{w^{(t)}_{y,r}} J_MM(g, X) (4.2). (See the PyTorch sketch after this table.) |
| Open Source Code | Yes | Code for our experiments is available at: https://github.com/2014mchidamb/ midpoint-mixup-multi-view-icml. |
| Open Datasets | Yes | For our experimental setup, we consider training ResNet-18 (He et al., 2015) on versions of Fashion MNIST (FMNIST) (Xiao et al., 2017), CIFAR-10, and CIFAR-100 (Krizhevsky, 2009) |
| Dataset Splits | No | All models were trained for 100 epochs with a batch size of 750, which was the largest feasible size on our compute setup of a single P100 GPU (we use a large batch size to approximate the full batch gradient descent aspect of our theory). |
| Hardware Specification | Yes | All models were trained for 100 epochs with a batch size of 750, which was the largest feasible size on our compute setup of a single P100 GPU (we use a large batch size to approximate the full batch gradient descent aspect of our theory). |
| Software Dependencies | No | Our implementation is in PyTorch (Paszke et al., 2019) and uses the ResNet implementation of Kuang Liu, released under an MIT license. All models were trained for 100 epochs with a batch size of 750... For optimization, we use Adam (Kingma & Ba, 2015) with the default hyperparameters of β1 = 0.9, β2 = 0.999 and a learning rate of 0.001. |
| Experiment Setup | Yes | All models were trained for 100 epochs with a batch size of 750... For optimization, we use Adam (Kingma & Ba, 2015) with the default hyperparameters of β1 = 0.9, β2 = 0.999 and a learning rate of 0.001. (See the training-loop sketch after this table.) |
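
The Research Type row above describes benchmarks modified to carry additional spurious features correlated with the true class labels. A minimal sketch of one way such a modification could look is below; the patch size, position, and per-class colors are illustrative assumptions, not the paper's exact construction.

```python
import torch
from torchvision import datasets, transforms

class SpuriousCIFAR10(torch.utils.data.Dataset):
    """CIFAR-10 with an extra class-correlated patch painted onto each image.

    Illustrative only: the patch size, its location, and the per-class colors
    are assumptions, not the paper's exact modification.
    """

    def __init__(self, root, train=True, patch_size=4):
        self.base = datasets.CIFAR10(
            root, train=train, download=True,
            transform=transforms.ToTensor())
        self.patch_size = patch_size
        # One fixed color per class, so the patch is perfectly correlated with the label.
        torch.manual_seed(0)
        self.colors = torch.rand(10, 3)

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        img, label = self.base[idx]          # img: (3, 32, 32) tensor in [0, 1]
        p = self.patch_size
        # Paint a class-colored patch in the top-left corner.
        img[:, :p, :p] = self.colors[label].view(3, 1, 1)
        return img, label
```

Because the patch is a deterministic function of the label, it is a spurious feature that is fully correlated with the class, which matches the kind of modification the quoted experiments describe.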
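The Pseudocode row quotes the paper's one-hidden-layer ReLU network (Eq. 4.1) and its full-batch gradient descent update under the Midpoint Mixup objective J_MM (Eq. 4.2). Below is a hedged PyTorch sketch of that setup; the dimensions, the random-permutation pairing inside the loss, and the soft-label cross-entropy form of J_MM are illustrative assumptions rather than the paper's exact definitions.

```python
import torch
import torch.nn.functional as F

# Illustrative sizes (not the paper's): k classes, P patches of dimension d,
# m ReLU neurons per class, n examples.
k, P, d, m, n = 10, 4, 32, 8, 64
W = torch.nn.Parameter(torch.randn(k, m, d) / d ** 0.5)  # Xavier-style initialization

def g(X, W):
    """g_y(X) = sum over neurons r and patches p of ReLU(<w_{y,r}, x^{(p)}>)."""
    pre = torch.einsum('bpd,kmd->bkmp', X, W)   # inner products per class/neuron/patch
    return F.relu(pre).sum(dim=(2, 3))          # (batch, k) logits

def midpoint_mixup_loss(X, y, W):
    """Cross-entropy on midpoints of example pairs (mixing coefficient fixed at 1/2).

    Pairing via a random permutation is an assumption; the paper's J_MM may be
    defined over a different set of pairs.
    """
    perm = torch.randperm(X.shape[0])
    X_mix = 0.5 * (X + X[perm])
    y_soft = 0.5 * (F.one_hot(y, k).float() + F.one_hot(y[perm], k).float())
    return -(y_soft * F.log_softmax(g(X_mix, W), dim=1)).sum(dim=1).mean()

# One full-batch gradient descent step in the spirit of Eq. (4.2).
X, y = torch.randn(n, P, d), torch.randint(0, k, (n,))
eta = 0.1                                        # illustrative step size
grad, = torch.autograd.grad(midpoint_mixup_loss(X, y, W), W)
with torch.no_grad():
    W -= eta * grad
```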
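The Software Dependencies and Experiment Setup rows report ResNet-18 trained with Adam (learning rate 0.001, β1 = 0.9, β2 = 0.999), batch size 750, for 100 epochs. A rough training-loop sketch consistent with those values follows; it uses torchvision's resnet18 as a stand-in for the Kuang Liu CIFAR implementation cited in the paper and plain CIFAR-10 in place of the modified benchmarks, so it should be read as an approximation rather than the authors' code.

```python
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms
from torchvision.models import resnet18

# Hyperparameters taken from the quoted setup (100 epochs, batch size 750,
# Adam with lr 1e-3 and betas (0.9, 0.999)). The dataset here is plain CIFAR-10;
# the paper trains on spurious-feature-modified versions of FMNIST/CIFAR-10/CIFAR-100
# (e.g. something like the SpuriousCIFAR10 sketch above).
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = resnet18(num_classes=10).to(device)  # stand-in for the Kuang Liu CIFAR ResNet-18
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))

train_set = datasets.CIFAR10('./data', train=True, download=True,
                             transform=transforms.ToTensor())
loader = torch.utils.data.DataLoader(train_set, batch_size=750, shuffle=True)

for epoch in range(100):
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        # Midpoint Mixup: average each example with a shuffled partner (lambda fixed at 1/2).
        perm = torch.randperm(images.size(0), device=device)
        mixed = 0.5 * (images + images[perm])
        log_probs = F.log_softmax(model(mixed), dim=1)
        # Cross-entropy against the 1/2-1/2 mixture of the two labels in each pair.
        loss = 0.5 * (F.nll_loss(log_probs, labels) + F.nll_loss(log_probs, labels[perm]))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```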