Overcoming Simplicity Bias in Deep Networks using a Feature Sieve

Authors: Rishabh Tiwari, Pradeep Shenoy

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We provide concrete evidence of this differential suppression & enhancement of relevant features on both controlled datasets and real-world images, and report substantial gains on many real-world debiasing benchmarks (11.4% relative gain on ImageNet-A; 3.2% on BAR, etc.)."
Researcher Affiliation | Industry | "Rishabh Tiwari¹, Pradeep Shenoy¹. ¹Google Research India. Correspondence to: Pradeep Shenoy <shenoypradeep@google.com>."
Pseudocode | Yes |
    Algorithm 1 SIFER: Mitigating simplicity bias
    Input: pretrained model weights W; training data D; training iterations N
    Hparams: aux depth AD; aux position AP; main lr weight α1; aux lr weight α2;
             aux forget weight α3; forget-after iters F
    Output: robust model weights W
    for k = 1 ... N do
        (x, y) ← sample(D)
        ŷ, ŷ_aux ← ForwardWithAux(x, AD, AP, W)
        L1 ← CE(ŷ, y);  L2 ← CE(ŷ_aux, y);  Lf ← CE(ŷ_aux, U)
        L ← α1·L1 + α2·L2
        if k mod F == 0 then
            L ← L + α3·Lf
        end
        ∇W ← Backward(L)
        W ← OptimizerStep(∇W)
    end
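As a companion to the pseudocode, the following is a minimal PyTorch sketch of one SIFER-style training step. The names (sifer_step, aux_head, forget_every) and the assumption that the model returns both intermediate features and main logits are illustrative, not the paper's actual implementation; soft-target cross-entropy against a uniform target stands in for CE(ŷ_aux, U).

    import torch
    import torch.nn.functional as F

    def sifer_step(model, aux_head, optimizer, x, y, k, num_classes,
                   alpha1=1.0, alpha2=1.0, alpha3=1.0, forget_every=50):
        """One SIFER-style update: main and aux cross-entropy losses, plus a
        periodic 'forgetting' loss pushing the aux head toward uniform outputs."""
        feats, logits = model(x)      # assumed: model returns (intermediate features, main logits)
        aux_logits = aux_head(feats)  # auxiliary classifier attached at an intermediate block

        loss = alpha1 * F.cross_entropy(logits, y) + alpha2 * F.cross_entropy(aux_logits, y)

        if k % forget_every == 0:
            # Lf = CE(ŷ_aux, U): soft-target cross-entropy against the uniform
            # distribution (float targets of shape (N, C) need PyTorch >= 1.10).
            uniform = torch.full_like(aux_logits, 1.0 / num_classes)
            loss = loss + alpha3 * F.cross_entropy(aux_logits, uniform)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()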
Open Source Code | Yes | Code available at https://github.com/google-research/google-research/sifer
Open Datasets | Yes | CMNIST: Colored-MNIST... CIFAR-MNIST... BAR: Biased Activity Recognition (Nam et al., 2020)... CelebA: CelebA (Liu et al., 2018)... NICO: NICO (He et al., 2021)... ImageNet-9: ImageNet-9 (Xiao et al., 2020)... ImageNet-A: ImageNet-A (Hendrycks et al., 2021).
Dataset Splits | Yes | "For BAR, since there is no validation data provided, we study it under two settings. In the first, we use 20% of images from the test set and call it OOD validation. In the second setting, which is harder and more realistic, we use 20% of images from the train set, calling it In-Domain (ID) validation. For NICO-Animal, CelebA-Hair, and ImageNet-9, we use the already-supplied validation data. Table 1 shows the percentage of bias-conflicting examples, i.e., examples that violate the spurious feature correlation or training domain, for each portion of each dataset."
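Since BAR ships without a validation set, here is a minimal sketch of the two validation settings described above, using torch.utils.data.random_split; the function name make_bar_validation and the dictionary keys are illustrative, not part of the paper's released code.

    import torch
    from torch.utils.data import random_split

    def make_bar_validation(train_set, test_set, frac=0.20, seed=0):
        """OOD validation: hold out 20% of the test set.
        ID validation: hold out 20% of the train set (harder, more realistic)."""
        g = torch.Generator().manual_seed(seed)

        n = int(frac * len(test_set))
        ood_val, test_remainder = random_split(
            test_set, [n, len(test_set) - n], generator=g)

        m = int(frac * len(train_set))
        id_val, train_remainder = random_split(
            train_set, [m, len(train_set) - m], generator=g)

        return {"ood_val": ood_val, "test_for_ood_setting": test_remainder,
                "id_val": id_val, "train_for_id_setting": train_remainder}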
Hardware Specification | No | The paper does not explicitly describe the hardware used for the experiments, such as specific GPU or CPU models, memory amounts, or cloud-computing instance types.
Software Dependencies | No | The paper does not provide specific software dependency details with version numbers, such as programming-language versions, library versions, or deep learning framework versions (e.g., "optimized using SGD optimizer" does not include version information).
Experiment Setup | Yes | "For all our real-world experiments we consistently used ResNet-18 with an auxiliary layer that uses the same layer structure as the BasicBlock of ResNet, with varying depth, optimized using the SGD optimizer with a fixed learning rate of 0.001. Table 7 shows the hyperparameter search space for all the hyperparameters that we tune on the basis of the validation set. Table 8 shows the hyperparameter values obtained from the hyperparameter tuning."
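A sketch of that setup in PyTorch/torchvision, assuming the auxiliary head is built from ResNet BasicBlocks and attached after a configurable backbone stage; the class name ResNet18WithAux and the aux_position/aux_depth wiring are illustrative stand-ins for the paper's AP and AD hyperparameters, not its released implementation.

    import torch
    import torch.nn as nn
    from torchvision.models import resnet18
    from torchvision.models.resnet import BasicBlock

    class ResNet18WithAux(nn.Module):
        """ResNet-18 with an auxiliary classifier made of BasicBlocks,
        attached after stage `aux_position` with `aux_depth` blocks."""
        def __init__(self, num_classes, aux_position=2, aux_depth=1):
            super().__init__()
            self.backbone = resnet18(num_classes=num_classes)
            widths = [64, 128, 256, 512]  # channel widths of layer1..layer4
            c = widths[aux_position - 1]
            self.aux_position = aux_position
            self.aux_head = nn.Sequential(
                *[BasicBlock(c, c) for _ in range(aux_depth)],
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(c, num_classes))

        def forward(self, x):
            b = self.backbone
            x = b.maxpool(b.relu(b.bn1(b.conv1(x))))
            aux_logits = None
            for i, stage in enumerate([b.layer1, b.layer2, b.layer3, b.layer4], 1):
                x = stage(x)
                if i == self.aux_position:
                    aux_logits = self.aux_head(x)
            logits = b.fc(torch.flatten(b.avgpool(x), 1))
            return logits, aux_logits

    # e.g., the 9 superclasses of ImageNet-9; fixed learning rate per the paper
    model = ResNet18WithAux(num_classes=9)
    opt = torch.optim.SGD(model.parameters(), lr=0.001)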