Robust Representation Learning via Perceptual Similarity Metrics

Authors: Saeid A Taghanaki, Kristy Choi, Amir Hosein Khasahmadi, Anirudh Goyal

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we demonstrate the efficacy of our approach on tasks which typically suffer from the presence of spurious correlations: classification with nuisance information, out-of-distribution generalization, and preservation of subgroup accuracies. We additionally show that CIM is complementary to other mutual information-based representation learning techniques, and demonstrate that it improves the performance of variational information bottleneck (VIB) when used together. Empirically, we evaluate our method on five different datasets under three settings that suffer from spurious correlations: classification with nuisance background information, out-of-domain (OOD) generalization, and improving accuracy uniformly across subgroups. In the first task, we show that when CIM is used with VIB (CIM+VIB), it outperforms ERM on colored MNIST and improves over the ResNet-50 baseline on the Background Challenge (Xiao et al., 2020).
Researcher Affiliation | Collaboration | 1 Autodesk AI Lab; 2 Computer Science, Stanford University; 3 Mila, Université de Montréal.
Pseudocode | No | The paper describes procedures and uses a flowchart (Figure 1) but does not present any formal pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain an explicit statement about the release of its source code or a link to a code repository for the described methodology.
Open Datasets | Yes | Datasets: We consider various datasets and tasks to test the effectiveness of our method. We first construct a colored variant of MNIST (LeCun, 1998) to demonstrate that CIM successfully ignores nuisance background information in a digit classification task, then further explore this finding on the Background Challenge (Xiao et al., 2020), a more challenging dataset. Next, we evaluate CIM on the VLCS dataset (Torralba & Efros, 2011) to demonstrate that the input transformations help in learning representations that generalize to OOD distributions. Then, we study two benchmark datasets, CelebA (Liu et al., 2015) and Waterbirds (Wah et al., 2011; Zhou et al., 2017), to show that CIM preserves subgroup accuracies.
Dataset Splits | No | The paper states 'we partition each domain into a train (70%) and test set (30%)' but does not explicitly mention a separate validation split or its percentage.
Hardware Specification | No | The paper mentions the use of different model architectures (ResNet-50, ResNet-18, MLP) but does not specify the hardware (e.g., GPU models, CPU types, memory) used for the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | The TN, which is parameterized by φ, takes in an image x ∈ ℝ^(H×W×C) and produces a weight matrix m ∈ ℝ^(H×W×1) normalized by the sigmoid activation function, where H × W denotes the height and width of the image, and C denotes the number of channels. We then use this weight matrix m to transform the input samples by composing it with the learned mask via element-wise multiplication, which gives us the final transformed image ψ(x) = m ⊙ x. The classifier f_θ(·) is trained via the usual cross-entropy loss on ψ(x). For the Colored MNIST experiment, we use a simple 3-layered multi-layer perceptron (MLP). The three fully-connected layers are of size 1024, 512, and 256 with ReLU activations. The parameters for the transformation network (φ) and the classifier (θ) are trained jointly. L_CIM(φ, θ) = λ·L_con(φ) + L_sup(θ), where λ > 0 is a hyperparameter which controls the contribution of the triplet loss from Eq. 8.
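
Below is a minimal PyTorch sketch of the training setup quoted in the Experiment Setup row. It is not the authors' code: the small convolutional transformation network, the TripletMarginLoss stand-in for the perceptual contrastive term L_con (Eq. 8 of the paper), and all hyperparameters are illustrative assumptions. Only the mask-then-classify structure ψ(x) = m ⊙ x, the 1024/512/256 MLP for Colored MNIST, and the joint objective L_CIM = λ·L_con + L_sup follow the quoted description.

```python
# Sketch only, not the authors' implementation. Assumptions: the transformation
# network (TN) is a tiny conv net emitting a sigmoid-normalized (B, 1, H, W) mask,
# the classifier is the 1024/512/256 MLP quoted for Colored MNIST, and the triplet
# term is illustrated with torch.nn.TripletMarginLoss on flattened transformed
# images (the paper's L_con uses a perceptual similarity metric, Eq. 8).
import torch
import torch.nn as nn


class TransformationNetwork(nn.Module):
    """TN (parameters φ): image (B, C, H, W) -> mask m in [0, 1] of shape (B, 1, H, W)."""

    def __init__(self, in_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return torch.sigmoid(self.net(x))  # m ∈ ℝ^(B×1×H×W), normalized by sigmoid


class MLPClassifier(nn.Module):
    """Classifier f_θ: 3 fully-connected layers (1024, 512, 256) with ReLU, as quoted."""

    def __init__(self, in_dim, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.net(x)


def cim_loss(tn, clf, x, y, x_pos, x_neg, lam=1.0):
    """L_CIM(φ, θ) = λ·L_con(φ) + L_sup(θ), with φ and θ trained jointly."""
    psi = lambda img: tn(img) * img  # ψ(x) = m ⊙ x (element-wise multiplication)
    l_sup = nn.functional.cross_entropy(clf(psi(x)), y)  # usual cross-entropy on ψ(x)
    # Illustrative stand-in for the triplet/contrastive term L_con (Eq. 8).
    triplet = nn.TripletMarginLoss(margin=1.0)
    flat = lambda img: img.flatten(start_dim=1)
    l_con = triplet(flat(psi(x)), flat(psi(x_pos)), flat(psi(x_neg)))
    return lam * l_con + l_sup


# Toy usage on random 28x28 RGB tensors shaped like Colored MNIST batches.
tn, clf = TransformationNetwork(3), MLPClassifier(3 * 28 * 28)
opt = torch.optim.Adam(list(tn.parameters()) + list(clf.parameters()), lr=1e-3)
x, x_pos, x_neg = (torch.rand(8, 3, 28, 28) for _ in range(3))
y = torch.randint(0, 10, (8,))
opt.zero_grad()
loss = cim_loss(tn, clf, x, y, x_pos, x_neg, lam=1.0)
loss.backward()
opt.step()
```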