Leveraging sparse and shared feature activations for disentangled representation learning
Authors: Marco Fumero, Florian Wenzel, Luca Zancato, Alessandro Achille, Emanuele Rodolà, Stefano Soatto, Bernhard Schölkopf, Francesco Locatello
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our approach on six real-world distribution shift benchmarks and different data modalities (images, text), demonstrating how disentangled representations can be transferred to real settings. We demonstrate that it is possible to learn disentangled representations by leveraging knowledge from a distribution of tasks. For this, we propose a meta-learning approach to learn a feature space from a collection of tasks while incorporating our sparse sufficiency and minimality principles, favoring task-specific features that coexist with general features. Following previous literature, we test our approach on synthetic data, validating in an idealized controlled setting that our sufficiency and minimality principles lead to features that are disentangled w.r.t. the ground-truth factors of variation, as expected from our identifiability result in Proposition 2.1. We extend our empirical evaluation to non-synthetic data, where the factors of variation are not known, and show that our approach generalizes well out-of-distribution on different domain generalization and distribution shift benchmarks. |
| Researcher Affiliation | Collaboration | Marco Fumero (Sapienza University of Rome); Florian Wenzel (Amazon AWS); Luca Zancato (Amazon AWS); Alessandro Achille (Amazon AWS); Emanuele Rodolà (Sapienza University of Rome); Stefano Soatto (Amazon AWS); Bernhard Schölkopf (Amazon AWS); Francesco Locatello (IST Austria) |
| Pseudocode | Yes | The algorithm is summarized in section B.1 of the Appendix. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the described methodology or a direct link to a code repository. |
| Open Datasets | Yes | For the DSprites dataset, an example of a valid task is "There is a big object on the left of the image". In this case, the partially observed factors (quantized to only two values) are the x position and the size. ... We evaluate our method on domain generalization and domain shift tasks on six different benchmarks (Section 4.2). In a domain generalization setting, we do not have access to samples from the testing domain, which is considered OOD w.r.t. the training domains. ... We test on two benchmarks, Waterbirds [73] and CivilComments [44] (see Appendix C.1). ... We evaluate the domain generalization performance on the PACS, VLCS and OfficeHome datasets from the DomainBed [32] test suite (see Appendix C.1 for more details). ... Camelyon17: the model is trained according to the original splits in the dataset. [6] |
| Dataset Splits | Yes | Hyperparameters α and β are tuned by performing model selection on the validation set, unless specified otherwise. For these experiments we use a ResNet-50 pretrained on ImageNet [17] as a backbone, as done in [32]. To fit the linear head we sample 10 times with different sample sizes from the training domains, and we report the mean score and standard deviation. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | The optimal classifiers fϕ are given by the inner optimization problem: ϕ* = argmin_ϕ Σ_t L_inner(ŷ^U_t, y^U_t) + Reg(ϕ) (5), where ŷ^U_t = fϕ(gθ(x^U_t)). For both the inner loss L_inner and the outer loss L_outer we use the cross-entropy loss. ... In practice we solve the bi-level optimization problem (4)–(5) as follows. In each iteration we sample a batch of T tasks with the associated support and query sets as described above. First, we use the samples from the support set S_t to fit the linear heads fϕ by solving the inner optimization problem (5) using stochastic gradient descent for a fixed number of steps. ... Hyperparameters α and β are tuned by performing model selection on the validation set, unless specified otherwise. |
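The bi-level procedure quoted in the Experiment Setup row can be sketched as a minimal NumPy toy. This is not the authors' implementation: the synthetic `make_task` generator, the linear feature extractor standing in for gθ, the finite-difference outer gradient (in place of backpropagation), and all hyperparameter values are illustrative assumptions, and the sparsity/minimality terms weighted by α and β are omitted. It only shows the loop structure: sample a batch of T tasks, fit a linear head fϕ on each support set by gradient descent on a regularized cross-entropy (the inner problem, Eq. 5), then update the shared features on the query sets (the outer problem, Eq. 4).

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_feat, n_classes = 8, 4, 2
theta = rng.normal(scale=0.1, size=(d_in, d_feat))  # shared feature extractor g_theta (linear toy)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, y):
    # mean negative log-likelihood of the true labels
    return -np.log(probs[np.arange(len(y)), y] + 1e-12).mean()

def make_task():
    # hypothetical synthetic task: binary label from a random linear rule
    w = rng.normal(size=d_in)
    X = rng.normal(size=(32, d_in))
    y = (X @ w > 0).astype(int)
    return (X[:16], y[:16]), (X[16:], y[16:])  # (support set, query set)

def fit_head(th, Xs, ys, steps=50, lr=0.5, reg=1e-3):
    # inner problem (Eq. 5): fit the linear head phi on the support set by
    # gradient descent on cross-entropy plus an L2 regularizer Reg(phi)
    phi = np.zeros((d_feat, n_classes))
    Z = Xs @ th  # features g_theta(x)
    for _ in range(steps):
        P = softmax(Z @ phi)
        grad = Z.T @ (P - np.eye(n_classes)[ys]) / len(ys) + reg * phi
        phi -= lr * grad
    return phi

def outer_step(th, tasks, lr=0.05, eps=1e-4):
    # outer problem (Eq. 4): update the shared features on the query sets;
    # a finite-difference gradient replaces backprop for simplicity
    def outer_loss(t):
        total = 0.0
        for (Xs, ys), (Xq, yq) in tasks:
            phi = fit_head(t, Xs, ys)
            total += cross_entropy(softmax((Xq @ t) @ phi), yq)
        return total / len(tasks)
    base = outer_loss(th)
    grad = np.zeros_like(th)
    for i in range(th.shape[0]):
        for j in range(th.shape[1]):
            t2 = th.copy()
            t2[i, j] += eps
            grad[i, j] = (outer_loss(t2) - base) / eps
    return th - lr * grad, base

tasks = [make_task() for _ in range(4)]  # batch of T = 4 tasks
for it in range(3):
    theta, loss = outer_step(theta, tasks)
```

With a real backbone one would differentiate through (or first-order approximate) the inner fit instead of using finite differences, and add the α- and β-weighted sufficiency/minimality regularizers to the outer objective.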