Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning

Authors: Sébastien Lachapelle, Tristan Deleu, Divyat Mahajan, Ioannis Mitliagkas, Yoshua Bengio, Simon Lacoste-Julien, Quentin Bertrand

ICML 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experiments: We present experiments on disentanglement and few-shot learning. |
| Researcher Affiliation | Academia | Mila & DIRO, Université de Montréal; Canada CIFAR AI Chair. |
| Pseudocode | No | The paper describes mathematical optimization problems (e.g., Problems (6) and (10)) and algorithmic steps in prose, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Our implementation relies on jax and jaxopt (Bradbury et al., 2018; Blondel et al., 2022) and is available here: https://github.com/tristandeleu/synergies-disentanglement-sparsity |
| Open Datasets | Yes | We validate our theory by showing our approach can indeed disentangle latent factors on tasks constructed from the 3D Shapes dataset (Burgess & Kim, 2018). It obtains competitive results on standard few-shot classification benchmarks, while each task is using only a fraction of the learned representations. |
| Dataset Splits | Yes | In every task, the dataset has size $n = 50$. As opposed to the multi-task setting (i.e., unlike in Section 3.1), one is also given separate test datasets $(\mathcal{D}_t^{\text{test}})_{1 \le t \le T}$ of $n$ samples for each task $t$, to evaluate how well the learned model generalizes to new test samples. In meta-learning, the goal is to learn a learning procedure that will generalize well on new unseen tasks. |
| Hardware Specification | No | The experiments were in part enabled by computational resources provided by Calcul Québec and Compute Canada. (This statement is too general and does not name specific hardware such as CPU or GPU models or memory.) |
| Software Dependencies | No | Our implementation relies on jax and jaxopt (Bradbury et al., 2018; Blondel et al., 2022). (The paper names these libraries but does not provide specific version numbers for reproducibility.) |
| Experiment Setup | Yes | We use the four-layer convolutional neural network typically used in the disentanglement literature (Locatello et al., 2019). In inner-Lasso, we set $\lambda_{\max} := \frac{1}{n}\lVert F^\top y\rVert_\infty$ (where $F \in \mathbb{R}^{n \times m}$ is the design matrix of the features of the samples of a task), while in inner-Ridge we have $\lambda_{\max} := \frac{1}{n}\lVert F\rVert_2$. We consider the 5-shot 5-way experimental setting. |
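
For concreteness, the "four-layer convolutional neural network typically used in the disentanglement literature" quoted in the Experiment Setup row can be sketched as below. This is a hedged reconstruction, not the authors' code: the 4×4 kernels with stride 2, the filter widths, the 256-unit dense layer, and the latent dimension follow the standard Locatello et al. (2019) encoder, and flax.linen is an assumed choice of network library on top of jax (the paper only confirms jax and jaxopt).

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class ConvEncoder(nn.Module):
    """Standard four-layer convolutional encoder from the disentanglement
    literature (Locatello et al., 2019). All layer sizes here are
    assumptions; the paper does not restate them."""
    latent_dim: int = 10

    @nn.compact
    def __call__(self, x):  # x: (batch, 64, 64, channels)
        for features in (32, 32, 64, 64):
            # Each conv halves the spatial resolution (stride 2).
            x = nn.Conv(features, kernel_size=(4, 4), strides=(2, 2))(x)
            x = nn.relu(x)
        x = x.reshape((x.shape[0], -1))   # flatten spatial dimensions
        x = nn.relu(nn.Dense(256)(x))
        return nn.Dense(self.latent_dim)(x)

# Example initialization on a dummy 3D Shapes-sized input (64x64x3):
params = ConvEncoder().init(jax.random.PRNGKey(0), jnp.ones((1, 64, 64, 3)))
```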
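
The $\lambda_{\max}$ expressions quoted in the same row map directly to a few lines of jax.numpy. The sketch below assumes `F` is the per-task design matrix of shape (n, m) and `y` the length-n target vector; the function names are ours, for illustration only.

```python
import jax.numpy as jnp

def lasso_lambda_max(F, y):
    # inner-Lasso: lambda_max = (1/n) * ||F^T y||_inf, the smallest
    # regularization strength at which the Lasso solution is all zeros.
    n = F.shape[0]
    return jnp.max(jnp.abs(F.T @ y)) / n

def ridge_lambda_max(F):
    # inner-Ridge: lambda_max = (1/n) * ||F||_2 (spectral norm),
    # as quoted from the paper.
    n = F.shape[0]
    return jnp.linalg.norm(F, ord=2) / n
```

In a typical regularization path, $\lambda$ would then be swept on a log-spaced grid below $\lambda_{\max}$; the sketch only reproduces the endpoints quoted above.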