Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning
Authors: Sébastien Lachapelle, Tristan Deleu, Divyat Mahajan, Ioannis Mitliagkas, Yoshua Bengio, Simon Lacoste-Julien, Quentin Bertrand
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 (Experiments): We present experiments on disentanglement and few-shot learning. |
| Researcher Affiliation | Academia | Mila & DIRO, Université de Montréal; Canada CIFAR AI Chair. |
| Pseudocode | No | The paper describes mathematical optimization problems (e.g., Problem (6) and (10)) and algorithmic steps in prose, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Our implementation relies on jax and jaxopt (Bradbury et al., 2018; Blondel et al., 2022) and is available here: https://github.com/tristandeleu/synergies-disentanglement-sparsity. |
| Open Datasets | Yes | We validate our theory by showing our approach can indeed disentangle latent factors on tasks constructed from the 3D Shapes dataset (Burgess & Kim, 2018). It obtains competitive results on standard few-shot classification benchmarks, while each task is using only a fraction of the learned representations. |
| Dataset Splits | Yes | In every task, the dataset has size n = 50. As opposed to the multi-task setting (i.e., unlike in Section 3.1), one is also given separate test datasets (D_t^test)_{1 ≤ t ≤ T} of n samples for each task t, to evaluate how well the learned model generalizes to new test samples. In meta-learning, the goal is to learn a learning procedure that will generalize well on new unseen tasks. |
| Hardware Specification | No | The experiments were in part enabled by computational resources provided by Calcul Quebec and Compute Canada. (This statement is too general and does not provide specific hardware models like CPU, GPU, or memory details). |
| Software Dependencies | No | Our implementation relies on jax and jaxopt (Bradbury et al., 2018; Blondel et al., 2022). (The paper mentions software libraries but does not provide specific version numbers for reproducibility). |
| Experiment Setup | Yes | We use the four-layer convolutional neural network typically used in the disentanglement literature (Locatello et al., 2019). In inner-Lasso, we set λ_max := (1/n) ‖Fᵀy‖_∞ (F ∈ ℝ^{n×m} is the design matrix of the features of the samples of a task), while in inner-Ridge we have λ_max := (1/n) ‖F‖_2. We consider the 5-shot 5-way experimental setting. |
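The λ_max quantities quoted in the Experiment Setup row are standard regularization scales: ‖Fᵀy‖_∞/n is the smallest Lasso penalty at which the all-zero solution is optimal, and ‖F‖_2/n uses the spectral norm as the analogous scale for Ridge. A minimal NumPy sketch (the paper's actual implementation uses jax and jaxopt; function names here are illustrative, not from the repository):

```python
import numpy as np

def lasso_lambda_max(F, y):
    """lambda_max = (1/n) * ||F^T y||_inf for an (n, m) design matrix F.

    Above this penalty, the Lasso solution is identically zero.
    """
    n = F.shape[0]
    return np.max(np.abs(F.T @ y)) / n

def ridge_lambda_max(F):
    """lambda_max = (1/n) * ||F||_2, with ||.||_2 the spectral norm of F."""
    n = F.shape[0]
    return np.linalg.norm(F, ord=2) / n

# Toy check on a 2x2 identity design matrix:
F = np.eye(2)
y = np.array([1.0, -3.0])
print(lasso_lambda_max(F, y))  # max(|1|, |-3|) / 2 = 1.5
print(ridge_lambda_max(F))     # spectral norm of I is 1, so 1 / 2 = 0.5
```

In practice such a λ_max is used as the upper end of a regularization path, with candidate penalties swept on a logarithmic grid below it.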