A Theory of Independent Mechanisms for Extrapolation in Generative Models

Authors: Michel Besserve, Rémy Sun, Dominik Janzing, Bernhard Schölkopf

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate on toy examples that classical stochastic gradient descent can hinder the model's extrapolation capabilities, suggesting independence of mechanisms should be enforced explicitly during training. Experiments on deep generative models trained on real-world data support these insights and illustrate how the extrapolation capabilities of such models can be leveraged.
Researcher Affiliation | Collaboration | Michel Besserve (1,2), Rémy Sun (1,3), Dominik Janzing (1), and Bernhard Schölkopf (1). Affiliations: (1) Max Planck Institute for Intelligent Systems, Tübingen, Germany; (2) Max Planck Institute for Biological Cybernetics, Tübingen, Germany; (3) ENS Rennes, France. Emails: {michel.besserve, bs}@tuebingen.mpg.de, janzind@amazon.de, remy.sun@ens-rennes.fr
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Readers can refer to the technical appendix in the extended version of this paper (footnote 4) for supplemental figures, code resources, symbols and acronyms (Table 1), all proofs (App. A) and method details (App. B). Footnote 4: https://arxiv.org/abs/2004.00184
Open Datasets | Yes | Fluorescent hair colors are at least very infrequent in classical face datasets such as CelebA (footnote 5), such that classification algorithms trained on these datasets may fail to extract the relevant information from pictures of actual people with such hair, as they are arguably outliers. Footnote 5: http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
Dataset Splits | No | The paper does not provide specific details on dataset splits (e.g., percentages or counts for training, validation, and test sets).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running experiments.
Software Dependencies | No | We used a plain β-VAE (footnote 11) (Higgins et al. 2017) and the official tensorlayer DCGAN implementation (footnote 12). Footnote 11: https://github.com/yzwxx/vae-celebA. Footnote 12: https://github.com/tensorlayer/dcgan. The paper mentions software names but does not provide specific version numbers for these or other dependencies.
Experiment Setup | No | The paper mentions training iterations (e.g., '10000 iterations', '40000 additional training iterations') but lacks specific details on hyperparameters such as learning rate, batch size, optimizer settings, or other concrete training configurations (a configuration sketch follows the table below).
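
To illustrate the kind of experiment-setup detail the row above flags as missing, here is a minimal Python sketch of a training configuration for the reported β-VAE/DCGAN experiments on CelebA. Only the dataset, the model names, and the iteration counts are stated in the paper; every other field (beta, batch size, learning rate, optimizer, seed) is a placeholder assumption, not a reported setting.

from dataclasses import dataclass

@dataclass
class ExperimentConfig:
    # Values stated in the paper
    dataset: str = "CelebA"
    model: str = "beta-VAE"            # plain beta-VAE (Higgins et al. 2017); a DCGAN is also used
    iterations: int = 10_000           # "10000 iterations"
    extra_iterations: int = 40_000     # "40000 additional training iterations"
    # Placeholder assumptions; the paper does not report these
    beta: float = 4.0
    batch_size: int = 64
    learning_rate: float = 1e-4
    optimizer: str = "Adam"
    seed: int = 0

if __name__ == "__main__":
    # Printing the config makes the reported vs. assumed settings explicit at a glance.
    print(ExperimentConfig())

A complete reproducibility report would pin down the placeholder fields above (and the software versions noted in the Software Dependencies row) so that the training runs can be repeated exactly.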