A Theory of Independent Mechanisms for Extrapolation in Generative Models

Authors: Michel Besserve, Rémy Sun, Dominik Janzing, Bernhard Schölkopf

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate on toy examples that classical stochastic gradient descent can hinder the model's extrapolation capabilities, suggesting independence of mechanisms should be enforced explicitly during training. Experiments on deep generative models trained on real-world data support these insights and illustrate how the extrapolation capabilities of such models can be leveraged.
Researcher Affiliation | Collaboration | Michel Besserve (1,2), Rémy Sun (1,3), Dominik Janzing (1), and Bernhard Schölkopf (1). Affiliations: (1) Max Planck Institute for Intelligent Systems, Tübingen, Germany; (2) Max Planck Institute for Biological Cybernetics, Tübingen, Germany; (3) ENS Rennes, France. Emails: {michel.besserve, bs}@tuebingen.mpg.de, janzind@amazon.de, remy.sun@ens-rennes.fr
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Readers can refer to the technical appendix in the extended version of this paper (footnote 4) for supplemental figures, code resources, symbols and acronyms (Table 1), all proofs (App. A) and method details (App. B). Footnote 4: https://arxiv.org/abs/2004.00184
Open Datasets | Yes | Fluorescent hair colors are at least very infrequent in classical face datasets such as CelebA (footnote 5), such that classification algorithms trained on these datasets may fail to extract the relevant information from pictures of actual people with such hair, as they are arguably outliers. Footnote 5: http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
Dataset Splits | No | The paper does not provide specific details on dataset splits (e.g., percentages or counts for training, validation, and test sets).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running experiments.
Software Dependencies | No | We used a plain β-VAE (footnote 11) (Higgins et al. 2017) and the official tensorlayer DCGAN implementation (footnote 12). Footnote 11: https://github.com/yzwxx/vae-celebA. Footnote 12: https://github.com/tensorlayer/dcgan. The paper mentions software names but does not provide specific version numbers for these or other dependencies.
Experiment Setup | No | The paper mentions training iterations (e.g., '10000 iterations', '40000 additional training iterations') but lacks specific details on hyperparameters such as learning rate, batch size, optimizer settings, or other concrete training configurations (a configuration sketch follows the table below).
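
To illustrate the kind of experiment-setup detail the row above flags as missing, here is a minimal Python sketch of a training configuration for the reported β-VAE/DCGAN experiments on CelebA. Only the dataset, the model names, and the iteration counts are stated in the paper; every other field (beta, batch size, learning rate, optimizer, seed) is a placeholder assumption, not a reported setting.

from dataclasses import dataclass

@dataclass
class ExperimentConfig:
    # Values stated in the paper
    dataset: str = "CelebA"
    model: str = "beta-VAE"            # plain beta-VAE (Higgins et al. 2017); a DCGAN is also used
    iterations: int = 10_000           # "10000 iterations"
    extra_iterations: int = 40_000     # "40000 additional training iterations"
    # Placeholder assumptions; the paper does not report these
    beta: float = 4.0
    batch_size: int = 64
    learning_rate: float = 1e-4
    optimizer: str = "Adam"
    seed: int = 0

if __name__ == "__main__":
    # Printing the config makes the reported vs. assumed settings explicit at a glance.
    print(ExperimentConfig())

A complete reproducibility report would pin down the placeholder fields above (and the software versions noted in the Software Dependencies row) so that the training runs can be repeated exactly.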