On Linear Identifiability of Learned Representations
Authors: Geoffrey Roeder, Luke Metz, Diederik P. Kingma
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experiments The derivation in Section 3 shows that, for models in the general discriminative family defined in Section 2, the functions f_θ and g_θ are identifiable up to a linear transformation given unbounded data and assuming model convergence. The question remains as to how close a model trained on finite data and without convergence guarantees will approach this limit. One subtle issue is that poor architecture choices (such as too few hidden units, or inadequate inductive priors) or insufficient data samples when training can interfere with model estimation and thereby linear identifiability of the learned representations, due to underfitting. In this section, we study this issue over a range of models, from low-dimensional language embedding and supervised classification (Figures 1 and 2 respectively) to GPT-2 (Radford et al., 2019), an approximately 1.5 × 10^9-parameter generative model of natural language (Figure 4). |
| Researcher Affiliation | Collaboration | Geoffrey Roeder¹, Luke Metz², Diederik P. Kingma² (¹Princeton University, ²Google Brain). Correspondence to: Geoffrey Roeder <roeder@princeton.edu>, Diederik P. Kingma <durk@google.com>. |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Figure 1. ... (see Appendix A.1 for code release and training details). |
| Open Datasets | Yes | Figure 1. ... Billion Word Dataset (Chelba et al., 2013) ... 5.2. Self-Supervised Learning for Image Classification We next investigate high-dimensional, self-supervised representation learning on CIFAR-10 (Krizhevsky et al., 2009) using CPC (Oord et al., 2018; Hénaff et al., 2019). |
| Dataset Splits | No | The paper does not explicitly provide specific training/validation/test dataset splits (e.g., percentages or counts) or detailed splitting methodologies for their experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper mentions software such as JAX, the Hugging Face Transformers library, and the Adam optimizer, but does not provide version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | 5.1. Simulation Study: Classification by DNNs ... data distribution p_D(x, y, S) consists of inputs x sampled from a 2-D Gaussian with σ = 3. The targets y were assigned among K = 18 classes according to their radial position (angle swept out by a ray fixed at the origin). 5.2. Self-Supervised Learning for Image Classification ... we define both f_θ and g_θ as a 3-layer MLP with 256 units per layer (except where noted otherwise) and fix output dimensionality of 64. |
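
The Experiment Setup row quotes the Section 5.1 synthetic task only in prose. A minimal sketch of that data distribution, written in JAX (the framework the paper mentions), might look like the following; the equal angular sectors and the sample count are illustrative assumptions, since the quoted passage does not specify how the K = 18 radial buckets are drawn:

```python
# Sketch of the Section 5.1 data distribution as quoted above:
# 2-D Gaussian inputs with sigma = 3, labels assigned to K = 18 classes
# by radial position (angle of the point about the origin).
# Equal-width sectors and n = 10_000 samples are assumptions.
import jax
import jax.numpy as jnp

def sample_radial_classification_data(key, n=10_000, sigma=3.0, k=18):
    """Sample x ~ N(0, sigma^2 I_2); label y by which of k equal
    angular sectors the point falls in."""
    x = sigma * jax.random.normal(key, (n, 2))
    # Angle in [0, 2*pi), then bucket into k equal sectors.
    theta = jnp.arctan2(x[:, 1], x[:, 0]) % (2 * jnp.pi)
    y = jnp.floor(theta / (2 * jnp.pi / k)).astype(jnp.int32)
    return x, y

x, y = sample_radial_classification_data(jax.random.PRNGKey(0))
```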
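The Research Type row quotes the paper's central claim: f_θ and g_θ are identifiable up to a linear transformation. One simple way to probe this on finite data is to fit a linear map between the representations that two independently trained models assign to the same inputs and measure the variance it explains. The sketch below uses ordinary least squares and R²; this is an illustrative diagnostic under those assumptions, not necessarily the paper's exact evaluation protocol:

```python
# Sketch: test agreement up to a linear transformation between two
# learned representations. z1, z2 are (n, d) arrays of representations
# of the same n inputs from two independently trained models.
import jax.numpy as jnp

def linear_fit_r2(z1, z2):
    """Least-squares fit z2 ~= z1 @ A + b and return R^2."""
    # Append a constant column so the fit includes a bias term.
    z1_aug = jnp.concatenate([z1, jnp.ones((z1.shape[0], 1))], axis=1)
    a, _, _, _ = jnp.linalg.lstsq(z1_aug, z2)
    residual = z2 - z1_aug @ a
    ss_res = jnp.sum(residual ** 2)
    ss_tot = jnp.sum((z2 - z2.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot
```

An R² near 1 on held-out inputs is consistent with the two encoders agreeing up to a linear transformation; markedly lower values would point toward the underfitting failure mode the quoted passage warns about.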