On Linear Identifiability of Learned Representations

Authors: Geoffrey Roeder, Luke Metz, Durk Kingma

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility: Variable | Result | LLM Response
Research Type | Experimental | 5. Experiments: The derivation in Section 3 shows that, for models in the general discriminative family defined in Section 2, the functions f_θ and g_θ are identifiable up to a linear transformation given unbounded data and assuming model convergence. The question remains as to how close a model trained on finite data and without convergence guarantees will approach this limit. One subtle issue is that poor architecture choices (such as too few hidden units, or inadequate inductive priors) or insufficient data samples when training can interfere with model estimation and thereby linear identifiability of the learned representations, due to underfitting. In this section, we study this issue over a range of models, from low-dimensional language embedding and supervised classification (Figures 1 and 2 respectively) to GPT-2 (Radford et al., 2019), an approximately 1.5 × 10^9-parameter generative model of natural language (Figure 4). [See the linear-identifiability sketch after this table.]
Researcher Affiliation | Collaboration | Geoffrey Roeder¹, Luke Metz², Diederik P. Kingma²; ¹Princeton University, ²Google Brain. Correspondence to: Geoffrey Roeder <roeder@princeton.edu>, Diederik P. Kingma <durk@google.com>.
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Figure 1. ... (see Appendix A.1 for code release and training details).
Open Datasets | Yes | Figure 1. ... Billion Word Dataset (Chelba et al., 2013) ... 5.2. Self-Supervised Learning for Image Classification: We next investigate high-dimensional, self-supervised representation learning on CIFAR-10 (Krizhevsky et al., 2009) using CPC (Oord et al., 2018; Hénaff et al., 2019).
Dataset Splits | No | The paper does not explicitly provide specific training/validation/test dataset splits (e.g., percentages or counts) or detailed splitting methodologies for their experiments.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies | No | The paper mentions software such as JAX, Hugging Face Transformers, and the Adam optimizer, but does not provide version numbers for these or other software dependencies.
Experiment Setup | Yes | 5.1. Simulation Study: Classification by DNNs ... data distribution p_D(x, y, S) consists of inputs x sampled from a 2-D Gaussian with σ = 3. The targets y were assigned among K = 18 classes according to their radial position (angle swept out by a ray fixed at the origin). 5.2. Self-Supervised Learning for Image Classification ... we define both f_θ and g_θ as a 3-layer MLP with 256 units per layer (except where noted otherwise) and fix output dimensionality of 64. [See the data-generation sketch after this table.]
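
The Research Type row above quotes the paper's central claim: the learned functions f_θ and g_θ are identifiable only up to a linear transformation. A minimal, self-contained sketch of how such a claim can be probed is given below: it fits a least-squares affine map between representations of the same inputs produced by two independently trained models and reports the variance explained. The function name, the synthetic usage example, and the choice of plain least squares are assumptions made for illustration here; this is not the authors' released evaluation code.

```python
# Sketch of a linear-identifiability check (illustrative; not the authors' code).
# Given representations z1 and z2 of the SAME inputs from two independently
# trained models, fit an affine map z1 -> z2 by least squares. If the learned
# representations agree up to a linear transformation, the fit is near-perfect.
import numpy as np

def linear_fit_r2(z1: np.ndarray, z2: np.ndarray) -> float:
    """R^2 of the best affine map from z1 to z2; both are (n, d) arrays."""
    design = np.hstack([z1, np.ones((z1.shape[0], 1))])    # add a bias column
    weights, *_ = np.linalg.lstsq(design, z2, rcond=None)  # least-squares solution
    residual = z2 - design @ weights
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((z2 - z2.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical usage: z1 and z2 stand in for 64-dimensional features of the same
# inputs from two training runs. Here z2 is constructed as an exact affine map of
# z1, so the reported R^2 should be ~1.0.
rng = np.random.default_rng(0)
z1 = rng.normal(size=(1000, 64))
z2 = z1 @ rng.normal(size=(64, 64)) + 0.5
print(f"R^2 of affine fit: {linear_fit_r2(z1, z2):.4f}")
```

In practice one would compute the same statistic on held-out inputs, since an affine fit evaluated on the data it was fit to can overstate agreement when the number of samples is small relative to the representation dimension.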
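
The Experiment Setup row quotes the synthetic-data construction for the Section 5.1 simulation study: 2-D Gaussian inputs with σ = 3 and K = 18 classes assigned by angular position around the origin. Below is a minimal sketch of that data-generation step; the sample count, random seed, and function name are assumptions, since the quoted passage does not specify them.

```python
# Sketch of the Section 5.1 synthetic data (assumed details noted in comments).
import numpy as np

def make_radial_classes(n: int = 10_000, k: int = 18, sigma: float = 3.0, seed: int = 0):
    """Inputs x ~ 2-D Gaussian with std sigma; labels y from k angular wedges."""
    rng = np.random.default_rng(seed)          # seed and sample count n are assumptions
    x = rng.normal(scale=sigma, size=(n, 2))   # 2-D Gaussian inputs, sigma = 3
    angle = np.arctan2(x[:, 1], x[:, 0])       # angle of each point, in (-pi, pi]
    # Assign each point to one of k equal angular wedges swept out around the origin.
    y = np.floor((angle + np.pi) / (2 * np.pi) * k).astype(int) % k
    return x, y

x, y = make_radial_classes()
print(x.shape, y.min(), y.max())               # (10000, 2) 0 17
```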