Empirical Study of the Benefits of Overparameterization in Learning Latent Variable Models

Authors: Rares-Darius Buhai, Yoni Halpern, Yoon Kim, Andrej Risteski, David Sontag

ICML 2020

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We perform an empirical study of different aspects of overparameterization in unsupervised learning of latent variable models via synthetic and semi-synthetic experiments." |
| Researcher Affiliation | Collaboration | (1) Massachusetts Institute of Technology, (2) Google, (3) Harvard University, (4) Carnegie Mellon University. |
| Pseudocode | No | The paper describes training algorithms in prose but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the methodology described. |
| Open Datasets | Yes | "We learn the model from the UCI plants data set (Lichman et al., 2013)... We again evaluate using synthetic data sets... We first learn a neural PCFG with 10 nonterminals and 10 preterminals (i.e. \|N\| = \|P\| = 10) on the Penn Treebank (Marcus et al., 1993)." |
| Dataset Splits | Yes | "We split these samples into a training set of 9,000 samples, a validation set of 1,000 samples, and a test set of 1,000 samples." |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, or cloud instance specifications) used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., library names like PyTorch, TensorFlow, or specific solver versions). |
| Experiment Setup | Yes | "For the input-dependent signal centering, we use a two-layer neural network with 100 hidden nodes in the second layer and tanh activation functions. For all data sets, we test the recognition network algorithm using 8 latent variables (i.e. no overparameterization), 16, 32, 64, and 128. For each experiment configuration, we run the algorithm 500 times with different random initializations of the generative model parameters. We also test the algorithm using batch size 1000 instead of 20." |
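The Experiment Setup row above describes a concrete configuration sweep. The following is a minimal sketch of that sweep, assuming a PyTorch implementation. The two-layer tanh network with 100 hidden nodes, the latent-variable sizes (8 through 128), the 500 random restarts, and the two batch sizes are taken from the row above; everything else (the observed dimension, the `LatentVariableModel`-style generative model, the training loop, and the returned metric) is a placeholder and not specified by the paper.

```python
# Minimal sketch of the configuration sweep in the Experiment Setup row.
# Assumptions (not from the paper): PyTorch, a hypothetical observed
# dimension, and a placeholder training/evaluation routine.
import itertools

import torch
import torch.nn as nn

N_OBSERVED = 100                     # hypothetical observed dimension
LATENT_SIZES = [8, 16, 32, 64, 128]  # 8 = no overparameterization
N_RESTARTS = 500                     # random initializations per configuration
BATCH_SIZES = [20, 1000]             # batch size 20 vs. 1000


class TwoLayerTanhNet(nn.Module):
    """Two-layer network with 100 hidden nodes and tanh activations,
    as described for the input-dependent signal centering."""

    def __init__(self, n_in: int, n_out: int, n_hidden: int = 100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, n_hidden),
            nn.Tanh(),
            nn.Linear(n_hidden, n_out),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def run_once(n_latent: int, batch_size: int, seed: int) -> float:
    """Placeholder for a single training run; returns a validation score."""
    torch.manual_seed(seed)          # new random initialization per restart
    centering_net = TwoLayerTanhNet(N_OBSERVED, n_latent)
    # ... build the generative model with `n_latent` latent variables,
    # train with the given batch size on the 9,000-sample training split,
    # and evaluate on the 1,000-sample validation split ...
    return 0.0


if __name__ == "__main__":
    results = {}
    for n_latent, batch_size in itertools.product(LATENT_SIZES, BATCH_SIZES):
        results[(n_latent, batch_size)] = [
            run_once(n_latent, batch_size, seed) for seed in range(N_RESTARTS)
        ]
```

The sweep structure (latent sizes × batch sizes × 500 restarts) mirrors the setup quoted above; the placeholder `run_once` body would need to be replaced with the paper's actual learning algorithm to reproduce the experiments.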