Empirical Study of the Benefits of Overparameterization in Learning Latent Variable Models
Authors: Rares-Darius Buhai, Yoni Halpern, Yoon Kim, Andrej Risteski, David Sontag
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform an empirical study of different aspects of overparameterization in unsupervised learning of latent variable models via synthetic and semi-synthetic experiments. |
| Researcher Affiliation | Collaboration | 1Massachusetts Institute of Technology, 2Google, 3Harvard University, 4Carnegie Mellon University. |
| Pseudocode | No | The paper describes training algorithms in prose but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the methodology described. |
| Open Datasets | Yes | We learn the model from the UCI plants data set (Lichman et al., 2013)... We again evaluate using synthetic data sets... We first learn a neural PCFG with 10 nonterminals and 10 preterminals (i.e. \|N\| = \|P\| = 10) on the Penn Treebank (Marcus et al., 1993). |
| Dataset Splits | Yes | We split these samples into a training set of 9,000 samples, a validation set of 1,000 samples, and a test set of 1,000 samples. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, or cloud instance specifications) used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., library names like PyTorch, TensorFlow, or specific solver versions). |
| Experiment Setup | Yes | For the input-dependent signal centering, we use a two-layer neural network with 100 hidden nodes in the second layer and tanh activation functions. For all data sets, we test the recognition network algorithm using 8 latent variables (i.e. no overparameterization), 16, 32, 64, and 128. For each experiment configuration, we run the algorithm 500 times with different random initializations of the generative model parameters. We also test the algorithm using batch size 1000 instead of 20. |
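
The dataset-splits row above quotes a 9,000 / 1,000 / 1,000 train/validation/test partition. Below is a minimal sketch of such a split, assuming the 11,000 synthetic samples sit in a NumPy array; the array shape, the RNG seed, and the use of NumPy are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of the 9,000 / 1,000 / 1,000 split quoted in the "Dataset Splits" row.
# `samples`, its shape, and the seed are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)
samples = rng.standard_normal((11_000, 8))   # stand-in for the 11,000 synthetic samples
perm = rng.permutation(len(samples))         # shuffle before partitioning
train = samples[perm[:9_000]]
val   = samples[perm[9_000:10_000]]
test  = samples[perm[10_000:]]
```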
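
The experiment-setup row quotes a two-layer recognition network with 100 hidden nodes and tanh activations, latent sizes of 8 (no overparameterization) through 128, 500 random restarts per configuration, and batch sizes of 20 and 1000. The PyTorch sketch below lays out that grid; the observed dimensionality, the exact layer arrangement, and the training loop are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch of the experiment grid; only the hidden width (100), tanh
# activations, latent sizes, restart count, and batch sizes come from the paper.
import itertools
import torch
import torch.nn as nn

N_OBSERVED = 8                        # assumption: dimensionality of the observed variables
LATENT_SIZES = [8, 16, 32, 64, 128]   # 8 latent variables = no overparameterization
N_RESTARTS = 500                      # random initializations per configuration
BATCH_SIZES = [20, 1000]

def make_recognition_net(n_latent: int) -> nn.Module:
    """Two-layer network with 100 hidden nodes and tanh activations,
    loosely matching the description of the input-dependent signal centering."""
    return nn.Sequential(
        nn.Linear(N_OBSERVED, 100),
        nn.Tanh(),
        nn.Linear(100, n_latent),
        nn.Tanh(),
    )

for n_latent, batch_size in itertools.product(LATENT_SIZES, BATCH_SIZES):
    for restart in range(N_RESTARTS):
        torch.manual_seed(restart)            # fresh random initialization per run
        net = make_recognition_net(n_latent)
        # ... train the generative model and recognition network here ...
```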