An Empirical Bayes Approach to Optimizing Machine Learning Algorithms
Authors: James McInerney
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The resulting approach, empirical Bayes for hyperparameter averaging (EB-Hyp), predicts held-out data better than Bayesian optimization in two experiments on latent Dirichlet allocation and deep latent Gaussian models. |
| Researcher Affiliation | Industry | James McInerney, Spotify Research, 45 W 18th St, 7th Floor, New York, NY 10011, jamesm@spotify.com |
| Pseudocode | Yes | Algorithm 1: Empirical Bayes for hyperparameter averaging (EB-Hyp); a hedged sketch of the averaging step appears after the table. |
| Open Source Code | No | No explicit statement regarding the release of source code for the described methodology or a direct link to a code repository is provided. |
| Open Datasets | Yes | In the first experiment, we consider stochastic variational inference on latent Dirichlet allocation (SVI-LDA) applied to the 20 Newsgroups data. In the second, a deep latent Gaussian model (DLGM) on the Labeled Faces in the Wild data set (Huang et al., 2007). |
| Dataset Splits | Yes | Throughout, we randomly split the data into training, validation, and test sets. [...] The 11,314 resulting documents were randomly split 80%-10%-10% into training, validation, and test sets. A split sketch appears after the table. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running the experiments are provided. |
| Software Dependencies | No | The paper mentions using 'GPy package (GPy, 2012)' but does not specify a version number for this or any other software dependency, which is required for reproducibility. |
| Experiment Setup | Yes | We explored four hyperparameters of SVI-LDA in the experiments: K ∈ [50, 200], the number of topics; log(α) ∈ [−5, 0], the hyperparameter to the Dirichlet document-topic prior; log(η) ∈ [−5, 0], the hyperparameter to the Dirichlet topic-word distribution prior; κ ∈ [0.5, 0.9], the decay parameter to the learning rate (t0 + t)^(−κ), where t0 was fixed at 10 for this experiment. Several other hyperparameters are required and were kept fixed during the experiment. The minibatch size was fixed at 100 documents and the vocabulary was selected from the top 1,000 words... A configuration sketch appears after the table. |
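The paper's Algorithm 1 is not reproduced in this report, so the following is a minimal sketch of the hyperparameter-averaging idea behind EB-Hyp, not the authors' implementation. The helpers `train_and_score`, `predictive_density`, and `propose` are hypothetical stand-ins for model training, held-out prediction, and the acquisition step, and the exponentiate-and-normalize marginal-likelihood weighting is an assumption about how the averaging could be realized.

```python
import numpy as np

def eb_hyp_sketch(train_and_score, predictive_density, propose, T=20):
    """Hedged sketch of hyperparameter averaging in the spirit of EB-Hyp.

    train_and_score(lam)      -> (model, log marginal likelihood on training data)
    predictive_density(m, x)  -> p(x | lam, D) for a held-out point x
    propose(history)          -> next hyperparameter setting to evaluate
    """
    history = []  # (lam, model, log_ml) triples collected during the search
    for _ in range(T):
        lam = propose(history)
        model, log_ml = train_and_score(lam)
        history.append((lam, model, log_ml))

    # Empirical-Bayes-style weights: normalize the marginal likelihoods
    # (subtracting the max first for numerical stability).
    log_mls = np.array([h[2] for h in history])
    w = np.exp(log_mls - log_mls.max())
    w /= w.sum()

    def averaged_predictive(x):
        # Average the per-hyperparameter predictives instead of
        # keeping only the single best setting.
        return sum(wi * predictive_density(m, x)
                   for wi, (_, m, _) in zip(w, history))

    return averaged_predictive
```

The design difference from standard Bayesian optimization is the final step: rather than returning the single best hyperparameter setting, the predictive densities of all evaluated settings are averaged under marginal-likelihood weights.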
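For the 80%-10%-10% split, a straightforward reading is a uniform random permutation of the 11,314 documents; this is an assumption, since the paper does not publish its splitting code.

```python
import numpy as np

def split_80_10_10(n_docs, seed=0):
    """Random 80%-10%-10% train/validation/test split over document indices."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_docs)
    n_train = int(0.8 * n_docs)
    n_val = int(0.1 * n_docs)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# 11,314 is the 20 Newsgroups document count reported in the paper.
train_idx, val_idx, test_idx = split_80_10_10(11314)
```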
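The search ranges in the Experiment Setup row translate directly into code. The sampler below is illustrative only (the paper searches the space with Bayesian optimization / EB-Hyp rather than random draws), and the natural-log base for the α and η ranges is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hyperparameters():
    """Draw one SVI-LDA configuration from the paper's search ranges.

    Uniform sampling is an assumption for illustration; log base e is
    assumed for the alpha and eta ranges.
    """
    return {
        "K": int(rng.integers(50, 201)),            # number of topics
        "alpha": np.exp(rng.uniform(-5, 0)),        # Dirichlet document-topic prior
        "eta": np.exp(rng.uniform(-5, 0)),          # Dirichlet topic-word prior
        "kappa": rng.uniform(0.5, 0.9),             # learning-rate decay
    }

def learning_rate(t, kappa, t0=10.0):
    """SVI step size (t0 + t)^(-kappa); t0 was fixed at 10 in the experiment."""
    return (t0 + t) ** (-kappa)
```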