Nonlinear Statistical Learning with Truncated Gaussian Graphical Models
Authors: Qinliang Su, Xuejun Liao, Changyou Chen, Lawrence Carin
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We use the TGGM to design models for nonlinear regression and classification, with the performances of these models demonstrated on extensive benchmark datasets and compared to state-of-the-art competing results. [...] We report the performance of the proposed TGGM models on publicly available data sets, in comparison to competing models. |
| Researcher Affiliation | Academia | Qinliang Su, Xuejun Liao, Changyou Chen, Lawrence Carin ({QS15, XJLIAO, CC448, LCARIN}@DUKE.EDU), Duke University, Durham, NC 27519, USA |
| Pseudocode | Yes | Algorithm 1 ML Estimator for TGGM Regression |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating the release of open-source code for the described methodology. |
| Open Datasets | Yes | We report the performance of the proposed TGGM models on publicly available data sets, in comparison to competing models. [...] Regression The root mean square error (RMSE), averaged over multiple trials of splitting each data set into training and testing subsets, is used as a performance measure to evaluate the TGGM against the ReLU neural network. [...] The comparison is based on the same data and same training/testing protocols in (Hernández-Lobato & Adams, 2015), by using a consistent setting for the TGGM as follows: a single hidden layer is used in the TGGM for all data sets, with 100 hidden nodes used for Protein Structure and Year Prediction MSD, the two largest data sets, and 50 hidden nodes used for the other data sets. [...] Classification Three public benchmark data sets are considered for this task: MNIST, 20 Newsgroups, and Blog. |
| Dataset Splits | Yes | The root mean square error (RMSE), averaged over multiple trials of splitting each data set into training and testing subsets, is used as a performance measure to evaluate the TGGM against the ReLU neural network. The comparison is based on the same data and same training/testing protocols in (Hernández-Lobato & Adams, 2015). [...] The step-size for gradient ascent is chosen from [10⁻⁴, 5×10⁻³] by maximizing the accuracy on a cross-validation set. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU/CPU models, memory, or processor types. |
| Software Dependencies | No | The paper mentions software like 'RMSProp' and 'Caffe' but does not provide specific version numbers for these or any other ancillary software components used in the experiments. |
| Experiment Setup | Yes | In all experiments below, RMSProp (Tieleman & Hinton, 2012) is applied to update the model parameters by using the current estimated gradients, with RMSProp decay set to be 0.95. [...] For both training methods, Θ is initialized as Gaussian random numbers, with each component a random draw from N(0, 0.01). To speed up, each gradient update uses a mini-batch of training samples, resulting in stochastic gradient search. The batch size is 100 for the two largest data sets and 50 for the others. For ML estimation, the number of cycles used by mean-field VB is set to 10, and σ²₁ = σ²₀ = 0.5. [...] The models are randomly initialized with Gaussian random numbers drawn from N(0, 0.01). The step-size for gradient ascent is chosen from [10⁻⁴, 5×10⁻³] by maximizing the accuracy on a cross-validation set. The TGGMs use a minibatch size of 500 for MNIST and 200 for the other two data sets, while the ReLU uses 100 for all data sets. Variance parameters {σ²₀, σ²₁} are set to 0.5 for TGGM-ML and 0.01 for TGGM-BP, in both one and two-layer models. When ML estimation is applied, the number of VB cycles is initially set to 30 and then gradually decreases to 5. |
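The optimizer settings quoted above (RMSProp with decay 0.95, parameters initialized from N(0, 0.01), stochastic gradient ascent) can be sketched as follows. This is a minimal illustration under assumptions, not the authors' TGGM code: `rmsprop_update`, the toy quadratic objective, and all hyperparameters other than the 0.95 decay and the N(0, 0.01) initialization are hypothetical.

```python
import numpy as np

def rmsprop_update(theta, grad, cache, lr=1e-3, decay=0.95, eps=1e-8):
    """One RMSProp step for gradient *ascent* (as in the paper's ML estimator).

    cache is a running average of squared gradients; decay=0.95 matches the
    setting quoted in the table. lr and eps are illustrative choices.
    """
    cache = decay * cache + (1 - decay) * grad ** 2
    theta = theta + lr * grad / (np.sqrt(cache) + eps)
    return theta, cache

rng = np.random.default_rng(0)
# Initialize parameters from N(0, 0.01), i.e. standard deviation 0.1
theta = rng.normal(0.0, 0.1, size=5)
cache = np.zeros_like(theta)

# Hypothetical stand-in objective: maximize -||theta - target||^2 from
# noisy (minibatch-style) gradient estimates.
target = np.ones(5)
for step in range(2000):
    grad = 2.0 * (target - theta) + rng.normal(0.0, 0.01, size=5)
    theta, cache = rmsprop_update(theta, grad, cache)
```

Because RMSProp normalizes each coordinate by the root of its squared-gradient average, the effective step size stays near `lr` regardless of gradient scale, which is why a single decay constant works across the differently sized data sets in the table.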