Nonlinear Statistical Learning with Truncated Gaussian Graphical Models

Authors: Qinliang Su, Xuejun Liao, Changyou Chen, Lawrence Carin

ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We use the TGGM to design models for nonlinear regression and classification, with the performances of these models demonstrated on extensive benchmark datasets and compared to state-of-the-art competing results. [...] We report the performance of the proposed TGGM models on publicly available data sets, in comparison to competing models.
Researcher Affiliation | Academia | Qinliang Su, Xuejun Liao, Changyou Chen, Lawrence Carin, {QS15, XJLIAO, CC448, LCARIN}@DUKE.EDU, Duke University, Durham, NC 27519, USA
Pseudocode | Yes | Algorithm 1: ML Estimator for TGGM Regression
Open Source Code | No | The paper does not provide any explicit statements or links indicating the release of open-source code for the described methodology.
Open Datasets | Yes | We report the performance of the proposed TGGM models on publicly available data sets, in comparison to competing models. [...] Regression: The root mean square error (RMSE), averaged over multiple trials of splitting each data set into training and testing subsets, is used as a performance measure to evaluate the TGGM against the ReLU neural network. [...] The comparison is based on the same data and same training/testing protocols in (Hernández-Lobato & Adams, 2015), by using a consistent setting for the TGGM as follows: a single hidden layer is used in the TGGM for all data sets, with 100 hidden nodes used for Protein Structure and Year Prediction MSD, the two largest data sets, and 50 hidden nodes used for the other data sets. [...] Classification: Three public benchmark data sets are considered for this task: MNIST, 20 Newsgroups, and Blog.
Dataset Splits | Yes | The root mean square error (RMSE), averaged over multiple trials of splitting each data set into training and testing subsets, is used as a performance measure to evaluate the TGGM against the ReLU neural network. The comparison is based on the same data and same training/testing protocols in (Hernández-Lobato & Adams, 2015). [...] The step-size for gradient ascent is chosen from [10^-4, 5×10^-3] by maximizing the accuracy on a cross-validation set.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU/CPU models, memory, or processor types.
Software Dependencies | No | The paper mentions software such as 'RMSProp' and 'Caffe' but does not provide specific version numbers for these or any other ancillary software components used in the experiments.
Experiment Setup | Yes | In all experiments below, RMSProp (Tieleman & Hinton, 2012) is applied to update the model parameters by using the current estimated gradients, with the RMSProp decay set to 0.95. [...] For both training methods, Θ is initialized as Gaussian random numbers, with each component a random draw from N(0, 0.01). To speed up, each gradient update uses a mini-batch of training samples, resulting in stochastic gradient search. The batch size is 100 for the two largest data sets and 50 for the others. For ML estimation, the number of cycles used by mean-field VB is set to 10, and σ_1^2 = σ_0^2 = 0.5. [...] The models are randomly initialized with Gaussian random numbers drawn from N(0, 0.01). The step-size for gradient ascent is chosen from [10^-4, 5×10^-3] by maximizing the accuracy on a cross-validation set. The TGGMs use a mini-batch size of 500 for MNIST and 200 for the other two data sets, while the ReLU network uses 100 for all data sets. Variance parameters {σ_0^2, σ_1^2} are set to 0.5 for TGGM-ML and 0.01 for TGGM-BP, in both one- and two-layer models. When ML estimation is applied, the number of VB cycles is initially set to 30 and then gradually decreases to 5.
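
The Dataset Splits row above describes the regression evaluation protocol: test RMSE averaged over multiple random train/test splits of each data set, following Hernández-Lobato & Adams (2015). Below is a minimal sketch of that protocol; the 90/10 split ratio, the number of trials, and the `fit`/`predict` callables are assumptions made here for illustration, not details taken from the paper.

```python
# Sketch of "RMSE averaged over multiple trials of splitting each data set
# into training and testing subsets". Split fraction and trial count are assumed.
import numpy as np

def average_rmse(X, y, fit, predict, n_trials=20, test_frac=0.1, seed=0):
    """Average test RMSE over repeated random train/test splits."""
    rng = np.random.RandomState(seed)
    n = X.shape[0]
    n_test = int(round(test_frac * n))
    rmses = []
    for _ in range(n_trials):
        perm = rng.permutation(n)
        test_idx, train_idx = perm[:n_test], perm[n_test:]
        model = fit(X[train_idx], y[train_idx])      # e.g. train a TGGM or a ReLU network
        y_hat = predict(model, X[test_idx])
        rmses.append(np.sqrt(np.mean((y_hat - y[test_idx]) ** 2)))
    return np.mean(rmses), np.std(rmses)
```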
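The Experiment Setup row quotes the optimization configuration: parameters Θ initialized from N(0, 0.01) and updated with RMSProp (decay 0.95) on mini-batch gradient estimates, maximizing the objective by gradient ascent. The sketch below illustrates that update rule only; the `grad_fn` routine (which in the paper would come from, e.g., the mean-field-VB-based ML gradient of Algorithm 1), the step size, the epoch count, and the epsilon constant are placeholders, not the authors' code.

```python
# Sketch of mini-batch RMSProp ascent with decay 0.95 and N(0, 0.01) initialization.
# grad_fn, step_size, n_epochs, and eps are illustrative assumptions.
import numpy as np

def rmsprop_train(theta, grad_fn, X, y, batch_size=50, step_size=1e-3,
                  decay=0.95, eps=1e-8, n_epochs=100, seed=0):
    """theta <- theta + step_size * g / (sqrt(running_avg(g^2)) + eps)."""
    rng = np.random.RandomState(seed)
    avg_sq = np.zeros_like(theta)                    # running average of squared gradients
    n = X.shape[0]
    for _ in range(n_epochs):
        perm = rng.permutation(n)                    # reshuffle each epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            g = grad_fn(theta, X[idx], y[idx])       # gradient estimated on the mini-batch
            avg_sq = decay * avg_sq + (1.0 - decay) * g ** 2
            theta = theta + step_size * g / (np.sqrt(avg_sq) + eps)
    return theta

# Each component of Theta drawn from N(0, 0.01), i.e. standard deviation 0.1:
theta0 = 0.1 * np.random.RandomState(0).randn(200)
```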