Max-Margin Deep Generative Models
Authors: Chongxuan Li, Jun Zhu, Tianlin Shi, Bo Zhang
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on the MNIST and SVHN datasets demonstrate that (1) max-margin learning can significantly improve the prediction performance of DGMs while retaining their generative ability; and (2) mmDGMs are competitive with state-of-the-art fully discriminative networks when deep convolutional neural networks (CNNs) are employed as both recognition and generative models. |
| Researcher Affiliation | Academia | Dept. of Comp. Sci. & Tech., State Key Lab of Intell. Tech. & Sys., TNList Lab, Center for Bio-Inspired Computing Research, Tsinghua University, Beijing, 100084, China; Dept. of Comp. Sci., Stanford University, Stanford, CA 94305, USA. {licx14@mails., dcszj@, dcszb@}tsinghua.edu.cn; stl501@gmail.com |
| Pseudocode | Yes | Algorithm 1 Doubly Stochastic Subgradient Algorithm |
| Open Source Code | Yes | 1The source code is available at https://github.com/zhenxuan00/mmdgm. |
| Open Datasets | Yes | We now present experimental results on the widely adopted MNIST [14] and SVHN [22] datasets. |
| Dataset Splits | Yes | MNIST... which consists of images of 10 different classes (0 to 9) of size 28×28, with 50,000 training samples, 10,000 validation samples and 10,000 testing samples. SVHN [22] is a large dataset consisting of color images of size 32×32. The task is to recognize the center digits in natural scene images, which is significantly harder than classification of hand-written digits. We follow the work [27, 8] to split the dataset into 598,388 training data, 6,000 validation data and 26,032 testing data, and preprocess the data by Local Contrast Normalization (LCN). (A hedged LCN preprocessing sketch is given after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or processor types used for running the experiments. |
| Software Dependencies | No | We implement all experiments based on Theano [2]. The paper names this software component but does not specify its version number, nor does it list any other software dependencies with versions. |
| Experiment Setup | Yes | We choose C = 15 for MMVA... In the CNN case, we use 60,000 training data. Table 2 shows the effect of C on classification error rate and variational lower bound. Typically, as C gets larger, CMMVA learns more discriminative features and leads to a worse estimation of the data likelihood. However, if C is too small, the supervision is not enough to lead to predictive features. Nevertheless, C = 10³ is quite a good trade-off... We set C = 10⁴ for our CMMVA model on SVHN by default. We use AdaM [10] to optimize parameters in all of the models. Although it is an adaptive gradient-based optimization method, we decay the global learning rate by a factor of three periodically after a sufficient number of epochs to ensure stable convergence. (See the optimizer schedule sketch after the table.) |
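
The SVHN rows above mention preprocessing by Local Contrast Normalization (LCN). The snippet below is a minimal, hedged sketch of a standard LCN pass over a 32×32 image; the Gaussian width, epsilon, and per-channel handling are illustrative assumptions, not settings taken from the paper.

```python
# Hedged sketch of Local Contrast Normalization (LCN); hyperparameters are
# assumptions for illustration, not the paper's exact preprocessing settings.
import numpy as np
from scipy.ndimage import gaussian_filter

def local_contrast_normalize(image, sigma=2.0, eps=1e-4):
    """image: float array of shape (H, W); returns a locally normalized copy."""
    image = image.astype(np.float64)
    # Subtractive step: remove a Gaussian-weighted local mean.
    centered = image - gaussian_filter(image, sigma=sigma)
    # Divisive step: divide by a smoothed local standard deviation,
    # floored by its mean so flat regions are not amplified into noise.
    local_std = np.sqrt(gaussian_filter(centered ** 2, sigma=sigma))
    divisor = np.maximum(local_std, local_std.mean())
    return centered / np.maximum(divisor, eps)

# Example: normalize one 32x32 RGB image channel by channel.
rgb = np.random.rand(32, 32, 3)
normalized = np.stack(
    [local_contrast_normalize(rgb[..., c]) for c in range(3)], axis=-1
)
```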
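
The experiment-setup row describes AdaM (Adam) optimization with the global learning rate decayed by a factor of three at fixed intervals. Below is a hedged sketch of that schedule with a plain NumPy Adam update; the base learning rate, decay interval, and loop sizes are assumed values, not the paper's settings.

```python
# Hedged sketch: Adam-style updates with the global learning rate divided by
# three every `decay_every` epochs. All hyperparameters here are assumptions.
import numpy as np

def adam_step(param, grad, state, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single parameter array; `state` holds (m, v, t)."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, (m, v, t)

base_lr, decay_every = 3e-4, 100          # assumed values for illustration
param = np.zeros(10)
state = (np.zeros_like(param), np.zeros_like(param), 0)
for epoch in range(300):
    # Decay the global learning rate by a factor of three periodically.
    lr = base_lr / (3 ** (epoch // decay_every))
    for _ in range(10):                       # minibatches per epoch (toy)
        grad = np.random.randn(*param.shape)  # placeholder stochastic gradient
        param, state = adam_step(param, grad, state, lr)
```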