Max-Margin Deep Generative Models

Authors: Chongxuan Li, Jun Zhu, Tianlin Shi, Bo Zhang

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results on the MNIST and SVHN datasets demonstrate that (1) max-margin learning can significantly improve the prediction performance of DGMs while retaining their generative ability; and (2) mmDGMs are competitive with state-of-the-art fully discriminative networks when deep convolutional neural networks (CNNs) are employed as both recognition and generative models.
Researcher Affiliation | Academia | Dept. of Comp. Sci. & Tech., State Key Lab of Intell. Tech. & Sys., TNList Lab, Center for Bio-Inspired Computing Research, Tsinghua University, Beijing, 100084, China; Dept. of Comp. Sci., Stanford University, Stanford, CA 94305, USA. {licx14@mails., dcszj@, dcszb@}tsinghua.edu.cn; stl501@gmail.com
Pseudocode | Yes | Algorithm 1: Doubly Stochastic Subgradient Algorithm.
Open Source Code | Yes | The source code is available at https://github.com/zhenxuan00/mmdgm.
Open Datasets | Yes | Experimental results are presented on the widely adopted MNIST [14] and SVHN [22] datasets.
Dataset Splits | Yes | MNIST consists of images of 10 classes (digits 0 to 9) of size 28×28, with 50,000 training, 10,000 validation, and 10,000 test samples. SVHN [22] is a large dataset of 32×32 color images; the task is to recognize the center digit in natural scene images, which is significantly harder than classifying hand-written digits. Following [27, 8], the dataset is split into 598,388 training, 6,000 validation, and 26,032 test samples, and preprocessed with Local Contrast Normalization (LCN).
Hardware Specification | No | The paper does not provide hardware details such as GPU/CPU models, memory, or processor types used for the experiments.
Software Dependencies | No | All experiments are implemented with Theano [2]; the paper names this framework but does not specify its version, nor does it list any other software components with versions.
Experiment Setup | Yes | C = 15 is chosen for MMVA... In the CNN case, 60,000 training samples are used. Table 2 shows the effect of C on classification error rate and the variational lower bound. Typically, as C gets larger, CMMVA learns more discriminative features but yields a worse estimate of the data likelihood; if C is too small, however, the supervision is insufficient to produce predictive features. Nevertheless, C = 10^3 is quite a good trade-off... C = 10^4 is set as the default for the CMMVA model on SVHN. AdaM [10] is used to optimize parameters in all of the models. Although it is an adaptive gradient-based optimization method, the global learning rate is decayed by a factor of three periodically after a sufficient number of epochs to ensure stable convergence.
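The "doubly stochastic" subgradient algorithm named in the pseudocode row draws its stochasticity from two sources: the random data minibatch and the Monte Carlo samples of the latent variables. A minimal sketch of one such gradient estimate is below; `grad_fn` and `sample_eps` are hypothetical placeholders standing in for the model's per-sample subgradient and latent-noise sampler, not functions from the paper's code.

```python
import numpy as np

def doubly_stochastic_grad(params, data, grad_fn, sample_eps,
                           batch_size=64, n_mc=1, rng=None):
    """One doubly stochastic (sub)gradient estimate.

    Stochastic over (a) the data minibatch and (b) the Monte Carlo
    draws of the latent noise used to reparameterize the model.
    `grad_fn(params, batch, eps)` and `sample_eps(rng)` are assumed
    interfaces for illustration only.
    """
    rng = rng if rng is not None else np.random.default_rng()
    # Source of stochasticity 1: a random minibatch of the data.
    idx = rng.choice(len(data), size=batch_size, replace=False)
    batch = data[idx]
    # Source of stochasticity 2: Monte Carlo samples of latent noise.
    grad = np.zeros_like(params)
    for _ in range(n_mc):
        eps = sample_eps(rng)  # e.g. eps ~ N(0, I)
        grad += grad_fn(params, batch, eps)
    return grad / n_mc
```

Averaging over `n_mc` noise draws reduces the variance of the estimate at the cost of extra computation per update; the common choice is a single draw per minibatch.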
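The periodic learning-rate decay described in the setup row can be sketched as a simple step schedule. This is an illustrative sketch only: the decay factor of three comes from the paper, but `base_lr` and the decay `period` are assumptions, since the paper does not state them in this excerpt.

```python
def decayed_lr(base_lr, epoch, period, factor=3.0):
    """Global learning rate after decaying by `factor` once every
    `period` epochs (a step schedule; `base_lr` and `period` are
    illustrative, not values from the paper)."""
    return base_lr / factor ** (epoch // period)
```

For example, with `base_lr=3e-4` and `period=100`, the rate is 3e-4 for epochs 0-99, 1e-4 for epochs 100-199, and so on, which matches the described "decay by a factor of three periodically" behavior.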