Population Matching Discrepancy and Applications in Deep Learning

Authors: Jianfei Chen, Chongxuan LI, Yizhong Ru, Jun Zhu

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results demonstrate that PMD overcomes the aforementioned drawbacks of MMD, and outperforms MMD on both tasks in terms of the performance as well as the convergence speed.
Researcher Affiliation | Academia | Jianfei Chen, Chongxuan Li, Yizhong Ru, Jun Zhu; Dept. of Comp. Sci. & Tech., TNList Lab, State Key Lab for Intell. Tech. & Sys., Tsinghua University, Beijing, 100084, China; {chenjian14,licx14,ruyz13}@mails.tsinghua.edu.cn, dcszj@tsinghua.edu.cn
Pseudocode | Yes | Figure 1: Pseudocode of PMD for parameter learning with graphical illustration of an iteration. (An illustrative code sketch of this training step appears after the table.)
Open Source Code | No | The paper references a GitHub link for "Generative Moment Matching Networks" by Siddharth Agrawal [2], which is a third-party implementation of a related method, not the authors' own code for the method described in this paper.
Open Datasets | Yes | We compare the performance of PMD and MMD on the standard Office [41] object recognition benchmark for domain adaptation. ... We compare PMD with MMD for image generation on the MNIST [28], SVHN [36] and LFW [20] datasets.
Dataset Splits | Yes | Following [8], we validate the domain regularization strength λ and the MMD kernel bandwidth σ on a random 100-sample labeled dataset on the target domain, but the model is trained without any labeled data from the target domain.
Hardware Specification | Yes | Our experiment is conducted on a machine with Nvidia Titan X (Pascal) GPU and Intel E5-2683v3 CPU.
Software Dependencies | Yes | We implement the models in TensorFlow [1]. The CUDA program is compiled with nvcc 8.0 and the C++ program is compiled with g++ 4.8.4, while the -O3 flag is used for both programs.
Experiment Setup | Yes | The classifier is a fully-connected neural network with a single hidden layer of 256 ReLU [15] units, trained with AdaDelta [51]. We apply batch normalization [21] on the hidden layer... We set the population size N = 2000 for both PMD and MMD, and the mini-batch size |B| = 100 for PMD. We use the Adam optimizer [22] with batch normalization [21], and train the model for 100 epochs for PMD, and 500 epochs for MMD. (A sketch of this classifier appears after the table.)
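
The Figure 1 pseudocode row above summarizes PMD parameter learning: draw a population of N samples from the model, find a minimum-weight bipartite matching against N data points, and take gradient steps on the distances of the matched pairs. Below is a minimal sketch of one such iteration, assuming a PyTorch generator, an L1 ground metric, and samples flattened to vectors; `pmd_step`, `generator`, and `optimizer` are hypothetical names, and this is an illustrative re-implementation, not the authors' TensorFlow code.

```python
import torch
from scipy.optimize import linear_sum_assignment

def pmd_step(generator, optimizer, real, noise_dim, batch_size=100):
    """One PMD iteration (illustrative): draw a model population the same
    size as `real`, match it to the data by minimum-weight bipartite
    matching, then run mini-batch gradient steps on the matched distances."""
    n = real.size(0)                                # population size N
    z = torch.randn(n, noise_dim)
    with torch.no_grad():
        fake = generator(z)                         # model population
    # N x N pairwise L1 costs; the matching step itself is not differentiated.
    cost = torch.cdist(fake, real, p=1).cpu().numpy()
    rows, cols = linear_sum_assignment(cost)        # exact Hungarian solve, O(N^3)
    rows, cols = torch.as_tensor(rows), torch.as_tensor(cols)
    for start in range(0, n, batch_size):
        r = rows[start:start + batch_size]          # generator-side indices
        c = cols[start:start + batch_size]          # matched data-side indices
        # Re-run the generator with gradients enabled on this mini-batch only.
        loss = (generator(z[r]) - real[c]).abs().sum(dim=1).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

With the settings quoted in the table (N = 2000, |B| = 100), each matched population in this sketch supports 20 mini-batch updates before fresh populations are drawn; the cost of solving the matching once per population is the overhead PMD pays relative to MMD.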
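The Experiment Setup row quotes a single-hidden-layer classifier for the domain-adaptation task. A minimal sketch, in PyTorch rather than the authors' TensorFlow, with placeholder input/output dimensions (`make_classifier` is a hypothetical helper):

```python
import torch.nn as nn
import torch.optim as optim

def make_classifier(in_dim: int, n_classes: int):
    """Fully-connected classifier as quoted above: one hidden layer of
    256 ReLU units with batch normalization, trained with AdaDelta."""
    model = nn.Sequential(
        nn.Linear(in_dim, 256),
        nn.BatchNorm1d(256),       # batch normalization on the hidden layer
        nn.ReLU(),
        nn.Linear(256, n_classes),
    )
    optimizer = optim.Adadelta(model.parameters())
    return model, optimizer
```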