Population Matching Discrepancy and Applications in Deep Learning
Authors: Jianfei Chen, Chongxuan Li, Yizhong Ru, Jun Zhu
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results demonstrate that PMD overcomes the aforementioned drawbacks of MMD, and outperforms MMD on both tasks in terms of the performance as well as the convergence speed. |
| Researcher Affiliation | Academia | Jianfei Chen, Chongxuan Li, Yizhong Ru, Jun Zhu Dept. of Comp. Sci. & Tech., TNList Lab, State Key Lab for Intell. Tech. & Sys. Tsinghua University, Beijing, 100084, China {chenjian14,licx14,ruyz13}@mails.tsinghua.edu.cn, dcszj@tsinghua.edu.cn |
| Pseudocode | Yes | Figure 1: Pseudocode of PMD for parameter learning with graphical illustration of an iteration. |
| Open Source Code | No | The paper references a GitHub link for "Generative Moment Matching Networks" by Siddharth Agrawal [2], which is a third-party reference and not a link to the authors' own implementation code for the method described in this paper. |
| Open Datasets | Yes | We compare the performance of PMD and MMD on the standard Office [41] object recognition benchmark for domain adaptation. ... We compare PMD with MMD for image generation on the MNIST [28], SVHN [36] and LFW [20] dataset. |
| Dataset Splits | Yes | Following [8], we validate the domain regularization strength λ and the MMD kernel bandwidth σ on a random 100-sample labeled dataset on the target domain, but the model is trained without any labeled data from the target domain. |
| Hardware Specification | Yes | Our experiment is conducted on a machine with Nvidia Titan X (Pascal) GPU and Intel E5-2683v3 CPU. |
| Software Dependencies | Yes | We implement the models in TensorFlow [1]. The CUDA program is compiled with nvcc 8.0 and the C++ program is compiled with g++ 4.8.4, while the -O3 flag is used for both programs. |
| Experiment Setup | Yes | The classifier is a fully-connected neural network with a single hidden layer of 256 ReLU [15] units, trained with AdaDelta [51]. We apply batch normalization [21] on the hidden layer... We set the population size N = 2000 for both PMD and MMD, and the mini-batch size |B| = 100 for PMD. We use the Adam optimizer [22] with batch normalization [21], and train the model for 100 epochs for PMD, and 500 epochs for MMD. |
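For context on the quantity whose setup is quoted above: PMD estimates the distance between two distributions as the minimum-weight bipartite matching cost between two sample populations of size N. The sketch below is a minimal, brute-force illustration of that estimator (exact matching via permutation search, so only feasible for tiny N); it is not the authors' implementation, which uses an efficient matching algorithm on GPU, and the Euclidean metric here is one common choice of ground distance.

```python
import itertools
import math

def population_matching_discrepancy(xs, ys):
    """Brute-force PMD between two equal-size populations.

    PMD(P, Q) ~= (1/N) * min over matchings sigma of
                 sum_i d(x_i, y_sigma(i)),
    i.e. the minimum-weight bipartite matching cost per point.
    Exact but exponential-time; for illustration on tiny N only.
    """
    assert len(xs) == len(ys), "populations must have equal size"
    n = len(xs)

    def dist(a, b):
        # Euclidean ground distance (an assumption for this sketch).
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

    # Search all N! matchings and keep the cheapest one.
    best = min(
        sum(dist(x, ys[j]) for x, j in zip(xs, perm))
        for perm in itertools.permutations(range(n))
    )
    return best / n

# Identical populations match perfectly, so PMD is 0,
# regardless of the order the samples appear in.
print(population_matching_discrepancy([(0.0,), (1.0,)], [(1.0,), (0.0,)]))  # 0.0
```

In the paper's setting, N = 2000, so a polynomial-time matching algorithm (or an approximate greedy matching) replaces the permutation search; the quantity being minimized is the same.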