Learning Discrete Representations via Information Maximizing Self-Augmented Training

Authors: Weihua Hu, Takeru Miyato, Seiya Tokui, Eiichi Matsumoto, Masashi Sugiyama

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on benchmark datasets show that IMSAT produces state-of-the-art results for both clustering and unsupervised hash learning.
Researcher Affiliation | Collaboration | 1 University of Tokyo, Japan; 2 RIKEN AIP, Japan; 3 Preferred Networks, Inc., Japan; 4 ATR Cognitive Mechanism Laboratories, Japan.
Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper.
Open Source Code | Yes | Our implementation based on Chainer (Tokui et al., 2015) is available at https://github.com/weihua916/imsat.
Open Datasets | Yes | MNIST (LeCun et al., 1998), Omniglot (Lake et al., 2011), STL (Coates et al., 2010), CIFAR10 (Torralba et al., 2008), CIFAR100 (Torralba et al., 2008), SVHN (Netzer et al., 2011), Reuters (Lewis et al., 2004), 20news (Lang, 1995).
Dataset Splits | No | The paper does not explicitly provide train/validation/test splits with percentages, sample counts, or a clear partitioning methodology. Although it mentions 'cross-validation' and distinguishes 'query data' from a 'gallery set', it lacks the detailed splits needed to reproduce the training process across all experiments.
Hardware Specification | No | The paper does not describe the hardware used for its experiments (e.g., GPU/CPU models, memory, or cloud instances).
Software Dependencies | No | The paper states that the implementation uses Chainer, but it does not give a version number for Chainer or for any other software dependency.
Experiment Setup | Yes | Specifically, inspired by Hinton et al. (2012), we set the network dimensionality to d-1200-1200-M for clustering across all the datasets, where d and M are input and output dimensionality, respectively. ... For the output layer, we used the softmax for clustering and the sigmoids for hash learning. Regarding optimization, we used Adam (Kingma & Ba, 2015) with the step size 0.002. ... We set the perturbation range, ε, on data point x in VAT and RPT as ε(x) = α·σ_t(x), where α is a scalar and σ_t(x) is the Euclidean distance to the t-th neighbor of x. In our experiments, we fixed t = 10. For Linear IMSAT (VAT), IMSAT (RPT) and IMSAT (VAT), we fixed α = 0.4, 2.5 and 0.25, respectively. ... We chose 0.005 for decay rates in both Linear RIM and Deep RIM. Also, we set λ = 1.6, 0.05 and 0.1 for Linear IMSAT (VAT), IMSAT (RPT) and IMSAT (VAT), respectively.
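To make the reported setup concrete, below is a minimal sketch of the d-1200-1200-M clustering network with Adam at step size 0.002, together with the per-point perturbation range ε(x) = α·σ_t(x) computed from the distance to the t-th nearest neighbor (t = 10). It is written against Chainer, the framework the paper mentions, but it is not the authors' released code: the names Encoder and perturbation_eps, the ReLU activations, and the use of scikit-learn for the neighbor search are assumptions made for illustration only.

```python
# Minimal sketch of the reported setup, assuming Chainer and scikit-learn;
# this is NOT the authors' released implementation (https://github.com/weihua916/imsat).
import chainer
import chainer.functions as F
import chainer.links as L
import numpy as np
from sklearn.neighbors import NearestNeighbors  # assumed k-NN helper


class Encoder(chainer.Chain):
    """d-1200-1200-M network: softmax output for clustering, sigmoids for hashing."""

    def __init__(self, n_outputs):
        super(Encoder, self).__init__()
        with self.init_scope():
            self.l1 = L.Linear(None, 1200)   # input dimensionality d is inferred from data
            self.l2 = L.Linear(1200, 1200)
            self.l3 = L.Linear(1200, n_outputs)

    def __call__(self, x):
        h = F.relu(self.l1(x))               # ReLU is an assumption, not stated above
        h = F.relu(self.l2(h))
        return F.softmax(self.l3(h))         # swap in F.sigmoid for hash learning


def perturbation_eps(x, t=10, alpha=0.25):
    """eps(x) = alpha * sigma_t(x), where sigma_t(x) is the Euclidean distance
    from x to its t-th nearest neighbor (t = 10 in the paper)."""
    nn = NearestNeighbors(n_neighbors=t + 1).fit(x)  # +1: each point is its own 0-th neighbor
    dists, _ = nn.kneighbors(x)
    return alpha * dists[:, t]


model = Encoder(n_outputs=10)                      # e.g. 10 clusters for MNIST
optimizer = chainer.optimizers.Adam(alpha=0.002)   # "step size 0.002"
optimizer.setup(model)
eps = perturbation_eps(np.random.rand(100, 784).astype(np.float32))  # toy data
```

Any k-nearest-neighbor routine would serve equally well; scikit-learn is used here only to keep the sketch self-contained.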