Learning Discrete Representations via Information Maximizing Self-Augmented Training
Authors: Weihua Hu, Takeru Miyato, Seiya Tokui, Eiichi Matsumoto, Masashi Sugiyama
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on benchmark datasets show that IMSAT produces state-of-the-art results for both clustering and unsupervised hash learning. |
| Researcher Affiliation | Collaboration | University of Tokyo, Japan; RIKEN AIP, Japan; Preferred Networks, Inc., Japan; ATR Cognitive Mechanism Laboratories, Japan. |
| Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper. |
| Open Source Code | Yes | Our implementation based on Chainer (Tokui et al., 2015) is available at https://github.com/weihua916/imsat. |
| Open Datasets | Yes | MNIST (LeCun et al., 1998), Omniglot (Lake et al., 2011), STL (Coates et al., 2010), CIFAR10 (Torralba et al., 2008), CIFAR100 (Torralba et al., 2008), SVHN (Netzer et al., 2011), Reuters (Lewis et al., 2004), 20news (Lang, 1995). |
| Dataset Splits | No | The paper does not explicitly provide train/validation/test splits with percentages, sample counts, or a clear partitioning methodology. While it mentions 'cross-validation' and distinguishes 'query data' from a 'gallery set', it lacks the detailed splits needed to reproduce the training process across all experiments. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used (e.g., GPU/CPU models, memory specifications, or cloud instances) for running its experiments. |
| Software Dependencies | No | The paper mentions Chainer as the implementation framework, but it does not specify the Chainer version or list any other software dependencies with version numbers. |
| Experiment Setup | Yes | Specifically, inspired by Hinton et al. (2012), we set the network dimensionality to d-1200-1200-M for clustering across all the datasets, where d and M are input and output dimensionality, respectively. ... For the output layer, we used the softmax for clustering and the sigmoids for hash learning. Regarding optimization, we used Adam (Kingma & Ba, 2015) with the step size 0.002. ... we set the perturbation range, ϵ, on data point x in VAT and RPT as ϵ(x) = α σt(x), where α is a scalar and σt(x) is the Euclidean distance to the t-th neighbor of x. In our experiments, we fixed t = 10. For Linear IMSAT (VAT), IMSAT (RPT) and IMSAT (VAT), we fixed α = 0.4, 2.5 and 0.25, respectively... we chose 0.005 for decay rates in both Linear RIM and Deep RIM. Also, we set λ = 1.6, 0.05 and 0.1 for Linear IMSAT (VAT), IMSAT (RPT) and IMSAT (VAT), respectively. |
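
To make the Experiment Setup row above concrete, the following is a minimal NumPy sketch (not the authors' Chainer implementation; function names such as `perturbation_radii` and `build_mlp_shapes` are illustrative) of two quoted details: the d-1200-1200-M layer shapes used for clustering, and the adaptive perturbation radius ϵ(x) = α σt(x), where σt(x) is the Euclidean distance from x to its t-th nearest neighbor, with t = 10 and α = 0.25 for IMSAT (VAT).

```python
# Hedged sketch of two quoted setup details; not the authors' code.
import numpy as np

def perturbation_radii(X, alpha=0.25, t=10):
    """Return eps(x) = alpha * sigma_t(x) for every row x of X (shape n x d).

    sigma_t(x) is the Euclidean distance from x to its t-th nearest neighbor.
    The O(n^2) pairwise-distance computation is for illustration only.
    """
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting, index 0 is the distance to the point itself (zero),
    # so index t is the distance to the t-th nearest other point.
    sigma_t = np.sort(dist, axis=1)[:, t]
    return alpha * sigma_t

def build_mlp_shapes(d, M):
    """Layer sizes of the d-1200-1200-M network described in the paper."""
    return [(d, 1200), (1200, 1200), (1200, M)]

if __name__ == "__main__":
    rng = np.random.RandomState(0)
    X = rng.randn(100, 784)                        # stand-in for flattened MNIST
    eps = perturbation_radii(X, alpha=0.25, t=10)  # alpha = 0.25 for IMSAT (VAT)
    print(eps[:5])
    print(build_mlp_shapes(784, 10))               # softmax over M = 10 clusters
```

The output layer would use a softmax for clustering or sigmoids for hash learning, and the paper reports training with Adam at step size 0.002; those parts are omitted here since they depend on the chosen framework.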