The Shape of Data: Intrinsic Distance for Data Distributions
Authors: Anton Tsitsulin, Marina Munkhoeva, Davide Mottin, Panagiotis Karras, Alex Bronstein, Ivan Oseledets, Emmanuel Mueller
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In a thorough experimental study, we demonstrate that our method effectively discerns the structure of data manifolds even on unaligned data of different dimensionality, and showcase its efficacy in evaluating the quality of generative models. |
| Researcher Affiliation | Academia | University of Bonn, Skoltech, Aarhus University, Technion |
| Pseudocode | Yes | Algorithm 1: IMD algorithm. (A hedged sketch of the IMD pipeline follows the table.) |
| Open Source Code | Yes | Our code is available open-source: https://github.com/xgfs/imd. |
| Open Datasets | Yes | We train 10 instances of the VGG-16 (Simonyan & Zisserman, 2015) network using different weight initializations on the CIFAR-10 and CIFAR-100 datasets. We then train the WGAN (Arjovsky et al., 2017) and WGAN-GP (Gulrajani et al., 2017) models on four datasets: MNIST, Fashion-MNIST, CIFAR-10 and CelebA. |
| Dataset Splits | No | Figure 4 (right) plots the VGG-16 validation errors and IMD scores relative to the final-layer representations of two pretrained networks: VGG-16 itself with last-layer dimension d = 512, and ResNet-20 with d = 64 and 50 times fewer parameters. |
| Hardware Specification | Yes | We train all our models on a single server with an NVIDIA V100 GPU with 16 GB memory and two 20-core Intel E5-2698 v4 CPUs. |
| Software Dependencies | No | We use gensim (Rehurek & Sojka, 2010) to learn word vectors on the latest Wikipedia corpus snapshot in 16 languages... (A hedged gensim sketch follows the table.) |
| Experiment Setup | Yes | We train each of the GANs for 200 epochs on MNIST, FMNIST and CIFAR-10, and for 50 epochs on the CelebA dataset. For WGAN we use the RMSprop optimizer with a learning rate of 5×10⁻⁵. For WGAN-GP we use the Adam optimizer with a learning rate of 10⁻⁴, β1 = 0.9, β2 = 0.999. (An optimizer-setup sketch follows the table.) |
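The Pseudocode row above points to Algorithm 1 (IMD). Below is a minimal sketch of the multi-scale heat-trace comparison behind IMD, assuming a kNN graph, the normalized graph Laplacian, and a dense eigendecomposition in place of the paper's stochastic Lanczos quadrature; the function names, neighbor count `k`, and scale grid `ts` are illustrative choices, not the authors' settings. The reference implementation is at https://github.com/xgfs/imd.

```python
# Hedged sketch of IMD (Algorithm 1): compare heat-trace curves of two
# point clouds via their kNN-graph Laplacians. A dense eigendecomposition
# stands in for the paper's stochastic Lanczos quadrature (small n only).
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.neighbors import kneighbors_graph

def heat_trace(points, ts, k=5):
    """h(t) = tr(exp(-t L)) for the normalized Laplacian of a kNN graph."""
    adj = kneighbors_graph(points, n_neighbors=k, mode='connectivity')
    adj = ((adj + adj.T) > 0).astype(float)       # symmetrize the graph
    lap = laplacian(adj, normed=True)             # normalized graph Laplacian
    eigvals = np.linalg.eigvalsh(lap.toarray())   # exact spectrum: small n only
    return np.array([np.exp(-t * eigvals).sum() for t in ts])

def imd(x, y, ts=np.logspace(-1, 1, 256), k=5):
    """Weighted sup-distance between two heat-trace curves (IMD score)."""
    hx, hy = heat_trace(x, ts, k), heat_trace(y, ts, k)
    weights = np.exp(-2 * (ts + 1.0 / ts))        # down-weight extreme scales
    return np.max(weights * np.abs(hx - hy))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=(200, 64))
    b = rng.normal(size=(200, 32))                # dimensionalities may differ
    print("IMD:", imd(a, b))
```

The two clouds are kept at equal sample size here so the raw heat traces are comparable; the paper discusses how the descriptor handles unaligned data of different dimensionality.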
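The Software Dependencies row quotes the use of gensim for word vectors. A minimal sketch, assuming gensim 4.x (where the embedding size parameter is `vector_size`) and a toy in-memory corpus in place of the Wikipedia snapshots:

```python
# Hedged sketch of learning word vectors with gensim; the two-sentence
# corpus is a placeholder for a preprocessed Wikipedia snapshot.
from gensim.models import Word2Vec

corpus = [["intrinsic", "distance", "for", "data"],
          ["the", "shape", "of", "data"]]          # placeholder corpus
model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, workers=4)
print(model.wv["data"][:5])                        # first 5 vector components
```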
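The Experiment Setup row fixes the optimizers and learning rates. A hedged PyTorch sketch of just those settings, with placeholder generator/critic modules since the architectures are not given in this section:

```python
# Hedged sketch of the quoted optimizer settings; the Sequential modules
# are hypothetical stand-ins, not the paper's architectures.
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(128, 784), nn.Tanh())
critic = nn.Sequential(nn.Linear(784, 1))

# WGAN: RMSprop, learning rate 5e-5
wgan_g_opt = torch.optim.RMSprop(generator.parameters(), lr=5e-5)
wgan_c_opt = torch.optim.RMSprop(critic.parameters(), lr=5e-5)

# WGAN-GP: Adam, learning rate 1e-4, beta1 = 0.9, beta2 = 0.999
wgangp_g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.9, 0.999))
wgangp_c_opt = torch.optim.Adam(critic.parameters(), lr=1e-4, betas=(0.9, 0.999))
```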