The Shape of Data: Intrinsic Distance for Data Distributions
Authors: Anton Tsitsulin, Marina Munkhoeva, Davide Mottin, Panagiotis Karras, Alex Bronstein, Ivan Oseledets, Emmanuel Mueller
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In a thorough experimental study, we demonstrate that our method effectively discerns the structure of data manifolds even on unaligned data of different dimensionality, and showcase its efficacy in evaluating the quality of generative models. |
| Researcher Affiliation | Academia | University of Bonn, Skoltech, Aarhus University, Technion |
| Pseudocode | Yes | Algorithm 1: IMD algorithm. (A hedged sketch of the IMD pipeline follows the table.) |
| Open Source Code | Yes | Our code is available open-source: https://github.com/xgfs/imd. |
| Open Datasets | Yes | We train 10 instances of the VGG-16 (Simonyan & Zisserman, 2015) network using different weight initializations on the CIFAR-10 and CIFAR-100 datasets. We then train the WGAN (Arjovsky et al., 2017) and WGAN-GP (Gulrajani et al., 2017) models on four datasets: MNIST, Fashion-MNIST, CIFAR-10 and CelebA. |
| Dataset Splits | No | Figure 4 (right) plots the VGG-16 validation errors and IMD scores relative to the final-layer representations of two pretrained networks: VGG-16 itself with last-layer dimension d = 512, and ResNet-20 with d = 64 and 50 times fewer parameters. |
| Hardware Specification | Yes | We train all our models on a single server with an NVIDIA V100 GPU with 16 GB memory and two 20-core Intel E5-2698 v4 CPUs. |
| Software Dependencies | No | We use gensim (Rehurek & Sojka, 2010) to learn word vectors on the latest Wikipedia corpus snapshot in 16 languages... (A hedged gensim sketch follows the table.) |
| Experiment Setup | Yes | We train each of the GANs for 200 epochs on MNIST, FMNIST and CIFAR-10, and for 50 epochs on the CelebA dataset. For WGAN we use the RMSprop optimizer with a learning rate of 5×10⁻⁵. For WGAN-GP we use the Adam optimizer with a learning rate of 10⁻⁴, β1 = 0.9, β2 = 0.999. (An optimizer-setup sketch follows the table.) |
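The Pseudocode row above points to Algorithm 1 (IMD). Below is a minimal sketch of the multi-scale heat-trace comparison behind IMD, assuming a kNN graph, the normalized graph Laplacian, and a dense eigendecomposition in place of the paper's stochastic Lanczos quadrature; the function names, neighbor count `k`, and scale grid `ts` are illustrative choices, not the authors' settings. The reference implementation is at https://github.com/xgfs/imd.

```python
# Hedged sketch of IMD (Algorithm 1): compare heat-trace curves of two
# point clouds via their kNN-graph Laplacians. A dense eigendecomposition
# stands in for the paper's stochastic Lanczos quadrature (small n only).
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.neighbors import kneighbors_graph

def heat_trace(points, ts, k=5):
    """h(t) = tr(exp(-t L)) for the normalized Laplacian of a kNN graph."""
    adj = kneighbors_graph(points, n_neighbors=k, mode='connectivity')
    adj = ((adj + adj.T) > 0).astype(float)       # symmetrize the graph
    lap = laplacian(adj, normed=True)             # normalized graph Laplacian
    eigvals = np.linalg.eigvalsh(lap.toarray())   # exact spectrum: small n only
    return np.array([np.exp(-t * eigvals).sum() for t in ts])

def imd(x, y, ts=np.logspace(-1, 1, 256), k=5):
    """Weighted sup-distance between two heat-trace curves (IMD score)."""
    hx, hy = heat_trace(x, ts, k), heat_trace(y, ts, k)
    weights = np.exp(-2 * (ts + 1.0 / ts))        # down-weight extreme scales
    return np.max(weights * np.abs(hx - hy))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=(200, 64))
    b = rng.normal(size=(200, 32))                # dimensionalities may differ
    print("IMD:", imd(a, b))
```

The two clouds are kept at equal sample size here so the raw heat traces are comparable; the paper discusses how the descriptor handles unaligned data of different dimensionality.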
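The Software Dependencies row quotes the use of gensim for word vectors. A minimal sketch, assuming gensim 4.x (where the embedding size parameter is `vector_size`) and a toy in-memory corpus in place of the Wikipedia snapshots:

```python
# Hedged sketch of learning word vectors with gensim; the two-sentence
# corpus is a placeholder for a preprocessed Wikipedia snapshot.
from gensim.models import Word2Vec

corpus = [["intrinsic", "distance", "for", "data"],
          ["the", "shape", "of", "data"]]          # placeholder corpus
model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, workers=4)
print(model.wv["data"][:5])                        # first 5 vector components
```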
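The Experiment Setup row fixes the optimizers and learning rates. A hedged PyTorch sketch of just those settings, with placeholder generator/critic modules since the architectures are not given in this section:

```python
# Hedged sketch of the quoted optimizer settings; the Sequential modules
# are hypothetical stand-ins, not the paper's architectures.
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(128, 784), nn.Tanh())
critic = nn.Sequential(nn.Linear(784, 1))

# WGAN: RMSprop, learning rate 5e-5
wgan_g_opt = torch.optim.RMSprop(generator.parameters(), lr=5e-5)
wgan_c_opt = torch.optim.RMSprop(critic.parameters(), lr=5e-5)

# WGAN-GP: Adam, learning rate 1e-4, beta1 = 0.9, beta2 = 0.999
wgangp_g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.9, 0.999))
wgangp_c_opt = torch.optim.Adam(critic.parameters(), lr=1e-4, betas=(0.9, 0.999))
```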