GeomCA: Geometric Evaluation of Data Representations

Authors: Petra Poklukar, Anastasiia Varava, Danica Kragic

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate its applicability by analyzing representations obtained from a variety of scenarios, such as contrastive learning models, generative models and supervised learning models. We apply GeomCA to different practical setups. First, we consider a contrastive learning scenario and evaluate the structural similarity between the encodings belonging to different classes of the training and validation datasets (Section 4). Second, we evaluate generative models by comparing the connected components of the training and generated datasets (Section 5). Finally, we apply GeomCA to investigate if features extracted by a supervised model are separated according to their respective classes (Section 6).
Researcher Affiliation | Academia | Petra Poklukar (1), Anastasia Varava (1), Danica Kragic (1). (1) KTH Royal Institute of Technology, Stockholm, Sweden.
Pseudocode | Yes | Algorithm 1: GeomCA
Open Source Code | Yes | Our code is available on GitHub. Footnote 2: https://github.com/petrapoklukar/GeomCA
Open Datasets | Yes | We used a StyleGAN trained on FFHQ dataset (Karras et al., 2019) and replicated the truncation experiment as performed in (Kynkäänniemi et al., 2019). We applied GeomCA to VGG16 representations of the ImageNet dataset. We evaluated two models for learning contrastive representations, Siamese and SimCLR, on an image dataset introduced by (Chamzas et al., 2020).
Dataset Splits | No | Each dataset consists of 5000 training images and 5000 test images not used during training. No explicit mention of a separate validation dataset or split percentages for it was found.
Hardware Specification | No | The paper mentions libraries used for implementation and discusses computational efficiency, implying hardware capabilities, but does not provide specific details on the CPU, GPU, memory, or computing environment used for experiments.
Software Dependencies | No | We implemented GeomCA described in Algorithm 1 using GUDHI library (The GUDHI Project, 2020) which supports efficient computation of geometric sparsification, and NetworkX library (Hagberg et al., 2008) for building and analyzing ε-graphs. While GUDHI's citation includes a version (3.4.0), NetworkX does not, meaning not all key software dependencies are provided with specific version numbers.
Experiment Setup | Yes | Moreover, we used δ = ε / 2 to allow the homogeneous clusters also forming a component (see discussion in Sections 3.1 and 3.2), and chose ηc = 0.75, ηq = 0.45 in order to analyze only consistent components of high quality. Due to large dimensionality of the representations, we chose ε = ε(10), and ηc, ηq = 0. Since the generated representations E are in an ideal case well aligned with R, we chose δ = ε.
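
The roles of δ and ε in the setup above can be illustrated with a small sketch. It is not the authors' implementation and does not use the GUDHI or NetworkX APIs: it relies only on the Python standard library, and the function names sparsify and epsilon_graph_components are hypothetical. It assumes that geometric sparsification keeps points at least δ apart (done greedily here), that the two sets R and E are sparsified separately, and that the ε-graph joins any two points closer than ε.

```python
import math
from itertools import combinations


def sparsify(points, delta):
    """Greedy delta-sparsification (an assumption, not the GUDHI routine):
    keep a point only if it is at least delta away from every kept point."""
    kept = []
    for p in points:
        if all(math.dist(p, q) >= delta for q in kept):
            kept.append(p)
    return kept


def epsilon_graph_components(R, E, epsilon):
    """Join any two points of R and E that are closer than epsilon and
    return the connected components as lists of (source, index) labels,
    largest component first."""
    nodes = [("R", i, p) for i, p in enumerate(R)]
    nodes += [("E", i, p) for i, p in enumerate(E)]
    parent = list(range(len(nodes)))

    def find(a):
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a, b in combinations(range(len(nodes)), 2):
        if math.dist(nodes[a][2], nodes[b][2]) < epsilon:
            parent[find(a)] = find(b)

    components = {}
    for idx, (source, i, _) in enumerate(nodes):
        components.setdefault(find(idx), []).append((source, i))
    return sorted(components.values(), key=len, reverse=True)


# Toy data (made up for illustration), using delta = epsilon / 2 as in the setup.
epsilon = 0.5
delta = epsilon / 2
R = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (5.0, 5.0)]
E = [(0.9, 0.0), (1.1, 0.1), (9.0, 9.0)]
components = epsilon_graph_components(sparsify(R, delta), sparsify(E, delta), epsilon)
```

On this toy data the only component containing points from both sets joins one R point with one E point; every other point is isolated. Thresholds such as ηc and ηq would then be applied to per-component scores, which this sketch does not compute.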