GeomCA: Geometric Evaluation of Data Representations
Authors: Petra Poklukar, Anastasiia Varava, Danica Kragic
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate its applicability by analyzing representations obtained from a variety of scenarios, such as contrastive learning models, generative models and supervised learning models. We apply GeomCA to different practical setups. First, we consider a contrastive learning scenario and evaluate the structural similarity between the encodings belonging to different classes of the training and validation datasets (Section 4). Second, we evaluate generative models by comparing the connected components of the training and generated datasets (Section 5). Finally, we apply GeomCA to investigate if features extracted by a supervised model are separated according to their respective classes (Section 6). |
| Researcher Affiliation | Academia | Petra Poklukar, Anastasiia Varava, Danica Kragic (KTH Royal Institute of Technology, Stockholm, Sweden). |
| Pseudocode | Yes | Algorithm 1: GeomCA |
| Open Source Code | Yes | Our code is available on GitHub: https://github.com/petrapoklukar/GeomCA |
| Open Datasets | Yes | We used a StyleGAN trained on the FFHQ dataset (Karras et al., 2019) and replicated the truncation experiment as performed in (Kynkäänniemi et al., 2019). We applied GeomCA to VGG16 representations of the ImageNet dataset. We evaluated two models for learning contrastive representations, Siamese and SimCLR, on an image dataset introduced by (Chamzas et al., 2020). |
| Dataset Splits | No | Each dataset consists of 5000 training images and 5000 test images not used during training. No explicit mention of a separate validation dataset or split percentages for it was found. |
| Hardware Specification | No | The paper mentions libraries used for implementation and discusses computational efficiency, implying hardware capabilities, but does not provide specific details on the CPU, GPU, memory, or computing environment used for experiments. |
| Software Dependencies | No | We implemented GeomCA described in Algorithm 1 using the GUDHI library (The GUDHI Project, 2020), which supports efficient computation of geometric sparsification, and the NetworkX library (Hagberg et al., 2008) for building and analyzing ε-graphs. While GUDHI's citation includes a version (3.4.0), NetworkX's does not, so not all key software dependencies are pinned to specific versions. (An illustrative ε-graph sketch follows the table.) |
| Experiment Setup | Yes | Moreover, we used δ = ε / 2 to allow the homogeneous clusters to also form a component (see discussion in Sections 3.1 and 3.2), and chose ηc = 0.75, ηq = 0.45 in order to analyze only consistent components of high quality. Due to the large dimensionality of the representations, we chose ε = ε(10) and ηc, ηq = 0. Since the generated representations E are in an ideal case well aligned with R, we chose δ = ε. |
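
The dependency and setup rows above only name the building blocks: an ε-graph is built over the joint set of reference (R) and evaluated (E) representations with NetworkX, geometrically sparsified with GUDHI, and its connected components are then filtered with the consistency and quality thresholds ηc and ηq. The sketch below is a minimal illustration of the ε-graph construction and thresholding steps only; it is not the authors' implementation, the function names are hypothetical, the consistency and quality formulas are simplified stand-ins for the scores defined in the paper, and the GUDHI sparsification step governed by δ is omitted.

```python
# Illustrative sketch only: a simplified epsilon-graph over R and E with NetworkX,
# followed by component filtering with thresholds eta_c, eta_q. The score formulas
# are stand-ins, not GeomCA's exact definitions.
import numpy as np
import networkx as nx
from scipy.spatial.distance import cdist


def build_epsilon_graph(R, E, epsilon):
    """Connect every pair of points from R and E that lies closer than epsilon."""
    points = np.vstack([R, E])
    origin = ["R"] * len(R) + ["E"] * len(E)
    G = nx.Graph()
    G.add_nodes_from((i, {"origin": origin[i]}) for i in range(len(points)))
    dist = cdist(points, points)
    rows, cols = np.where(dist < epsilon)
    G.add_edges_from((i, j) for i, j in zip(rows, cols) if i < j)
    return G


def keep_consistent_components(G, eta_c=0.75, eta_q=0.45):
    """Keep components whose (simplified) consistency and quality clear the thresholds."""
    kept = []
    for nodes in nx.connected_components(G):
        sub = G.subgraph(nodes)
        n_r = sum(1 for n in sub if sub.nodes[n]["origin"] == "R")
        n_e = sub.number_of_nodes() - n_r
        # Stand-in consistency: how balanced the component is between R and E points.
        consistency = 1.0 - abs(n_r - n_e) / (n_r + n_e)
        # Stand-in quality: fraction of edges joining an R point to an E point.
        hetero = sum(1 for u, v in sub.edges()
                     if sub.nodes[u]["origin"] != sub.nodes[v]["origin"])
        quality = hetero / sub.number_of_edges() if sub.number_of_edges() else 0.0
        if consistency >= eta_c and quality >= eta_q:
            kept.append(sub)
    return kept


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    R = rng.normal(size=(200, 2))  # toy reference representations
    E = rng.normal(size=(200, 2))  # toy evaluated representations
    G = build_epsilon_graph(R, E, epsilon=0.5)
    components = keep_consistent_components(G, eta_c=0.75, eta_q=0.45)
    print(f"{len(components)} components passed the thresholds")
```

With the thresholds reported for the contrastive-learning setup (ηc = 0.75, ηq = 0.45), only balanced, well-mixed components would survive, whereas setting ηc, ηq = 0, as in the StyleGAN experiment, keeps every component.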