Quantifying Learnability and Describability of Visual Concepts Emerging in Representation Learning

Authors: Iro Laina, Ruth Fong, Andrea Vedaldi

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments are organized as follows. First, we examine the representations learned by two state-of-the-art approaches, namely SeLa [3] and MoCo [27], and use our learnability metric (Eq. (1)) to quantify the semantic coherence of their learned representations. We then repeat these experiments by providing human-annotated, class-level descriptions to measure the respective describability.
Researcher Affiliation | Academia | Visual Geometry Group, University of Oxford {iro, ruthfong, vedaldi}@robots.ox.ac.uk
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions using publicly available implementations of other models and tools, but does not provide an explicit statement or link to the open-source code for the methodology described in this paper.
Open Datasets | Yes | We use data from the training set of ImageNet [17].
Dataset Splits | No | The paper mentions using the 'training set of ImageNet [17]' and evaluating on '20 selected ImageNet classes', but it does not provide specific percentages or counts for training, validation, and test splits required for reproducibility.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or cloud computing specifications used for running its experiments.
Software Dependencies | No | The paper mentions using 'ResNet-50', 'Sentence-BERT [59]', 'BERT-large as the backbone', and refers to a 'publicly available implementation' (Att2in), but it does not list specific version numbers for these or other software libraries/frameworks crucial for reproducibility.
Experiment Setup | Yes | For the semantic coherence experiments, each HIT consists of a reference set of 10 example images randomly sampled from the class and two query images. To obtain X_c^MoCo, we apply k-means on top of MoCo-v1 feature vectors (obtained using the official implementation) and set k = 3000 for a fair comparison with [3]. We then extract 1024-dimensional caption embeddings using Sentence-BERT [59] (with BERT-large as the backbone).
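The learnability metric (Eq. (1)) cited in the Research Type row is not reproduced in the excerpts above, but the protocol it rests on is: a reference set of example images for a concept and a two-alternative forced choice over query images (as described in the Experiment Setup row). The following is a minimal sketch under that assumption, estimating learnability as the fraction of trials answered correctly and substituting a simple nearest-prototype machine learner for the human annotator; the cosine-similarity decision rule and the function name are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def forced_choice_learnability(trials):
    """Estimate the learnability of one concept as the fraction of
    two-alternative forced-choice trials answered correctly.

    Each trial mirrors one HIT: a (10, D) array of reference features
    for the concept, plus a positive and a negative query (each a
    length-D feature vector). The stand-in learner picks the query
    whose cosine similarity to the mean reference feature is higher.
    """
    correct = 0
    for reference, query_pos, query_neg in trials:
        prototype = reference.mean(axis=0)
        prototype /= np.linalg.norm(prototype) + 1e-12
        score_pos = query_pos @ prototype / (np.linalg.norm(query_pos) + 1e-12)
        score_neg = query_neg @ prototype / (np.linalg.norm(query_neg) + 1e-12)
        correct += int(score_pos > score_neg)
    return correct / len(trials)

# Sanity check with random features (D = 128, 100 trials for one concept):
# uninformative features should score near chance, i.e. about 0.5.
rng = np.random.default_rng(0)
trials = [(rng.normal(size=(10, 128)), rng.normal(size=128), rng.normal(size=128))
          for _ in range(100)]
print(forced_choice_learnability(trials))
```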
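The Experiment Setup row describes two concrete steps that can be sketched in code: k-means with k = 3000 over MoCo-v1 feature vectors, and 1024-dimensional caption embeddings from Sentence-BERT with a BERT-large backbone. The snippet below is a hedged sketch rather than the authors' pipeline: the feature file path is a placeholder, MiniBatchKMeans is an assumption made for tractability (the paper does not name a k-means variant), and 'bert-large-nli-stsb-mean-tokens' is merely one publicly available BERT-large Sentence-BERT checkpoint that outputs 1024-dimensional embeddings.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sentence_transformers import SentenceTransformer

# Placeholder input: an (N, D) array of MoCo-v1 image features
# (the paper states they were obtained with the official implementation).
features = np.load("moco_v1_features.npy")  # hypothetical path

# Cluster the features into k = 3000 pseudo-classes, matching the
# setup quoted above. MiniBatchKMeans is assumed here to keep memory
# bounded; an exact KMeans call would be a drop-in replacement.
kmeans = MiniBatchKMeans(n_clusters=3000, batch_size=10000, random_state=0)
cluster_ids = kmeans.fit_predict(features)

# Embed captions with a Sentence-BERT model. The paper only states that
# BERT-large is the backbone; the specific checkpoint is an assumption.
encoder = SentenceTransformer("bert-large-nli-stsb-mean-tokens")
captions = ["a dog catching a frisbee", "a bowl of ramen on a table"]
caption_embeddings = encoder.encode(captions)  # shape: (len(captions), 1024)
```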