Zero Shot Learning with the Isoperimetric Loss
Authors: Shay Deutsch, Andrea Bertozzi, Stefano Soatto (pp. 10704-10712)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Yet, the experiments indicate so. In some cases, it even outperformed methods that used human annotation for the unseen labels. At heart, we solve a topology estimation problem. We determine the connectivity between nodes of the visual embedding graph, which defines a topology in that space informed by the semantic representation of seen attributes. Much of the literature in this area focuses on what kind of graph signal (embedding, or descriptor) to attribute to the nodes, whereas the connectivity of the graph is decided a priori. We focus on the complementary problem, which is to determine the graph connectivity and learn the graph weights. Unlike other approaches, the connectivity in our method is informed both by the values of the visual descriptors at the vertices and the values of the semantic descriptors in the range space. Our framework allows us to use automated semantic representation to perform ZSL, resulting in a framework that is entirely free of human annotation. |
| Researcher Affiliation | Academia | Department of Mathematics and Department of Computer Science, University of California, Los Angeles; {shaydeu, bertozzi}@math.ucla.edu, soatto@cs.ucla.edu |
| Pseudocode | Yes | Algorithm 1: Learning the Graph Connectivity Structure |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code, nor does it provide a link to a code repository or mention code in supplementary materials. |
| Open Datasets | Yes | We use the two most common benchmarks for ZSL, AwA and CUB. AwA (Animals with Attributes) consists of 30,745 images of 50 classes of animals and has a source/target split of 40 and 10 classes, respectively. In addition, we test on the newly released dataset AwA2, which consists of 37,322 images of 50 classes and is an extension of AwA (referred to from now on as AwA1). AwA2 also has a source/target split of 40 and 10 classes, respectively, with 7,913 unseen test images. We used the proposed new splits for AwA1 and AwA2 (Xian, Schiele, and Akata 2018). The CUB dataset contains 200 different bird classes, with 11,788 images in total. |
| Dataset Splits | Yes | In AwA (Lampert, Nickisch, and Harmeling 2009) there are 50 classes, of which we consider 40 as seen and sequester nu = 10 as unseen. In CUB (Welinder et al. 2010) there are 200 classes, of which we consider 150 as seen and the rest unseen. We use the standard split (Changpinyo et al. 2016) with 150 classes for training and 50 disjoint classes for testing (Xian, Schiele, and Akata 2018), which is employed by most automation-based methods we compare to, while (Xian, Schiele, and Akata 2018) also suggested a new split for the CUB dataset. |
| Hardware Specification | Yes | The execution time of our code implementation using an Intel Core i7-7700 Quad-Core 3.6GHz with 64GB memory on the AwA1 dataset with 5,685 points, using a k = 10 nearest neighbor graph, is 21.9 seconds for the initialization of the visual-semantic embedding space and 44.8 seconds for our IPL regularization. |
| Software Dependencies | No | The paper mentions software and tools like "Word2Vec" and "ResNet101", but it does not specify any version numbers for these or any other software components (e.g., programming languages, libraries, frameworks) required for reproducibility. |
| Experiment Setup | Yes | For all the splits of the AwA and CUB datasets, we fix k = 15, r = 3, and k = 8, r = 3 for the k nearest neighbor graph parameter and radius r of the ball around each point, respectively. The edges wij are chosen using the cosine similarity between the visual observations. |
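
The graph construction quoted in the setup row (a k-nearest-neighbor graph whose edge weights are cosine similarities between visual observations, with k = 15 reported for the AwA splits) can be sketched as follows. This is a minimal illustration, not the authors' released code; the random features stand in for the paper's ResNet101 descriptors, and the `cosine_knn_graph` helper is a hypothetical name.

```python
# Sketch of a k-NN graph with cosine-similarity edge weights, as described
# in the experiment setup. Assumes feature vectors as rows of X; the random
# data below is a placeholder for the paper's visual descriptors.
import numpy as np

def cosine_knn_graph(X, k=15):
    """Return a dict mapping each node to its k (neighbor, weight) pairs."""
    # Normalize rows so that dot products equal cosine similarities.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T                      # pairwise cosine similarity matrix
    np.fill_diagonal(S, -np.inf)       # exclude self-loops
    graph = {}
    for i in range(len(X)):
        nbrs = np.argsort(S[i])[-k:]   # indices of the k most similar nodes
        graph[i] = [(int(j), float(S[i, j])) for j in nbrs]
    return graph

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 32))     # 100 points, 32-dim placeholder features
g = cosine_knn_graph(X, k=15)
```

Note that this yields a directed k-NN graph; a symmetrized variant (keeping an edge if either endpoint selects the other) is a common alternative, and the paper's IPL regularization then operates on the resulting connectivity.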