Decoupling Semantic Similarity from Spatial Alignment for Neural Networks.
Authors: Tassilo Wald, Constantin Ulrich, Priyank Jaini, Gregor Koehler, David Zimmerer, Stefan Denner, Fabian Isensee, Michael Baumgartner, Klaus Maier-Hein
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "4 Experiments: Semantic vs Spatio-Semantic RSMs", "To test the impact of the semantic RSMs in real-world applications, we now investigate the common task of image retrieval. Each entry in an RSM quantifies a sample-to-sample similarity value, which can be directly used for retrieval. While not specifically designed for it, we argue that better retrieval performance reflects a better inter-sample similarity. This allows us to quantify improvements in the RSM structure. To measure retrieval performance the EgoObjects dataset [35] is used." |
| Researcher Affiliation | Collaboration | Tassilo Wald (1,2,3), Constantin Ulrich (1,4,7), Gregor Köhler (1,3), David Zimmerer (1,2), Stefan Denner (1,3), Michael Baumgartner (1,3), Fabian Isensee (1,2), Priyank Jaini (5), Klaus H. Maier-Hein (1,2,3,4,6). Affiliations: (1) Division of Medical Image Computing, German Cancer Research Center (DKFZ), Heidelberg, Germany; (2) Helmholtz Imaging, DKFZ, Heidelberg, Germany; (3) Faculty of Mathematics and Computer Science, University of Heidelberg, Germany; (4) Medical Faculty Heidelberg, University of Heidelberg, Germany; (5) Google DeepMind; (6) Pattern Analysis and Learning Group, Department of Radiation Oncology; (7) National Center for Tumor Diseases (NCT) Heidelberg, Germany |
| Pseudocode | Yes | "The pseudo-code for calculating semantic RSMs is visualized in the Appendix under Algorithm 1.", "In addition to our provided explanation of the algorithm in the main manuscript, we provide the pseudo-code used to compute semantic RSMs in Algorithm 1." |
| Open Source Code | Yes | The Code is available here. |
| Open Datasets | Yes | "We utilize Tiny-ImageNet to generate partially overlapping crops...", "To measure retrieval performance the EgoObjects dataset [35] is used.", "Consequently, we use various classifiers trained to predict ImageNet1k from Huggingface and compare the Pearson correlation ρ between their JSD and the representational similarity of their last hidden layer." |
| Dataset Splits | Yes | "For the retrieval experiment, we utilize the EgoObjects test set, which is comprised of 29.5K images... Of all remaining images, we then draw 2k query images and 5k database images used for extracting embeddings for similarity calculation and later retrieval. Naturally, we sample in a way to keep the 2k query and 5k database image sets non-overlapping.", "we use N=500 validation images as the query dataset and the remaining N=2975 training images as the database for retrieval" |
| Hardware Specification | No | The average time taken per matching was then reported for the same single CPU core, as outlined in Table 5. |
| Software Dependencies | No | "We utilize the implementation and weights provided by torchvision [17].", "Consequently, we use various classifiers trained to predict ImageNet1k from Huggingface", "OR-Tools [23]", "SciPy [30]" |
| Experiment Setup | Yes | "Across all experiments we compare the linear kernel, the radial basis function (RBF) kernel, and the cosine similarity kernel, see Appendix A for details.", "For all RSMs, we retrieve the most similar image that is not part of the same video (the same scene but different conditions are allowed).", "After extracting representations we calculate the mean from the database embeddings to zero-center all representations, query and database representations alike.", "As kernel, we use the radial basis function, as it provides bounded similarity values allowing a better visualization.", "For each pair of samples, we can compare how similar the predicted class probabilities of a model are and compare this to the representational similarity. A commonly used metric for this is the Jensen-Shannon Divergence (JSD)... Consequently, we use various classifiers trained to predict ImageNet1k from Huggingface and compare the Pearson correlation ρ between their JSD and the representational similarity of their last hidden layer." |
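
The experiment setup quoted in the table (three kernels, zero-centering with the database mean, and top-1 retrieval that excludes images from the same video) can be illustrated with a minimal NumPy sketch. This is not the authors' code and it does not implement the paper's semantic RSMs from Algorithm 1; it only shows a plain sample-to-sample similarity matrix under stated assumptions. All function names, the `sigma` bandwidth, and the `query_video`/`db_video` ID arrays are hypothetical, and representations are assumed to be already flattened to one vector per image.

```python
import numpy as np

def linear_kernel(q, d):
    """Dot-product similarity between every query and database vector."""
    return q @ d.T

def cosine_kernel(q, d, eps=1e-12):
    """Cosine similarity; eps guards against division by zero."""
    qn = q / (np.linalg.norm(q, axis=1, keepdims=True) + eps)
    dn = d / (np.linalg.norm(d, axis=1, keepdims=True) + eps)
    return qn @ dn.T

def rbf_kernel(q, d, sigma=1.0):
    """RBF similarity, bounded in (0, 1]. sigma is an assumed bandwidth."""
    sq_dist = (q ** 2).sum(1)[:, None] + (d ** 2).sum(1)[None, :] - 2.0 * (q @ d.T)
    sq_dist = np.maximum(sq_dist, 0.0)  # clip tiny negatives from rounding
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def retrieve_top1(query, database, kernel=cosine_kernel,
                  query_video=None, db_video=None):
    """Return, per query, the index of the most similar database item.

    Both sets are zero-centered with the database mean. If video IDs are
    given (hypothetical arrays), database items from the same video as the
    query are excluded from retrieval.
    """
    mean = database.mean(axis=0, keepdims=True)
    sim = kernel(query - mean, database - mean)
    if query_video is not None and db_video is not None:
        same_video = query_video[:, None] == db_video[None, :]
        sim = np.where(same_video, -np.inf, sim)
    return sim.argmax(axis=1)
```

Swapping `kernel=rbf_kernel` or `kernel=linear_kernel` into `retrieve_top1` compares the three kernels on the same embeddings; the masking step sets excluded similarities to negative infinity so they can never be selected.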