Transitivity Recovering Decompositions: Interpretable and Robust Fine-Grained Relationships

Authors: Abhra Chaudhuri, Massimiliano Mancini, Zeynep Akata, Anjan Dutta

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We begin by theoretically showing that abstract relational representations are nothing but a way of recovering transitive relationships among local views. Based on this, we design Transitivity Recovering Decompositions (TRD), a graph-space search algorithm that identifies interpretable equivalents of abstract emergent relationships at both instance and class levels, and with no post-hoc computations. We additionally show that TRD is provably robust to noisy views, with empirical evidence also supporting this finding. The latter allows TRD to perform on par with or even better than the state of the art, while being fully interpretable. (A toy sketch of transitive-link recovery appears after the table.)
Researcher Affiliation | Academia | 1 University of Exeter, 2 University of Trento, 3 University of Tübingen, 4 MPI for Informatics, 5 The Alan Turing Institute, 6 University of Surrey
Pseudocode | No | The paper describes its methods in text and mathematical formulations but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Implementation is available at https://github.com/abhrac/trd.
Open Datasets | Yes | To verify the generalizability and scalability of our method, we evaluate it on small, medium and large-scale FGVC benchmarks. We perform small-scale evaluation on the Soy and Cotton Cultivar datasets [89], while we choose FGVC Aircraft [53], Stanford Cars [42], CUB [80], and NABirds [75] for medium-scale evaluation. For large-scale evaluation, we choose the iNaturalist dataset [76], which has over 675K training images and 182K test images. (A hedged data-loading sketch follows the table.)
Dataset Splits | Yes | For large-scale evaluation, we choose the iNaturalist dataset [76], which has over 675K training images and 182K test images. The concept graph is obtained by performing an online clustering on the node and edge embeddings of the training instances of the corresponding class. (See the clustering sketch after the table.)
Hardware Specification | Yes | We implement TRD with a single NVIDIA GeForce RTX 3090 GPU, an 8-core Intel Xeon processor, and 32 GB of RAM.
Software Dependencies | No | The paper mentions software such as ResNet50 and GAT, but does not provide specific version numbers for these or other key software components (e.g., PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | For obtaining the global view, we follow [84, 94] by selecting the smallest bounding box containing the largest connected component of the thresholded final-layer feature map obtained from an ImageNet-1K [17] pre-trained ResNet50 [30], which we also use as the relation-agnostic encoder f. The global view is resized to 224×224. We then obtain 64 local views by randomly cropping 28×28 regions within the global crop and resizing them to 224×224. We use an 8-layer Graph Attention Network (GAT) [77], with 4 attention heads in each hidden layer, normalized via GraphNorm [7], to obtain the Semantic Relevance Graph. We train TRD for 1000 epochs using the Adam optimizer, at an initial learning rate of 0.005 (decayed by a factor of 0.1 every 100 epochs), with a weight decay of 5×10⁻⁴. We generally followed [77, 10] for choosing the above settings. (A configuration sketch follows the table.)
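To make the "recovering transitive relationships among local views" claim in the Research Type row concrete, here is a toy sketch. It assumes a thresholded pairwise semantic-relevance matrix over local views; the function name, threshold, and random relevance scores are illustrative placeholders, not the paper's implementation.

```python
import numpy as np

# Illustrative only: given pairwise semantic relevance between local views,
# a relationship (i, k) is "transitively recoverable" if some intermediate
# view j links them, i.e. relevance(i, j) and relevance(j, k) both exceed
# a threshold tau. All names and values here are assumptions.

def transitive_triples(relevance: np.ndarray, tau: float = 0.5):
    """Return (i, j, k) triples where view j transitively connects i and k."""
    adj = relevance > tau  # binarize the relevance graph
    n = adj.shape[0]
    triples = []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if i != j != k != i and adj[i, j] and adj[j, k]:
                    triples.append((i, j, k))
    return triples

# Toy example: 4 local views with symmetric random relevance scores.
rng = np.random.default_rng(0)
rel = rng.random((4, 4))
rel = (rel + rel.T) / 2
print(transitive_triples(rel, tau=0.6))
```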
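The Open Datasets row lists publicly available benchmarks. As a minimal loading sketch, two of them ship with torchvision (assuming torchvision >= 0.13); the root path and transform are placeholders, and CUB, NABirds, iNaturalist, and the cultivar datasets would need custom loaders or an ImageFolder-style layout.

```python
from torchvision import datasets, transforms

# Placeholder transform; 224x224 matches the view size in the Experiment
# Setup row.
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

aircraft = datasets.FGVCAircraft(root="data", split="trainval",
                                 transform=transform, download=True)
cars = datasets.StanfordCars(root="data", split="train",
                             transform=transform)  # obtain images manually
```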
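The Dataset Splits row mentions an online clustering over node and edge embeddings to build the class-level concept graph. A minimal sketch follows, assuming mini-batch k-means as the online clusterer (the report does not name the algorithm); embedding dimensions, batch sizes, and cluster counts are placeholders.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

node_clusterer = MiniBatchKMeans(n_clusters=16, random_state=0)
edge_clusterer = MiniBatchKMeans(n_clusters=32, random_state=0)

for _ in range(100):                      # stream of training batches
    node_emb = np.random.randn(64, 128)   # placeholder node embeddings
    edge_emb = np.random.randn(256, 128)  # placeholder edge embeddings
    node_clusterer.partial_fit(node_emb)  # update node concepts online
    edge_clusterer.partial_fit(edge_emb)  # update edge concepts online

# The centroids act as the nodes/edges of the class-level concept graph.
concept_nodes = node_clusterer.cluster_centers_
concept_edges = edge_clusterer.cluster_centers_
```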
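Finally, the Experiment Setup row translates into roughly the following configuration. This is a hedged sketch, not the authors' code: layer widths, the input dimension, and the module name RelevanceGAT are assumptions, while the crop sizes, layer/head counts, and optimizer hyperparameters follow the row's description; the GAT pieces assume torch_geometric is installed.

```python
import torch
from torch import nn, optim
from torchvision import transforms
from torch_geometric.nn import GATConv, GraphNorm

# 64 local views: random 28x28 crops inside the global view, each resized
# back to 224x224 (per the Experiment Setup row).
local_view = transforms.Compose([
    transforms.RandomCrop(28),
    transforms.Resize((224, 224)),
])

class RelevanceGAT(nn.Module):
    """8-layer GAT, 4 attention heads per hidden layer, GraphNorm throughout.
    in_dim=2048 assumes ResNet50 features; hid_dim is a placeholder."""
    def __init__(self, in_dim=2048, hid_dim=256, heads=4, layers=8):
        super().__init__()
        self.convs, self.norms = nn.ModuleList(), nn.ModuleList()
        dims = [in_dim] + [hid_dim * heads] * (layers - 1)
        for d in dims:
            self.convs.append(GATConv(d, hid_dim, heads=heads))
            self.norms.append(GraphNorm(hid_dim * heads))

    def forward(self, x, edge_index):
        for conv, norm in zip(self.convs, self.norms):
            x = torch.relu(norm(conv(x, edge_index)))
        return x

model = RelevanceGAT()
# Reported settings: Adam, lr 0.005, weight decay 5e-4, lr x0.1 every
# 100 epochs, 1000 epochs total.
optimizer = optim.Adam(model.parameters(), lr=0.005, weight_decay=5e-4)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.1)
# for epoch in range(1000): train one epoch, then scheduler.step()
```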