Scene Graph Embeddings Using Relative Similarity Supervision

Authors: Paridhi Maheshwari, Ritwick Chaudhry, Vishwa Vinay (pp. 2328-2336)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate that this Ranking loss, coupled with an intuitive triple sampling strategy, leads to robust representations that outperform well-known contrastive losses on the retrieval task. The results are tabulated in Table 2 and we make the following observations: (a) Comparing across the 3 objective functions, it is evident that the proposed Ranking loss consistently outperforms the Triplet and InfoNCE alternatives for any sampling."
Researcher Affiliation | Collaboration | Paridhi Maheshwari* (Adobe Research), Ritwick Chaudhry* (Carnegie Mellon University), Vishwa Vinay (Adobe Research); parimahe@adobe.com, rchaudhr@andrew.cmu.edu, vinay@adobe.com
Pseudocode | No | The paper describes the model operations mathematically but does not provide structured pseudocode or an algorithm block.
Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | "Dataset: We work on the Visual Genome (Krishna et al. 2017) dataset which is a collection of 108,077 images and their scene graphs."
Dataset Splits | Yes | "We divide the data into train, validation and test sets with a 70 : 20 : 10 split."
Hardware Specification | Yes | "Training is performed on a Ubuntu 16.04 machine, using a single Tesla V100 GPU and PyTorch framework."
Software Dependencies | No | The paper mentions the PyTorch framework but does not provide specific version numbers for the software dependencies or libraries used in the experiments.
Experiment Setup | Yes | "The model consists of 5 GCN layers and is trained using the Adam optimizer (Kingma and Ba 2014) for 100 epochs with learning rate 10^-4 and batch size 16. The temperature parameters in InfoNCE and Ranking loss have been set as λ = 1 and ν = 1, and the margin in Triplet loss as m = 0.5. For all multilayer perceptrons, we use ReLU activation and batch normalization (Ioffe and Szegedy 2015)."
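The three objective functions compared in the Research Type row can be sketched as scalar loss functions over similarity scores. The following is a minimal pure-Python sketch, not the authors' implementation: the triplet and InfoNCE forms are standard, while the ranking loss here assumes a pairwise logistic form with temperature ν, which may differ from the paper's exact definition. Default values follow the hyperparameters quoted in the Experiment Setup row (m = 0.5, λ = 1, ν = 1).

```python
import math

def triplet_loss(s_pos, s_neg, margin=0.5):
    # Hinge loss: the positive must beat the negative by at least `margin`.
    return max(0.0, margin - (s_pos - s_neg))

def info_nce_loss(s_pos, s_negs, temperature=1.0):
    # InfoNCE: softmax cross-entropy with the positive treated as the correct class.
    z = sum(math.exp(s / temperature) for s in [s_pos] + list(s_negs))
    return -math.log(math.exp(s_pos / temperature) / z)

def ranking_loss(s_more, s_less, nu=1.0):
    # Assumed pairwise logistic ranking form: penalize orderings in which the item
    # supervised as less similar scores higher than the more similar one.
    return math.log(1.0 + math.exp(-(s_more - s_less) / nu))
```

Under the paper's triple sampling, each training triple would pair an anchor scene graph with two others whose relative similarity to the anchor is known, and `ranking_loss` would be applied to their two similarity scores.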
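The 70 : 20 : 10 split reported in the Dataset Splits row can be reproduced with a shuffled index cut. This is a hypothetical helper for illustration, not the authors' code; the seed and cut points are assumptions.

```python
import random

def split_indices(n, ratios=(0.7, 0.2, 0.1), seed=0):
    # Shuffle indices deterministically, then cut at cumulative ratio boundaries.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    a = int(ratios[0] * n)                # end of train
    b = int((ratios[0] + ratios[1]) * n)  # end of validation
    return idx[:a], idx[a:b], idx[b:]

# e.g. for the 108,077 Visual Genome scene graphs:
train, val, test = split_indices(108077)
```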