VASR: Visual Analogies of Situation Recognition

Authors: Yonatan Bitton, Ron Yosef, Eliyahu Strugo, Dafna Shahaf, Roy Schwartz, Gabriel Stanovsky

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Crowdsourced annotations for a sample of the data indicate that humans agree with the dataset label 80% of the time (chance level 25%). Furthermore, we use human annotations to create a gold-standard dataset of 3,820 validated analogies. Our experiments demonstrate that state-of-the-art models do well when distractors are chosen randomly (~86%), but struggle with carefully chosen distractors (~53%, compared to 90% human accuracy).
Researcher Affiliation | Academia | Yonatan Bitton, Ron Yosef, Eliyahu Strugo, Dafna Shahaf, Roy Schwartz, Gabriel Stanovsky, The Hebrew University of Jerusalem, {yonatan.bitton,ron.yosef,eli.strugo,dafna.shahaf,roy.schwartz1,gabriel.stanovsky}@mail.huji.ac.il
Pseudocode | No | The paper describes its methods in prose but does not include any formal pseudocode or algorithm blocks.
Open Source Code | Yes | Website: https://vasr-dataset.github.io/
Open Datasets | Yes | We start with the imSitu corpus (Yatskar, Zettlemoyer, and Farhadi 2016), which annotates frame roles in images. ... We used human annotators (§3.5) to create a gold-standard split, with 1,310, 160, 2,350 samples in the train, dev, test sets (§3.4), respectively. ... We also publish the full generated data (over 500K analogies) to allow other custom splits.
Dataset Splits | Yes | We used human annotators (§3.5) to create a gold-standard split, with 1,310, 160, 2,350 samples in the train, dev, test sets (§3.4), respectively. Next, we create a silver train set of size 150,000 items and a silver dev set of size 2,249 items. ... Full statistics are presented in Table 2.
Hardware Specification | No | The paper does not specify the hardware (e.g., CPU, GPU models, cloud instances) used for running the experiments.
Software Dependencies | Yes | The exact versions we took are the largest pretrained versions available in the timm library: ViT Large patch32-384, Swin Large patch4 window7-224, DeiT Base patch16-384, ConvNeXt Large. (A model-loading sketch appears after the table.)
Experiment Setup | Yes | We use the Adam (Kingma and Ba 2015) optimizer, a learning rate of 0.001, batch size of 128, and train for 5 epochs. (A training-loop sketch appears after the table.)
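
The four backbones named under Software Dependencies correspond to standard timm checkpoints. Below is a minimal loading sketch; the identifier strings are assumptions based on timm's naming scheme, since the paper names the models but not the exact identifiers:

```python
import timm

# Largest pretrained checkpoints named in the paper. The timm
# identifiers below are assumed from timm's naming scheme, not
# quoted from the paper.
MODEL_NAMES = [
    "vit_large_patch32_384",          # ViT Large patch32-384
    "swin_large_patch4_window7_224",  # Swin Large patch4 window7-224
    "deit_base_patch16_384",          # DeiT Base patch16-384
    "convnext_large",                 # ConvNeXt Large
]

# num_classes=0 drops the classification head so each model returns
# a feature vector, as is typical when scoring candidate images.
models = {
    name: timm.create_model(name, pretrained=True, num_classes=0)
    for name in MODEL_NAMES
}
```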
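The Experiment Setup row lists the only training hyperparameters the paper reports. A minimal PyTorch sketch of that configuration follows; the `train` function, the `dataset` argument, and the cross-entropy loss over candidates are illustrative assumptions, not details taken from the paper:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader

# Hyperparameters reported in the paper.
LEARNING_RATE = 0.001
BATCH_SIZE = 128
EPOCHS = 5

def train(model: nn.Module, dataset) -> None:
    """Train with the reported setup: Adam, lr=0.001, batch size 128,
    5 epochs. `dataset` stands in for the VASR silver train set; the
    choice of cross-entropy loss is an assumption here, as the paper's
    metadata quoted above does not specify the objective."""
    loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(EPOCHS):
        for inputs, labels in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), labels)
            loss.backward()
            optimizer.step()
```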