VASR: Visual Analogies of Situation Recognition
Authors: Yonatan Bitton, Ron Yosef, Eliyahu Strugo, Dafna Shahaf, Roy Schwartz, Gabriel Stanovsky
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Crowdsourced annotations for a sample of the data indicate that humans agree with the dataset label 80% of the time (chance level 25%). Furthermore, we use human annotations to create a gold-standard dataset of 3,820 validated analogies. Our experiments demonstrate that state-of-the-art models do well when distractors are chosen randomly (~86%), but struggle with carefully chosen distractors (~53%, compared to 90% human accuracy). |
| Researcher Affiliation | Academia | Yonatan Bitton, Ron Yosef, Eliyahu Strugo, Dafna Shahaf, Roy Schwartz, Gabriel Stanovsky The Hebrew University of Jerusalem {yonatan.bitton,ron.yosef,eli.strugo,dafna.shahaf,roy.schwartz1,gabriel.stanovsky}@mail.huji.ac.il |
| Pseudocode | No | The paper describes its methods in prose but does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Website: https://vasr-dataset.github.io/ |
| Open Datasets | Yes | We start with the imSitu corpus (Yatskar, Zettlemoyer, and Farhadi 2016), which annotates frame roles in images. ... We used human annotators (§3.5) to create gold-standard split, with 1,310, 160, 2,350 samples in the train, dev, test (§3.4), respectively. ... We also publish the full generated data (over 500K analogies) to allow other custom splits. |
| Dataset Splits | Yes | We used human annotators (§3.5) to create gold-standard split, with 1,310, 160, 2,350 samples in the train, dev, test (§3.4), respectively. Next, we create a silver train of size 150,000 items and a silver dev set of size 2,249 items. ... Full statistics are presented in Table 2. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., CPU, GPU models, cloud instances) used for running the experiments. |
| Software Dependencies | Yes | The exact versions we took are the largest pretrained versions available in the timm library: ViT Large patch32-384, Swin Large patch4 window7-224, DeiT Base patch16-384, ConvNeXt Large. |
| Experiment Setup | Yes | We use the Adam (Kingma and Ba 2015) optimizer, a learning rate of 0.001, batch size of 128, and train for 5 epochs. |
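
The Software Dependencies and Experiment Setup rows quote the timm backbones and training hyperparameters reported in the paper. Below is a minimal sketch of how such a setup could be assembled with PyTorch and timm; the timm model identifier strings are assumed mappings of the model descriptions in the paper and are not taken from the authors' released code.

```python
import timm
import torch

# Assumed timm identifiers for the backbones named in the paper
# ("ViT Large patch32-384", "Swin Large patch4 window7-224",
#  "DeiT Base patch16-384", "ConvNeXt Large"); not confirmed against
# the authors' code.
CANDIDATE_BACKBONES = [
    "vit_large_patch32_384",
    "swin_large_patch4_window7_224",
    "deit_base_patch16_384",
    "convnext_large",
]

# Load one pretrained backbone (illustrative choice of the first entry).
backbone = timm.create_model(CANDIDATE_BACKBONES[0], pretrained=True)

# Training hyperparameters quoted from the paper: Adam optimizer,
# learning rate 0.001, batch size 128, 5 training epochs.
optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-3)
batch_size = 128
num_epochs = 5
```

A standard training loop would then iterate for `num_epochs` over batches of `batch_size` analogies, stepping `optimizer` after each batch; the loss and task head are omitted here since they depend on details of the authors' architecture not quoted above.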