VASR: Visual Analogies of Situation Recognition

Authors: Yonatan Bitton, Ron Yosef, Eliyahu Strugo, Dafna Shahaf, Roy Schwartz, Gabriel Stanovsky

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Crowdsourced annotations for a sample of the data indicate that humans agree with the dataset label 80% of the time (chance level 25%). Furthermore, we use human annotations to create a gold-standard dataset of 3,820 validated analogies. Our experiments demonstrate that state-of-the-art models do well when distractors are chosen randomly (~86%), but struggle with carefully chosen distractors (~53%, compared to 90% human accuracy).
Researcher Affiliation | Academia | Yonatan Bitton, Ron Yosef, Eliyahu Strugo, Dafna Shahaf, Roy Schwartz, Gabriel Stanovsky, The Hebrew University of Jerusalem, {yonatan.bitton,ron.yosef,eli.strugo,dafna.shahaf,roy.schwartz1,gabriel.stanovsky}@mail.huji.ac.il
Pseudocode | No | The paper describes its methods in prose but does not include any formal pseudocode or algorithm blocks.
Open Source Code | Yes | Website: https://vasr-dataset.github.io/
Open Datasets | Yes | We start with the imSitu corpus (Yatskar, Zettlemoyer, and Farhadi 2016), which annotates frame roles in images. ... We used human annotators (§3.5) to create a gold-standard split, with 1,310, 160, 2,350 samples in the train, dev, test sets (§3.4), respectively. ... We also publish the full generated data (over 500K analogies) to allow other custom splits.
Dataset Splits | Yes | We used human annotators (§3.5) to create a gold-standard split, with 1,310, 160, 2,350 samples in the train, dev, test sets (§3.4), respectively. Next, we create a silver train set of size 150,000 items and a silver dev set of size 2,249 items. ... Full statistics are presented in Table 2.
Hardware Specification | No | The paper does not specify the hardware (e.g., CPU, GPU models, cloud instances) used for running the experiments.
Software Dependencies | Yes | The exact versions we took are the largest pretrained versions available in the timm library: ViT Large patch32-384, Swin Large patch4 window7-224, DeiT Base patch16-384, ConvNeXt Large. (A model-loading sketch appears after the table.)
Experiment Setup | Yes | We use the Adam (Kingma and Ba 2015) optimizer, a learning rate of 0.001, batch size of 128, and train for 5 epochs. (A training-loop sketch appears after the table.)
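
The four backbones named under Software Dependencies correspond to standard timm checkpoints. Below is a minimal loading sketch; the identifier strings are assumptions based on timm's naming scheme, since the paper names the models but not the exact identifiers:

```python
import timm

# Largest pretrained checkpoints named in the paper. The timm
# identifiers below are assumed from timm's naming scheme, not
# quoted from the paper.
MODEL_NAMES = [
    "vit_large_patch32_384",          # ViT Large patch32-384
    "swin_large_patch4_window7_224",  # Swin Large patch4 window7-224
    "deit_base_patch16_384",          # DeiT Base patch16-384
    "convnext_large",                 # ConvNeXt Large
]

# num_classes=0 drops the classification head so each model returns
# a feature vector, as is typical when scoring candidate images.
models = {
    name: timm.create_model(name, pretrained=True, num_classes=0)
    for name in MODEL_NAMES
}
```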
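The Experiment Setup row lists the only training hyperparameters the paper reports. A minimal PyTorch sketch of that configuration follows; the `train` function, the `dataset` argument, and the cross-entropy loss over candidates are illustrative assumptions, not details taken from the paper:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader

# Hyperparameters reported in the paper.
LEARNING_RATE = 0.001
BATCH_SIZE = 128
EPOCHS = 5

def train(model: nn.Module, dataset) -> None:
    """Train with the reported setup: Adam, lr=0.001, batch size 128,
    5 epochs. `dataset` stands in for the VASR silver train set; the
    choice of cross-entropy loss is an assumption here, as the paper's
    metadata quoted above does not specify the objective."""
    loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(EPOCHS):
        for inputs, labels in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), labels)
            loss.backward()
            optimizer.step()
```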