Alt-Text with Context: Improving Accessibility for Images on Twitter
Authors: Nikita Srivatsan, Sofía Samaniego, Omar Florez, Taylor Berg-Kirkpatrick
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform a quantitative analysis of this approach on our collected dataset, outperforming prior work on captioning (ClipCap; Mokady et al., 2021) and vision-and-language pretraining (BLIP-2; Li et al., 2023) by more than 2x on BLEU@4. We also conduct human evaluation which substantiates these findings. |
| Researcher Affiliation | Academia | Nikita Srivatsan, Carnegie Mellon University (nsrivats@cmu.edu); Sofía Samaniego (sofia.samaniego.f@gmail.com); Omar Florez, LatinX in AI (omarflorez.research@gmail.com); Taylor Berg-Kirkpatrick, University of California San Diego (tberg@ucsd.edu) |
| Pseudocode | No | No pseudocode or algorithm blocks are provided in the paper. |
| Open Source Code | No | While the raw data cannot be distributed directly in order to protect users' right to be forgotten, we release the dehydrated tweet IDs and media URLs (https://github.com/NikitaSrivatsan/AltTextPublicICLR). This link points to the dataset, not to the source code of the method. |
| Open Datasets | Yes | To support continued research in this area, we collect and release a first-of-its-kind dataset of image and alt-text pairs scraped from Twitter. While the raw data cannot be distributed directly in order to protect users' right to be forgotten, we release the dehydrated tweet IDs and media URLs (https://github.com/NikitaSrivatsan/AltTextPublicICLR). |
| Dataset Splits | Yes | This yielded a dataset of 371,270 images. We split into train (330,449), val (20,561), and test (20,260) sets based on tweet ID (i.e., images from the same tweet are assigned to the same split) to prevent leakage of tweet text. See the split sketch after this table. |
| Hardware Specification | Yes | Our prefixes are of size k = 10. This allows training to fit on a single A6000 GPU. See the prefix-mapper sketch after this table. |
| Software Dependencies | No | Our implementation is written in PyTorch (Paszke et al., 2019) and inherits some code from the ClipCap (Mokady et al., 2021) repository. (No version number is given for PyTorch or any other dependency.) |
| Experiment Setup | Yes | We train our models with a batch size of 100 and an initial learning rate of 1e-4 using the Adam optimization algorithm (Kingma & Ba, 2015). See the training-loop sketch after this table. |
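
The tweet-ID-based split quoted above can be reproduced in a few lines of Python. The sketch below is illustrative rather than the authors' released code: the `tweet_id` field name and the exact val/test fractions are assumptions.

```python
import random
from collections import defaultdict

def split_by_tweet_id(pairs, val_frac=0.05, test_frac=0.05, seed=0):
    """Assign whole tweets to train/val/test so that images from the
    same tweet never cross splits, preventing tweet-text leakage."""
    # Group image/alt-text pairs by their tweet ID.
    groups = defaultdict(list)
    for pair in pairs:
        groups[pair["tweet_id"]].append(pair)

    tweet_ids = sorted(groups)
    random.Random(seed).shuffle(tweet_ids)

    n_val = int(len(tweet_ids) * val_frac)
    n_test = int(len(tweet_ids) * test_frac)
    val_ids = set(tweet_ids[:n_val])
    test_ids = set(tweet_ids[n_val:n_val + n_test])

    splits = {"train": [], "val": [], "test": []}
    for tid in tweet_ids:
        split = "val" if tid in val_ids else "test" if tid in test_ids else "train"
        splits[split].extend(groups[tid])
    return splits
```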
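The prefix of size k = 10 refers to the ClipCap-style conditioning mechanism the paper builds on: an image embedding is projected to a short sequence of prefix vectors fed to the language model. The PyTorch sketch below shows what such a mapper looks like; the embedding dimensions and MLP shape are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class PrefixMapper(nn.Module):
    """Maps a CLIP image embedding to k = 10 prefix vectors in the
    language model's embedding space (ClipCap-style). The dimensions
    (clip_dim=512, lm_dim=768) and hidden size are illustrative."""
    def __init__(self, clip_dim=512, lm_dim=768, k=10):
        super().__init__()
        self.k, self.lm_dim = k, lm_dim
        hidden = (clip_dim + lm_dim * k) // 2
        self.mlp = nn.Sequential(
            nn.Linear(clip_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, lm_dim * k),
        )

    def forward(self, clip_embedding):               # (batch, clip_dim)
        prefix = self.mlp(clip_embedding)            # (batch, lm_dim * k)
        return prefix.view(-1, self.k, self.lm_dim)  # (batch, k, lm_dim)

mapper = PrefixMapper()
prefix = mapper(torch.randn(4, 512))  # -> torch.Size([4, 10, 768])
```

Keeping k small is what bounds the sequence length the language model must attend over, which is consistent with the paper's note that training fits on a single A6000.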
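The reported hyperparameters (Adam, initial learning rate 1e-4, batch size 100) correspond to a standard PyTorch training loop. In the sketch below, `model` is a placeholder assumed to return a scalar loss, the epoch count is an assumption, and batch size 100 would be set when constructing `train_loader`.

```python
import torch

def train(model, train_loader, num_epochs=10):
    """Standard PyTorch loop with the paper's stated hyperparameters:
    Adam with an initial learning rate of 1e-4. The number of epochs
    here is an assumption, not a value reported in the paper."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    model.train()
    for _ in range(num_epochs):
        for batch in train_loader:
            optimizer.zero_grad()
            loss = model(**batch)  # assumed to return a scalar loss
            loss.backward()
            optimizer.step()
```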