Alt-Text with Context: Improving Accessibility for Images on Twitter

Authors: Nikita Srivatsan, Sofia Samaniego, Omar Florez, Taylor Berg-Kirkpatrick

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform a quantitative analysis of this approach on our collected dataset, outperforming prior work on captioning (ClipCap, Mokady et al. (2021)) and vision-and-language pretraining (BLIP-2, Li et al. (2023)) by more than 2x on BLEU@4. We also conduct human evaluation which substantiates these findings. (A BLEU@4 scoring sketch follows the table.)
Researcher Affiliation | Academia | Nikita Srivatsan (Carnegie Mellon University, nsrivats@cmu.edu); Sofía Samaniego (sofia.samaniego.f@gmail.com); Omar Florez (LatinX in AI, omarflorez.research@gmail.com); Taylor Berg-Kirkpatrick (University of California San Diego, tberg@ucsd.edu)
Pseudocode | No | No pseudocode or algorithm blocks are provided in the paper.
Open Source Code | No | While the raw data cannot be distributed directly in order to protect users' right to be forgotten, we release the dehydrated tweet IDs and media URLs (https://github.com/NikitaSrivatsan/AltTextPublicICLR). This link is for the dataset, not the source code of the methodology.
Open Datasets | Yes | To support continued research in this area, we collect and release a first-of-its-kind dataset of image and alt-text pairs scraped from Twitter. While the raw data cannot be distributed directly in order to protect users' right to be forgotten, we release the dehydrated tweet IDs and media URLs (https://github.com/NikitaSrivatsan/AltTextPublicICLR). (A rehydration sketch for these dehydrated pairs follows the table.)
Dataset Splits | Yes | This yielded a dataset of 371,270 images. We split into train (330,449), val (20,561), and test (20,260) sets based on tweet ID (i.e. images from the same tweet are assigned to the same split) to prevent leakage of tweet text. (An ID-based split sketch follows the table.)
Hardware Specification | Yes | Our prefixes are of size k = 10. This allows training to fit on a single A6000 GPU.
Software Dependencies | No | Our implementation is written in PyTorch (Paszke et al., 2019) and inherits some code from the ClipCap (Mokady et al., 2021) repository. (No version number is given for PyTorch or for any other software dependency.)
Experiment Setup | Yes | We train our models with a batch size of 100 and an initial learning rate of 1e-4 using the Adam optimization algorithm (Kingma & Ba, 2015). (A training-configuration sketch follows the table.)
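
The headline result is a more-than-2x improvement over ClipCap and BLIP-2 on BLEU@4. The sketch below shows one way to compute corpus-level BLEU@4 for generated alt-text against references; the paper does not name its scoring library, so sacrebleu and the example strings here are assumptions.

```python
# Hedged sketch: corpus-level BLEU@4 for generated alt-text against references.
# The scoring library (sacrebleu) and the example strings are assumptions;
# the paper does not specify which BLEU implementation it uses.
import sacrebleu

# Hypothetical model outputs and one human-written reference per image.
hypotheses = [
    "a black dog runs across a grassy field",
    "a plate of pasta with tomato sauce on a wooden table",
]
references = [
    "a dog running through a field of grass",
    "a bowl of spaghetti with red sauce on a table",
]

# sacrebleu expects a list of reference streams, each aligned with the hypotheses.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU@4: {bleu.score:.2f}")  # corpus_bleu uses 4-gram precision by default
```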
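
Because only dehydrated tweet IDs and media URLs are released, rebuilding the dataset requires re-downloading the images. The sketch below illustrates one way to do that; the file name `dehydrated.csv` and its column names are hypothetical, so the repository's actual format should be checked before use.

```python
# Hedged sketch: rehydrating released (tweet ID, media URL) pairs by downloading
# each image locally. The CSV file name and column names are assumptions.
import csv
import pathlib
import requests

out_dir = pathlib.Path("images")
out_dir.mkdir(exist_ok=True)

with open("dehydrated.csv", newline="") as f:     # hypothetical file name
    for row in csv.DictReader(f):                 # assumed columns: tweet_id, media_url
        resp = requests.get(row["media_url"], timeout=10)
        if resp.ok:
            (out_dir / f"{row['tweet_id']}.jpg").write_bytes(resp.content)
```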
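
The split methodology assigns all images from a tweet to the same partition. A minimal sketch of such an ID-based assignment is below; the hash-bucket mechanism and the thresholds are assumptions chosen to roughly match the reported 330k/20k/20k proportions, not the authors' actual procedure.

```python
# Hedged sketch: assigning examples to train/val/test by tweet ID so that all
# images from one tweet land in the same split. The exact mechanism is assumed;
# the paper only states that splitting is done on tweet ID.
import hashlib

def split_for(tweet_id: str) -> str:
    # Stable hash of the tweet ID -> bucket in [0, 100).
    bucket = int(hashlib.md5(tweet_id.encode()).hexdigest(), 16) % 100
    if bucket < 89:      # roughly matches the reported 330k/20k/20k proportions
        return "train"
    elif bucket < 95:
        return "val"
    return "test"

examples = [
    {"tweet_id": "1490000000000000001", "image": "img_a.jpg"},
    {"tweet_id": "1490000000000000001", "image": "img_b.jpg"},  # same tweet -> same split
    {"tweet_id": "1490000000000000002", "image": "img_c.jpg"},
]
for ex in examples:
    print(ex["image"], "->", split_for(ex["tweet_id"]))
```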
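
The reported hyperparameters (batch size 100, Adam with learning rate 1e-4, prefix length k = 10) can be read as the configuration sketched below. The linear prefix mapper, the 512-dimensional image feature, the 768-dimensional embedding width, and the MSE stand-in loss are all placeholders, not the authors' architecture or training objective.

```python
# Hedged sketch of the reported training configuration: batch size 100,
# Adam with learning rate 1e-4, and a prefix of k = 10 tokens. The model
# below is a trivial stand-in, not the authors' architecture.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

PREFIX_LENGTH = 10       # k = 10, as reported
BATCH_SIZE = 100
LEARNING_RATE = 1e-4
EMBED_DIM = 768          # assumed GPT-2-sized embedding width

# Placeholder: maps a CLIP-style 512-d image feature to k prefix embeddings.
prefix_mapper = nn.Linear(512, PREFIX_LENGTH * EMBED_DIM)

# Dummy data standing in for (image feature, target prefix) pairs.
features = torch.randn(1000, 512)
targets = torch.randn(1000, PREFIX_LENGTH * EMBED_DIM)
loader = DataLoader(TensorDataset(features, targets),
                    batch_size=BATCH_SIZE, shuffle=True)

optimizer = torch.optim.Adam(prefix_mapper.parameters(), lr=LEARNING_RATE)
loss_fn = nn.MSELoss()   # stand-in loss; the paper trains with a captioning objective

for feats, tgts in loader:
    optimizer.zero_grad()
    loss = loss_fn(prefix_mapper(feats), tgts)
    loss.backward()
    optimizer.step()
```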