Alt-Text with Context: Improving Accessibility for Images on Twitter
Authors: Nikita Srivatsan, Sofía Samaniego, Omar Florez, Taylor Berg-Kirkpatrick
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform a quantitative analysis of this approach on our collected dataset, outperforming prior work on captioning (ClipCap; Mokady et al., 2021) and vision-and-language pretraining (BLIP-2; Li et al., 2023) by more than 2x on BLEU@4. We also conduct human evaluation which substantiates these findings. |
| Researcher Affiliation | Academia | Nikita Srivatsan, Carnegie Mellon University (nsrivats@cmu.edu); Sofía Samaniego (sofia.samaniego.f@gmail.com); Omar Florez, LatinX in AI (omarflorez.research@gmail.com); Taylor Berg-Kirkpatrick, University of California San Diego (tberg@ucsd.edu) |
| Pseudocode | No | No pseudocode or algorithm blocks are provided in the paper. |
| Open Source Code | No | While the raw data cannot be distributed directly in order to protect users' right to be forgotten, we release the dehydrated tweet IDs and media URLs (https://github.com/NikitaSrivatsan/AltTextPublicICLR). This link points to the dataset, not to the source code of the method. |
| Open Datasets | Yes | To support continued research in this area, we collect and release a first-of-its-kind dataset of image and alt-text pairs scraped from Twitter. While the raw data cannot be distributed directly in order to protect users' right to be forgotten, we release the dehydrated tweet IDs and media URLs (https://github.com/NikitaSrivatsan/AltTextPublicICLR). |
| Dataset Splits | Yes | This yielded a dataset of 371,270 images. We split into train (330,449), val (20,561), and test (20,260) sets based on tweet ID (i.e., images from the same tweet are assigned to the same split) to prevent leakage of tweet text. See the split sketch after this table. |
| Hardware Specification | Yes | Our prefixes are of size k = 10. This allows training to fit on a single A6000 GPU. See the prefix-mapper sketch after this table. |
| Software Dependencies | No | Our implementation is written in PyTorch (Paszke et al., 2019) and inherits some code from the ClipCap (Mokady et al., 2021) repository. (No version number is given for PyTorch or any other dependency.) |
| Experiment Setup | Yes | We train our models with a batch size of 100 and an initial learning rate of 1e-4 using the Adam optimization algorithm (Kingma & Ba, 2015). See the training-loop sketch after this table. |
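
The tweet-ID-based split quoted above can be reproduced in a few lines of Python. The sketch below is illustrative rather than the authors' released code: the `tweet_id` field name and the exact val/test fractions are assumptions.

```python
import random
from collections import defaultdict

def split_by_tweet_id(pairs, val_frac=0.05, test_frac=0.05, seed=0):
    """Assign whole tweets to train/val/test so that images from the
    same tweet never cross splits, preventing tweet-text leakage."""
    # Group image/alt-text pairs by their tweet ID.
    groups = defaultdict(list)
    for pair in pairs:
        groups[pair["tweet_id"]].append(pair)

    tweet_ids = sorted(groups)
    random.Random(seed).shuffle(tweet_ids)

    n_val = int(len(tweet_ids) * val_frac)
    n_test = int(len(tweet_ids) * test_frac)
    val_ids = set(tweet_ids[:n_val])
    test_ids = set(tweet_ids[n_val:n_val + n_test])

    splits = {"train": [], "val": [], "test": []}
    for tid in tweet_ids:
        split = "val" if tid in val_ids else "test" if tid in test_ids else "train"
        splits[split].extend(groups[tid])
    return splits
```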
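The prefix of size k = 10 refers to the ClipCap-style conditioning mechanism the paper builds on: an image embedding is projected to a short sequence of prefix vectors fed to the language model. The PyTorch sketch below shows what such a mapper looks like; the embedding dimensions and MLP shape are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class PrefixMapper(nn.Module):
    """Maps a CLIP image embedding to k = 10 prefix vectors in the
    language model's embedding space (ClipCap-style). The dimensions
    (clip_dim=512, lm_dim=768) and hidden size are illustrative."""
    def __init__(self, clip_dim=512, lm_dim=768, k=10):
        super().__init__()
        self.k, self.lm_dim = k, lm_dim
        hidden = (clip_dim + lm_dim * k) // 2
        self.mlp = nn.Sequential(
            nn.Linear(clip_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, lm_dim * k),
        )

    def forward(self, clip_embedding):               # (batch, clip_dim)
        prefix = self.mlp(clip_embedding)            # (batch, lm_dim * k)
        return prefix.view(-1, self.k, self.lm_dim)  # (batch, k, lm_dim)

mapper = PrefixMapper()
prefix = mapper(torch.randn(4, 512))  # -> torch.Size([4, 10, 768])
```

Keeping k small is what bounds the sequence length the language model must attend over, which is consistent with the paper's note that training fits on a single A6000.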
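The reported hyperparameters (Adam, initial learning rate 1e-4, batch size 100) correspond to a standard PyTorch training loop. In the sketch below, `model` is a placeholder assumed to return a scalar loss, the epoch count is an assumption, and batch size 100 would be set when constructing `train_loader`.

```python
import torch

def train(model, train_loader, num_epochs=10):
    """Standard PyTorch loop with the paper's stated hyperparameters:
    Adam with an initial learning rate of 1e-4. The number of epochs
    here is an assumption, not a value reported in the paper."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    model.train()
    for _ in range(num_epochs):
        for batch in train_loader:
            optimizer.zero_grad()
            loss = model(**batch)  # assumed to return a scalar loss
            loss.backward()
            optimizer.step()
```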