Unifying Vision-Language Representation Space with Single-Tower Transformer
Authors: Jiho Jang, Chaerin Kong, Donghyeon Jeon, Seonhoon Kim, Nojun Kwak
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Thorough evaluations demonstrate the potential of a unified modality-agnostic VLP framework. Following prior works (Li et al. 2021; Yang et al. 2022; Gan et al. 2020), we train OneR on the combination of CC3M (Sharma et al. 2018), SBU Captions (Ordonez, Kulkarni, and Berg 2011), Visual Genome (Krishna et al. 2017) and COCO (Lin et al. 2014), which sums up to 4M images and 5.1M image-text pairs. (A hedged sketch of such a single-tower setup appears after the table.) |
| Researcher Affiliation | Collaboration | Jiho Jang (1), Chaerin Kong (1), Donghyeon Jeon (2), Seonhoon Kim (3), Nojun Kwak (1); (1) Seoul National University, (2) NAVER, (3) Coupang. Contact: {geographic,veztylord,nojunk}@snu.ac.kr, donghyeon.jeon@navercorp.com, sekim625@coupang.com |
| Pseudocode | No | The paper provides mathematical formulations and diagrams but no explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access information, such as a repository link or an explicit statement about code release, for the methodology described. |
| Open Datasets | Yes | Following prior works (Li et al. 2021; Yang et al. 2022; Gan et al. 2020), we train OneR on the combination of CC3M (Sharma et al. 2018), SBU Captions (Ordonez, Kulkarni, and Berg 2011), Visual Genome (Krishna et al. 2017) and COCO (Lin et al. 2014), which sums up to 4M images and 5.1M image-text pairs. |
| Dataset Splits | No | The paper mentions training datasets and testing on subsets such as MS-COCO (5K) and ImageNet/CIFAR-100, but it does not explicitly provide dataset split information (e.g., percentages or sample counts) for training, validation, and test sets, nor does it describe any cross-validation setup. |
| Hardware Specification | Yes | We train our model with 32 A100 GPUs for 40 epochs under PyTorch framework. |
| Software Dependencies | No | The paper mentions 'PyTorch framework' but does not specify a version number or other software dependencies with their versions. |
| Experiment Setup | No | We train our model with 32 A100 GPUs for 40 epochs under PyTorch framework. Details on hyperparameters are listed in the supplementary. (A hedged distributed-training sketch appears after the table.) |
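Since the paper releases no code, a minimal sketch may help readers picture what "single-tower" means here: one shared transformer embeds both image patches and text tokens into a common space, trained with an in-batch contrastive objective. Everything below is an illustrative assumption, not the authors' implementation: the class name `OneTowerEncoder`, all layer sizes, the patch and vocabulary sizes, the mean-pooled projection head, and the symmetric InfoNCE loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OneTowerEncoder(nn.Module):
    """Hypothetical single-tower encoder: one shared transformer processes
    both image patches and text tokens. Hyperparameters are illustrative,
    not the paper's."""

    def __init__(self, dim=512, depth=6, heads=8, vocab_size=30522,
                 patch_size=16, image_size=224, max_text_len=64):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # Modality-specific embedders feed the same transformer tower.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size,
                                     stride=patch_size)
        self.token_embed = nn.Embedding(vocab_size, dim)
        self.img_pos = nn.Parameter(torch.zeros(1, num_patches, dim))
        self.txt_pos = nn.Parameter(torch.zeros(1, max_text_len, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.tower = nn.TransformerEncoder(layer, num_layers=depth)
        self.proj = nn.Linear(dim, dim)

    def encode_image(self, pixels):            # pixels: (B, 3, H, W)
        x = self.patch_embed(pixels).flatten(2).transpose(1, 2)
        x = self.tower(x + self.img_pos)
        return F.normalize(self.proj(x.mean(dim=1)), dim=-1)

    def encode_text(self, token_ids):          # token_ids: (B, L)
        x = self.token_embed(token_ids)
        x = self.tower(x + self.txt_pos[:, :x.size(1)])
        return F.normalize(self.proj(x.mean(dim=1)), dim=-1)

    def forward(self, pixels, token_ids):
        return self.encode_image(pixels), self.encode_text(token_ids)


def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over in-batch image-text pairs (an assumed
    objective; the paper's exact losses may differ)."""
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2
```

The key design point the sketch illustrates is weight sharing: both `encode_image` and `encode_text` pass through the same `self.tower`, so the two modalities are forced into one representation space rather than being aligned across two separate encoders.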
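The hardware row reports 32 A100 GPUs, 40 epochs, and PyTorch, but no versions or launch details. A plausible reproduction at that scale would use standard `torchrun` with `DistributedDataParallel`; the sketch below reuses `OneTowerEncoder` and `contrastive_loss` from the previous block and substitutes random tensors for the real CC3M/SBU Captions/Visual Genome/COCO pairs. The launch geometry (4 nodes x 8 GPUs), batch size, and optimizer settings are all assumptions.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import (ConcatDataset, DataLoader,
                              DistributedSampler, TensorDataset)

# OneTowerEncoder and contrastive_loss as defined in the previous sketch.

def main():
    # Assumed launch, e.g. 4 nodes x 8 GPUs = 32 A100s:
    #   torchrun --nnodes=4 --nproc_per_node=8 pretrain.py
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Random stand-ins for the four corpora (CC3M, SBU Captions,
    # Visual Genome, COCO); a real run would load ~5.1M image-text pairs.
    corpora = [TensorDataset(torch.randn(16, 3, 224, 224),
                             torch.randint(0, 30522, (16, 64)))
               for _ in range(4)]
    corpus = ConcatDataset(corpora)
    sampler = DistributedSampler(corpus)
    loader = DataLoader(corpus, batch_size=8, sampler=sampler)

    model = OneTowerEncoder().cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optim = torch.optim.AdamW(model.parameters(), lr=1e-4)  # assumed

    for epoch in range(40):        # the paper reports 40 epochs
        sampler.set_epoch(epoch)   # reshuffle shards each epoch
        for pixels, tokens in loader:
            img, txt = model(pixels.cuda(local_rank),
                             tokens.cuda(local_rank))
            loss = contrastive_loss(img, txt)
            optim.zero_grad()
            loss.backward()
            optim.step()

if __name__ == "__main__":
    main()
```

Note that the forward pass goes through the DDP wrapper (`model(...)`) rather than `model.module`, which is what keeps gradient synchronization correct across the 32 workers.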