Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Improved Probabilistic Image-Text Representations

Authors: Sanghyuk Chun

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on MS-COCO Caption and two extended benchmarks, CxC and ECCV Caption, demonstrate the effectiveness of PCME++ compared to state-of-the-art ITM methods.
Researcher Affiliation | Industry | Sanghyuk Chun, NAVER AI Lab
Pseudocode | Yes | Figure A.2 shows the PyTorch style pseudo-code of PCME++. Note that µ and σ are extracted from the augmented inputs, such as MSDA (Section 2.4) and SizeAugment (Chen et al., 2021). def compute_loss(v_mu, v_sig, t_mu, t_sig, matched):
Open Source Code | Yes | The code is available at https://github.com/naver-ai/pcmepp.
Open Datasets | Yes | Three evaluation benchmark datasets are used: COCO Caption (Chen et al., 2015), and its two extended benchmarks, ECCV Caption (EC) (Chun et al., 2022) and CxC (Parekh et al., 2021).
Dataset Splits | Yes | MS-COCO Caption (Chen et al., 2015), a widely used ITM benchmark, containing 123,287 images from MS-COCO (Lin et al., 2014) and five human-annotated captions per image. 113,287/5,000/5,000 images are used for training/validation/testing (Karpathy & Fei-Fei, 2015).
Hardware Specification | Yes | PCME++ 25 epoch training takes 106,311 secs (1 day and 5 hours), while PCME 25 epoch training takes 141,694 secs (1 day and 15 hours) on a single V100 GPU. Also: ViT B/32 1 V100 (38 hours) 8 V100 (17 hours) (Table B.1)
Software Dependencies | No | The paper mentions software like 'AdamP optimizer' and 'openclip software' but does not provide specific version numbers for these or other key dependencies.
Experiment Setup | Yes | All models are trained for 25 epochs using AdamP optimizer (Heo et al., 2021) by setting the initial learning rate as 0.0005 and weight decay as 0.0001. The learning rate is decayed by a factor of 0.1 for the last 10 epochs... The batch size is set to 128. The hyperparameters of PCME++ are set as follows; the affine transform is initialized by a = b = 5 in Equation (2); α for pseudo-positives as 0.1; VIB β as 0.0001. PCME++ mixes 25% of images in the mini-batch by Mixup or CutMix with a mixing ratio drawn from Beta(2, 2).
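
The Experiment Setup values can be read as a small training configuration. The sketch below is illustrative only and is not taken from the PCME++ repository. The names `lr_at_epoch` and `sample_mix_plan` are hypothetical, and several details are assumptions the quoted text does not settle: a single step decay at epoch 15 (0-indexed), exactly 25% of each batch being mixed, one mixing ratio per batch, and an even choice between Mixup and CutMix.

```python
import random

# Values quoted in the Experiment Setup row.
EPOCHS = 25
BASE_LR = 5e-4          # initial learning rate 0.0005
WEIGHT_DECAY = 1e-4     # weight decay 0.0001 (listed for completeness)
BATCH_SIZE = 128
DECAY_FACTOR = 0.1      # LR multiplied by 0.1 ...
DECAY_LAST_N = 10       # ... for the last 10 epochs
MIX_FRACTION = 0.25     # 25% of images in the mini-batch are mixed
BETA_A, BETA_B = 2.0, 2.0


def lr_at_epoch(epoch: int) -> float:
    """Learning rate for a 0-indexed epoch.

    Assumption: a single step decay, i.e. the base rate for the first
    15 epochs and 0.1x the base rate for the last 10.
    """
    if not 0 <= epoch < EPOCHS:
        raise ValueError(f"epoch must be in [0, {EPOCHS}), got {epoch}")
    if epoch >= EPOCHS - DECAY_LAST_N:
        return BASE_LR * DECAY_FACTOR
    return BASE_LR


def sample_mix_plan(batch_size: int = BATCH_SIZE, rng=None):
    """Choose which batch positions to mix, a mixing ratio, and a method.

    Assumptions (not stated in the quoted text): exactly
    round(0.25 * batch_size) images are mixed, one ratio is drawn per
    batch, and Mixup vs. CutMix is an even coin flip.
    """
    rng = rng or random.Random()
    n_mixed = round(MIX_FRACTION * batch_size)
    indices = sorted(rng.sample(range(batch_size), n_mixed))
    lam = rng.betavariate(BETA_A, BETA_B)  # mixing ratio ~ Beta(2, 2)
    method = rng.choice(["mixup", "cutmix"])
    return indices, lam, method


if __name__ == "__main__":
    for epoch in (0, 14, 15, 24):
        print(epoch, lr_at_epoch(epoch))
    indices, lam, method = sample_mix_plan(rng=random.Random(0))
    print(len(indices), round(lam, 3), method)
```

Under these assumptions, 32 of the 128 images in each batch are mixed, and the learning rate drops from 0.0005 to 0.00005 at the start of the sixteenth epoch.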