RLEG: Vision-Language Representation Learning with Diffusion-based Embedding Generation
Authors: Liming Zhao, Kecheng Zheng, Yun Zheng, Deli Zhao, Jingren Zhou
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that the proposed method could learn effective representation and achieve state-of-the-art performance on various tasks including image classification, image-text retrieval, object detection, semantic segmentation, and text-conditional image generation. |
| Researcher Affiliation | Industry | ¹Alibaba Group, ²Ant Group. |
| Pseudocode | No | The paper describes the model and methods using prose and mathematical equations, but it does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using a 'publicly available reproduction repository (LAION-AI, 2022) of pre-training DALL-E 2 model', but this refers to a third-party code used by the authors, not the open-source release of the RLEG methodology itself. |
| Open Datasets | Yes | We train the proposed model on the dataset of YFCC-15M used in CLIP (Radford et al., 2021), a subset of YFCC100M (Thomee et al., 2016). ... We train the proposed model on a larger dataset LAION-400M (Schuhmann et al., 2021)... |
| Dataset Splits | No | The paper mentions 'validation' as part of an evaluation task and implicitly for hyperparameter setting ('The loss weight λ is empirically set to 0.1.'), but it does not specify a distinct validation dataset split with percentages or counts for reproducibility during training. |
| Hardware Specification | Yes | The model is trained from scratch for 32 epochs on 8 NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions various models (e.g., ResNet, ViT, BERT) and optimizers (AdamW) and sampling strategies (DDIM) but does not provide specific version numbers for any software, libraries, or programming languages used. |
| Experiment Setup | Yes | The learning rate is initially set to 5e-4 and decayed to zero with a cosine scheduler. A warm-up of the learning rate is used at the first 3 epochs. The weight decay for model parameters is 0.1. The model is trained from scratch for 32 epochs... The batch size is set to 512 for each GPU card and a total of 4096 in the experiments. ... The number of multiple samplings K is set to 4... The condition weight w during sampling is set to 2.0... The loss weight λ is empirically set to 0.1. |
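The learning-rate recipe quoted in the Experiment Setup row (initial 5e-4, 3-epoch warm-up, cosine decay to zero over 32 epochs) can be sketched as a schedule function. This is a minimal re-implementation, not code from the authors; it assumes per-epoch schedule steps and linear warm-up (the paper states neither), and `lr_at_epoch` is a hypothetical helper name.

```python
import math

# Hyperparameters as reported in the paper's setup.
BASE_LR = 5e-4
EPOCHS = 32
WARMUP_EPOCHS = 3  # warm-up duration; linear ramp is an assumption

def lr_at_epoch(epoch: int) -> float:
    """Assumed linear warm-up for the first 3 epochs, then cosine decay to zero."""
    if epoch < WARMUP_EPOCHS:
        # Ramp from BASE_LR/3 up to BASE_LR across the warm-up epochs.
        return BASE_LR * (epoch + 1) / WARMUP_EPOCHS
    # Cosine anneal from BASE_LR at the first post-warm-up epoch
    # down to exactly zero at the final epoch.
    progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - 1 - WARMUP_EPOCHS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

schedule = [lr_at_epoch(e) for e in range(EPOCHS)]
```

In a training loop this value would be written into the optimizer's parameter groups each epoch (AdamW with weight decay 0.1 in the paper's setup), with the effective batch of 4096 coming from 512 samples per GPU across the 8 A100s listed under Hardware Specification.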