Multimodal Federated Learning via Contrastive Representation Ensemble

Authors: Qiying Yu, Yang Liu, Yimu Wang, Ke Xu, Jingjing Liu

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Thorough evaluations and ablation studies on image-text retrieval and VQA tasks showcase the superiority of CreamFL over state-of-the-art FL methods.
Researcher Affiliation | Academia | Qiying Yu (1,4), Yang Liu (1,4), Yimu Wang (2), Ke Xu (3), Jingjing Liu (1); 1. Institute for AI Industry Research, Tsinghua University; 2. University of Waterloo; 3. Carnegie Mellon University; 4. Shanghai Artificial Intelligence Laboratory; yuqy22@mails.tsinghua.edu.cn, {liuy03,jjliu}@air.tsinghua.edu.cn
Pseudocode | Yes | Algorithm 1: CreamFL algorithm.
Open Source Code | No | The paper does not provide an explicit statement or link for open-source code.
Open Datasets | Yes | We randomly choose a subset of MS-COCO (Lin et al., 2014) with 50,000 image-text pairs as public dataset. ... We distribute Flickr30K (Plummer et al., 2015) to 15 multimodal clients, CIFAR100 (Krizhevsky et al., 2009) to 10 uni-modal image clients, and AG NEWS (Zhang et al., 2015) to 10 uni-modal text clients... (See the partitioning sketch below the table.)
Dataset Splits | No | The paper mentions training and test sets but does not explicitly specify a validation dataset split or how it's used for hyperparameter tuning.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or memory used for experiments.
Software Dependencies | No | The paper specifies models used (e.g., ResNet-101, BERT) and an optimizer (AdamP) but does not provide specific version numbers for software dependencies or libraries.
Experiment Setup | Yes | We choose ResNet-101 (He et al., 2016) and ResNet-18 as the server and client image models, respectively, and BERT (base) (Devlin et al., 2018) and GRU (Chung et al., 2014) as the text models. The representation dimension d is 512 for both image and text. We use AdamP optimizer with initial learning rate 0.0002 and cosine learning rate scheduler for server model. (See the configuration sketch below the table.)
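
The client data assignment quoted in the Open Datasets row can be made concrete with a short partitioning sketch. Everything below is an assumption for illustration: the `split_indices` helper, the uniform random split, and the dataset sizes (roughly 31K Flickr30K pairs, 50K CIFAR-100 training images, 120K AG NEWS articles) are not taken from the paper, which states only the client counts and the 50,000-pair MS-COCO public subset.

```python
# Hypothetical sketch of the client data assignment; not the authors' released code.
import random

def split_indices(num_items: int, num_clients: int, seed: int = 0):
    """Randomly partition item indices into `num_clients` disjoint shards."""
    indices = list(range(num_items))
    random.Random(seed).shuffle(indices)
    shard = num_items // num_clients
    return [indices[i * shard:(i + 1) * shard] for i in range(num_clients)]

# Assumed dataset sizes; the paper states only the number of clients per modality.
multimodal_clients = split_indices(num_items=31_000, num_clients=15)   # Flickr30K image-text pairs
image_clients      = split_indices(num_items=50_000, num_clients=10)   # CIFAR-100 training images
text_clients       = split_indices(num_items=120_000, num_clients=10)  # AG NEWS training articles

# A 50,000-pair random subset of MS-COCO serves as the shared public dataset.
public_coco = random.Random(0).sample(range(118_000), 50_000)
```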
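
The server-side portion of the Experiment Setup row maps onto a small configuration sketch. It assumes PyTorch, torchvision, HuggingFace Transformers, and the third-party `adamp` package; the projection heads, the pooling choice, and the scheduler horizon are guesses, while the ResNet-101 and BERT-base backbones, the 512-dimensional representation, the 0.0002 learning rate, and the cosine schedule come from the quoted setup.

```python
# Sketch of the server model and optimizer settings; details not stated in the
# quoted setup (projection heads, pooling, T_max) are assumptions.
import torch
import torch.nn as nn
import torchvision.models as tvm
from transformers import BertModel
from adamp import AdamP  # third-party package: pip install adamp

REPR_DIM = 512  # shared image/text representation dimension d

class ServerImageEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = tvm.resnet101(weights=None)
        self.backbone.fc = nn.Identity()        # drop the ImageNet classifier head
        self.proj = nn.Linear(2048, REPR_DIM)   # ResNet-101 global features are 2048-d

    def forward(self, images):
        return self.proj(self.backbone(images))

class ServerTextEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.proj = nn.Linear(self.bert.config.hidden_size, REPR_DIM)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        return self.proj(out.pooler_output)      # pooled [CLS] representation

server_model = nn.ModuleDict({"image": ServerImageEncoder(), "text": ServerTextEncoder()})
optimizer = AdamP(server_model.parameters(), lr=2e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)  # horizon assumed
```

Client-side models (ResNet-18 and GRU) would be built analogously, each with its own projection into the same 512-dimensional representation space.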