Multimodal Federated Learning via Contrastive Representation Ensemble
Authors: Qiying Yu, Yang Liu, Yimu Wang, Ke Xu, Jingjing Liu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Thorough evaluations and ablation studies on image-text retrieval and VQA tasks showcase the superiority of CreamFL over state-of-the-art FL methods. |
| Researcher Affiliation | Academia | Qiying Yu1,4, Yang Liu1,4, Yimu Wang2, Ke Xu3, Jingjing Liu1 — 1 Institute for AI Industry Research, Tsinghua University; 2 University of Waterloo; 3 Carnegie Mellon University; 4 Shanghai Artificial Intelligence Laboratory. yuqy22@mails.tsinghua.edu.cn, {liuy03,jjliu}@air.tsinghua.edu.cn |
| Pseudocode | Yes | Algorithm 1: Cream FL algorithm. |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code. |
| Open Datasets | Yes | We randomly choose a subset of MS-COCO (Lin et al., 2014) with 50,000 image-text pairs as public dataset. ... We distribute Flickr30K (Plummer et al., 2015) to 15 multimodal clients, CIFAR100 (Krizhevsky et al., 2009) to 10 uni-modal image clients, and AGNEWS (Zhang et al., 2015) to 10 uni-modal text clients... |
| Dataset Splits | No | The paper mentions training and test sets but does not explicitly specify a validation dataset split or how it's used for hyperparameter tuning. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or memory used for experiments. |
| Software Dependencies | No | The paper specifies models used (e.g., ResNet-101, BERT) and an optimizer (AdamP) but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | We choose ResNet-101 (He et al., 2016) and ResNet-18 as the server and client image models, respectively, and BERT (base) (Devlin et al., 2018) and GRU (Chung et al., 2014) as the text models. The representation dimension d is 512 for both image and text. We use AdamP optimizer with initial learning rate 0.0002 and cosine learning rate scheduler for server model. |
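The setup row above specifies an AdamP optimizer with initial learning rate 0.0002 and a cosine learning rate scheduler for the server model. As a minimal sketch of the schedule only — assuming a standard cosine-annealing rule decaying to zero over a hypothetical `total_steps` (the paper quoted here does not state the exact formula or step count) — the per-step learning rate could be computed as:

```python
import math

def cosine_lr(step, total_steps, base_lr=2e-4, min_lr=0.0):
    """Standard cosine-annealing schedule: base_lr at step 0,
    decaying smoothly to min_lr at total_steps."""
    t = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t))

# Illustrative horizon of 10,000 server update steps (not from the paper)
print(cosine_lr(0, 10_000))       # starts at base_lr = 0.0002
print(cosine_lr(5_000, 10_000))   # halfway: 0.0001
print(cosine_lr(10_000, 10_000))  # ends at min_lr = 0.0
```

In practice this would be delegated to the optimizer stack (e.g. AdamP from the `adamp` package together with a framework-provided cosine scheduler); the function above only makes the decay rule explicit.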
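The pseudocode row cites Algorithm 1 (CreamFL), whose core idea — per the title — is ensembling client representations of a shared public dataset and distilling them into the server model with a contrastive objective. A minimal sketch of that idea, assuming plain averaging as the ensemble rule and a symmetric InfoNCE-style loss (both are illustrative stand-ins, not the paper's exact design):

```python
import math

def ensemble(client_reps):
    """Average representations of the same public sample across clients."""
    n = len(client_reps)
    dim = len(client_reps[0])
    return [sum(rep[i] for rep in client_reps) / n for i in range(dim)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce(anchors, targets, temperature=0.07):
    """Contrastive loss: each anchor should match the target at the
    same index and repel all other targets in the batch."""
    loss = 0.0
    for i, a in enumerate(anchors):
        logits = [dot(a, t) / temperature for t in targets]
        m = max(logits)  # subtract max for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += -(logits[i] - log_denom)
    return loss / len(anchors)
```

Aligned pairs yield a lower loss than shuffled pairs, which is the property the server-side distillation relies on; the real method operates on d = 512 representations from the client image/text encoders rather than toy vectors.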