Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
FissionVAE: Federated Non-IID Image Generation with Latent Space and Decoder Decomposition
Authors: Chen Hu, Hanchi Ren, Jingjing Deng, Xianghua Xie, Xiaoke Ma
IJCAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate our approach, we assemble two composite datasets: the first combines MNIST and Fashion MNIST; the second comprises RGB datasets of cartoon and human faces, wild animals, marine vessels, and remote sensing images. Our experiments demonstrate that FissionVAE greatly improves generation quality on these datasets compared to baseline federated VAE models. |
| Researcher Affiliation | Academia | 1Swansea University, United Kingdom 2Durham University, United Kingdom 3Xidian University, P. R. China EMAIL, EMAIL csvision.swansea.ac.uk |
| Pseudocode | No | The paper describes the methodology using prose and mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and Suppl. Mat.: github.com/Rand2AI/FissionVAE |
| Open Datasets | Yes | Mixed MNIST combines MNIST [LeCun and Cortes, 2010] and Fashion MNIST [Xiao et al., 2017]... CHARM is a more diverse dataset combining five domains: Cartoon faces [Churchill, 2019], Human faces [Karras et al., 2018], Animals [Xian et al., 2019], Remote sensing images [Helber et al., 2019], and Marine vessels [Gundogdu et al., 2016], using preprocessed square images from Meta-Album for AwA2 and MARVEL. |
| Dataset Splits | Yes | Mixed MNIST combines MNIST [LeCun and Cortes, 2010] and Fashion MNIST [Xiao et al., 2017], dividing samples into two client groups (one per dataset) with 10 clients each. Training samples were evenly distributed within each group, and the default test sets served as evaluation benchmarks... Images were resized to 32 × 32, and each domain was represented by 20 clients, with 20,000 images for training and 5,000 for evaluation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, or memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | Hyperparameters included learning rates of 1 × 10−3 (Mixed MNIST) and 1 × 10−4 (CHARM), with 70 and 500 training rounds, respectively. Clients performed 5 local epochs per round with a batch size of 32. Centralized settings used 70 epochs for Mixed MNIST and 250 for CHARM. |
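The reported schedule can be summarized in a small configuration sketch. This is a hypothetical illustration of the hyperparameters quoted above (learning rates, round counts, local epochs, batch size); the names `CONFIGS` and `local_steps` are not from the FissionVAE codebase.

```python
# Hypothetical summary of the federated training schedule reported in the
# paper; identifiers are illustrative, not from the authors' repository.
CONFIGS = {
    "mixed_mnist": {"lr": 1e-3, "rounds": 70,  "clients_per_group": 10,
                    "local_epochs": 5, "batch_size": 32},
    "charm":       {"lr": 1e-4, "rounds": 500, "clients_per_group": 20,
                    "local_epochs": 5, "batch_size": 32},
}

def local_steps(num_samples: int, cfg: dict) -> int:
    """Gradient steps one client performs per federated round."""
    batches_per_epoch = -(-num_samples // cfg["batch_size"])  # ceiling division
    return batches_per_epoch * cfg["local_epochs"]

# e.g. a CHARM client holding 1,000 training images:
print(local_steps(1000, CONFIGS["charm"]))  # 32 batches/epoch x 5 epochs = 160
```

Under this reading, total client work scales with `rounds * local_steps`, which is one reason the CHARM setting (500 rounds at a lower learning rate) is far more expensive than Mixed MNIST (70 rounds).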