Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Cross-client Label Propagation for Transductive and Semi-Supervised Federated Learning
Authors: Jonathan Scott, Michelle Yeo, Christoph H Lampert
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on both real federated and standard benchmark datasets show that in both applications XCLP achieves higher classification accuracy than alternative approaches. |
| Researcher Affiliation | Academia | Jonathan Scott, ISTA (Institute of Science and Technology Austria), Klosterneuburg, Austria; Michelle Yeo, ISTA (Institute of Science and Technology Austria), Klosterneuburg, Austria; Christoph H. Lampert, ISTA (Institute of Science and Technology Austria), Klosterneuburg, Austria |
| Pseudocode | Yes | Algorithm 1: Cross-Client Label Propagation |
| Open Source Code | Yes | Source code for our experiments can be found at https://github.com/jonnyascott/xclp. |
| Open Datasets | Yes | We use the Fed-ISIC2019 dataset (Ogier du Terrail et al., 2022), a real-world benchmark for federated classification of medical images. We use three standard datasets: CIFAR-10 (Krizhevsky, 2009)... as well as the more difficult CIFAR-100 (Krizhevsky, 2009) and Mini-ImageNet (Vinyals et al., 2016). We use the FEMNIST dataset (Caldas et al., 2018). |
| Dataset Splits | Yes | All three datasets consist of 60,000 images, which we split into training sets of size n := 50,000 and test sets of size 10,000. From the training set, n_L examples are labeled and the remaining n − n_L are unlabeled. For CIFAR-10 we evaluate with n_L = 1,000 and 5,000. For CIFAR-100 and Mini-ImageNet we take n_L = 5,000 and 10,000. Federated setup: We simulate a FL scenario by splitting the training data (labeled and unlabeled) between m clients. m_L of these have partly labeled data, while the others have only unlabeled data. Each client is assigned a total of n/m data points, of which n_L/m_L are labeled if the client is one of the m_L that possess labels. We simulate statistical heterogeneity among the clients by controlling the number of classes each client has access to. In the i.i.d. setting all clients have uniform class distributions and receive an equal number of labels of each class. In the non-i.i.d. setting we assign a class distribution to each client, and clients receive labels according to their own distribution. |
| Hardware Specification | No | The paper does not provide specific hardware details. It mentions a "simulated setting of federated learning" but does not specify the computing resources used for the simulations or experiments. |
| Software Dependencies | No | The paper mentions models like EfficientNet, 13-layer CNNs, and ResNet-18, and optimizers like SGD, but does not specify software libraries or their version numbers (e.g., PyTorch 1.9, TensorFlow 2.x). |
| Experiment Setup | Yes | Hyper-parameters: We set all hyper-parameters for Federated Averaging to the values specified in Ogier du Terrail et al. (2022), except we increase the number of training rounds to T = 40, as we found accuracy improved with further training. Parameters for XCLP (LSH dimension, k-NN parameter) are chosen using cross-validation. We use L = 1024 and k = 3. We fix the parameter α = 0.99. Federated learning parameters: We set the number of clients to m = 100... The Client Update step corresponds to E epochs of stochastic gradient descent (SGD) of a loss function. We set the number of local epochs to E = 5, and the loss function is (per-sample weighted) cross-entropy loss. The number of training rounds is set to T = 1500 and the number of clients sampled by the server per training round is set to 5... We use weight decay of 2e-4 for all network parameters. The learning rate for SGD is set according to this batch size. On CIFAR-10, for \|BL\| < 50 we set the learning rate to 0.1 and for \|BL\| = 50 we set it to 0.3. On CIFAR-100 and Mini-ImageNet we always have \|BL\| = 50 and set the learning rates to 0.5 and 1.0 respectively. We decay the learning rate using cosine annealing so that it would reach 0 after 2000 rounds. XCLP parameters: We set the LSH dimension to L = 4096, as this gave a near-exact approximation of the cosine similarities while remaining computationally fast (less than 1 second per round). We set the sparsification parameter to k = 10, so that each point is connected to its 10 most similar neighbors in the graph, and the label propagation parameter to α = 0.99. |
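The federated split described in the Dataset Splits row (n/m points per client, with the first m_L clients holding n_L/m_L labeled points in the i.i.d. case) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name `split_federated` and its interface are assumptions, and the non-i.i.d. per-client class distributions are omitted.

```python
import random


def split_federated(labels, m, m_L, n_L, seed=0):
    """Illustrative i.i.d. federated split (names are hypothetical).

    Shuffles the n = len(labels) training indices, assigns n/m points
    to each of m clients, and marks n_L/m_L points as labeled on each
    of the first m_L clients. Returns (client index lists, labeled sets).
    """
    rng = random.Random(seed)
    n = len(labels)
    idx = list(range(n))
    rng.shuffle(idx)

    per_client = n // m
    clients = [idx[i * per_client:(i + 1) * per_client] for i in range(m)]

    # Only the first m_L clients receive labels; the rest are unlabeled.
    labeled_per_client = n_L // m_L
    labeled = {c: set(clients[c][:labeled_per_client]) for c in range(m_L)}
    return clients, labeled
```

For example, with n = 1,000 points, m = 10 clients, m_L = 2 labeled clients, and n_L = 100 labels, each client holds 100 points and each labeled client holds 50 labels.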
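The Experiment Setup row mentions cosine-annealed learning rates that would reach 0 after 2,000 rounds, even though training stops at T = 1,500. A minimal sketch of the standard cosine annealing schedule (the function name is hypothetical; the paper does not give its exact formula):

```python
import math


def cosine_annealed_lr(base_lr, t, t_max=2000):
    """Standard cosine annealing: lr(t) = base_lr * 0.5 * (1 + cos(pi * t / t_max)).

    With t_max = 2000 the rate hits 0 at round 2000, so at the final
    training round T = 1500 it is still strictly positive.
    """
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * t / t_max))
```

At t = 1,500 with base_lr = 0.1 this gives roughly 0.0146, consistent with the schedule never fully decaying during the 1,500 training rounds.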
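The XCLP parameters mention an LSH dimension L that gives a "near exact approximation of the cosine similarities." A generic way to do this is random-hyperplane (sign) hashing, where each vector is reduced to L sign bits and cosine similarity is estimated from the Hamming distance of the bit signatures. The sketch below shows this generic scheme; the paper's exact LSH construction may differ, and both function names are assumptions.

```python
import math
import random


def lsh_signatures(vectors, L, seed=0):
    """Random-hyperplane LSH sketch: each d-dimensional vector is mapped
    to L sign bits, one per random Gaussian hyperplane."""
    rng = random.Random(seed)
    d = len(vectors[0])
    planes = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(L)]
    return [
        [1 if sum(p[i] * v[i] for i in range(d)) >= 0 else 0 for p in planes]
        for v in vectors
    ]


def approx_cosine(sig_a, sig_b):
    """Estimate cos(theta) from the Hamming distance between signatures:
    the fraction of differing bits approximates theta / pi."""
    L = len(sig_a)
    hamming = sum(a != b for a, b in zip(sig_a, sig_b))
    return math.cos(math.pi * hamming / L)
```

Larger L tightens the estimate, which matches the paper's observation that L = 4096 gave a near-exact approximation while each round still took under a second.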