Fishing for User Data in Large-Batch Federated Learning via Gradient Magnification
Authors: Yuxin Wen, Jonas A. Geiping, Liam Fowl, Micah Goldblum, Tom Goldstein
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the strategy in challenging large-scale settings, obtaining high-fidelity data extraction in both cross-device and cross-silo federated learning. Code is available at https://github.com/JonasGeiping/breaching. We then verify in a range of experiments for the task of image classification that this attack allows us to leverage existing optimization-based (Geiping et al., 2020) and analytic attacks (Lu et al., 2021), which currently only work well for inverting an update calculated on a single or few data points. |
| Researcher Affiliation | Academia | 1 University of Maryland, 2 New York University. |
| Pseudocode | Yes | Detailed implementation can be found in Algorithm 1. |
| Open Source Code | Yes | Code is available at https://github.com/JonasGeiping/breaching. |
| Open Datasets | Yes | All images are from ImageNet ILSVRC 2012 (Russakovsky et al., 2015) with a size of 224 × 224 and include 1000 classes in total. |
| Dataset Splits | Yes | For our quantitative experiments we partition the ImageNet validation set into 100 users with the given batch size, and either allocate each user a different class or assign images to users at random (without replacement). (A partitioning sketch follows the table.) |
| Hardware Specification | Yes | We run all the optimization-based attacks using single 2080ti GPUs and run the analytic attacks via APRIL on CPUs, solving the embedding layer and attention inversion under-determined problems via an SVD solver (dgelss). (An SVD-solve sketch follows the table.) |
| Software Dependencies | No | The paper mentions 'We implement these attacks in a PyTorch framework (Paszke et al., 2017)' but does not provide a specific version number for PyTorch or any other software library. |
| Experiment Setup | Yes | We use α = 1000 for the class fishing strategy and θ = 1000 for the feature fishing strategy... We apply both strategies to the last linear layer of a pre-trained ResNet-18 for all experiments except Section 4.4. In the optimization, we use Adam with step size 0.1 and 50 iterations of warmup over a total of 24K iterations (Yin et al., 2021). The initialization is set to the pattern tiling of 4 × 4 random normal data introduced in (Wei et al., 2020). (An optimizer/initialization sketch follows the table.) |
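
The Dataset Splits row quotes the per-user partitioning of the ImageNet validation set. Below is a minimal sketch of that partitioning, assuming a list of `(image, label)` samples; `partition_users` is a hypothetical helper written for illustration, not taken from the paper's released code:

```python
import random
from collections import defaultdict

def partition_users(samples, num_users=100, batch_size=8, by_class=False, seed=0):
    """Split a labeled dataset into per-user index batches, either one class
    per user or random assignment without replacement (illustrative helper)."""
    rng = random.Random(seed)
    if by_class:
        # Group sample indices by label, then give each user one class.
        per_class = defaultdict(list)
        for idx, (_, label) in enumerate(samples):
            per_class[label].append(idx)
        classes = rng.sample(sorted(per_class), num_users)
        return [per_class[c][:batch_size] for c in classes]
    # Random assignment: draw num_users * batch_size indices without replacement.
    indices = rng.sample(range(len(samples)), num_users * batch_size)
    return [indices[u * batch_size:(u + 1) * batch_size] for u in range(num_users)]
```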
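The Hardware Specification row mentions solving the under-determined embedding and attention inversion problems with the SVD-based LAPACK routine dgelss. A minimal sketch of such a minimum-norm solve via SciPy, with illustrative shapes rather than the paper's actual systems:

```python
import numpy as np
from scipy.linalg import lstsq

# Under-determined system A x = b: more unknowns than equations,
# so the SVD-based driver returns the minimum-norm solution.
A = np.random.randn(64, 256)  # illustrative shapes
b = np.random.randn(64)
x, residues, rank, sv = lstsq(A, b, lapack_driver="gelss")
```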
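The Experiment Setup row specifies Adam with step size 0.1, 50 warmup iterations over 24K total, and initialization by tiling 4 × 4 random normal data. A minimal PyTorch sketch of one plausible reading of that schedule (linear warmup, then constant) and initialization, with a placeholder loss standing in for the paper's actual gradient-matching objective:

```python
import torch

# Tile a 4x4 random normal pattern up to a 224x224 RGB image (4 * 56 = 224).
tile = torch.randn(1, 3, 4, 4)
x = tile.repeat(1, 1, 56, 56).requires_grad_(True)

optimizer = torch.optim.Adam([x], lr=0.1)
warmup, total = 50, 24_000
# Linear warmup for the first 50 steps, constant learning rate afterwards.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: min(1.0, (step + 1) / warmup)
)

for step in range(total):
    optimizer.zero_grad()
    loss = x.square().mean()  # placeholder; the real attack matches gradients
    loss.backward()
    optimizer.step()
    scheduler.step()
```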