Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Robust Distributed Estimation: Extending Gossip Algorithms to Ranking and Trimmed Means

Authors: Anna van Elst, Igor Colin, Stephan Clémençon

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We empirically validate our theoretical results through experiments on diverse network topologies, data distributions and contamination schemes.
Researcher Affiliation	Academia	Anna van Elst Igor Colin Stephan Clémençon LTCI, Télécom Paris, Institut Polytechnique de Paris EMAIL
Pseudocode	Yes	"Algorithm 1 GoRank: a synchronous gossip algorithm for ranking" and "Algorithm 2 GoTrim: a synchronous gossip algorithm for estimating α-trimmed means."
Open Source Code	Yes	The code for our experiments is publicly available. (The footnote points to github.com/anna-vanelst/robust-gossip.)
Open Datasets	Yes	Experiment (c) uses the Basel Luftklima dataset, corrupted with shift s = 100. This dataset includes temperature measurements from n = 105 sensors across Basel. and The code and dataset for our experiments is publicly available. (The footnote points to github.com/anna-vanelst/robust-gossip.)
Dataset Splits	No	We conduct experiments on a dataset S = {1, . . . , n} with n = 500, distributed across nodes of a communication graph. and the dataset S is contaminated by replacing a fraction ε = 0.1 of the values with outliers. The paper does not specify traditional training/test/validation splits.
Hardware Specification	Yes	All experiments were run on a single CPU with 32 GB of memory for 8e4 iterations, with a total execution time of approximately two hours.
Software Dependencies	No	The paper does not explicitly list specific software dependencies with version numbers.
Experiment Setup	Yes	Setup. We conduct experiments on a dataset S = {1, . . . , n} with n = 500, distributed across nodes of a communication graph. Our evaluation metric is the normalized absolute error between estimated and true ranks, i.e., for node k at iteration t, the error is defined as $\ell_k(t) = |R_k(t) - r_k| / n$. ... All experiments were run on a single CPU with 32 GB of memory for 8e4 iterations, with a total execution time of approximately two hours. (Section 3.2) and The experimental setup is identical to that of the previous section, with the key difference being the introduction of corrupted data. Specifically, the dataset S is contaminated by replacing a fraction ε = 0.1 of the values with outliers. We consider two types of corruption, each affecting εn randomly selected data points: (a) scaling, where a value x is changed to sx, and (b) shifting, where x becomes x + s. (Section 4.3) and Figures (b) and (c) demonstrate that, for α = 0.2 and ε = 1, GOTRIM quickly improves on the naive corrupted mean and converges to the trimmed mean. (Figure 3 caption).
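
The quantities quoted in the Experiment Setup row can be illustrated with a small centralized sketch. It is not the authors' gossip implementation (GoRank and GoTrim are distributed algorithms and are not reproduced here); it only shows the contamination schemes, the normalized rank error, and a reference α-trimmed mean. The function names, the scaling factor s = 10, the random seed, and the convention of trimming floor(αn) values from each end are all assumptions made for illustration; the paper's exact definitions may differ.

```python
import random


def contaminate(values, eps, s, mode, rng):
    """Corrupt round(eps * n) randomly chosen values.

    mode == "scale": x becomes s * x; any other mode: x becomes x + s.
    """
    n = len(values)
    corrupted = list(values)
    for i in rng.sample(range(n), round(eps * n)):
        corrupted[i] = s * corrupted[i] if mode == "scale" else corrupted[i] + s
    return corrupted


def trimmed_mean(values, alpha):
    """Centralized alpha-trimmed mean.

    Assumed convention: drop the floor(alpha * n) smallest and the
    floor(alpha * n) largest values, then average what remains.
    """
    ordered = sorted(values)
    m = int(alpha * len(ordered))
    kept = ordered[m:len(ordered) - m]
    return sum(kept) / len(kept)


def normalized_rank_error(estimated_rank, true_rank, n):
    """Normalized absolute rank error |R_k(t) - r_k| / n for a single node."""
    return abs(estimated_rank - true_rank) / n


if __name__ == "__main__":
    rng = random.Random(0)           # hypothetical seed
    n = 500
    clean = list(range(1, n + 1))    # S = {1, ..., n}
    # s = 10 is a hypothetical scaling factor, not taken from the paper.
    corrupted = contaminate(clean, eps=0.1, s=10, mode="scale", rng=rng)

    naive = sum(corrupted) / n
    robust = trimmed_mean(corrupted, alpha=0.2)
    print("clean mean:", sum(clean) / n)
    print("naive corrupted mean:", naive)
    print("0.2-trimmed mean:", robust)
```

Under these assumptions the trimmed mean stays much closer to the clean mean than the naive mean does, which is the behavior GoTrim is reported to converge to in the distributed setting.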