Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Robust Distributed Estimation: Extending Gossip Algorithms to Ranking and Trimmed Means
Authors: Anna van Elst, Igor Colin, Stephan Clémençon
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate our theoretical results through experiments on diverse network topologies, data distributions and contamination schemes. |
| Researcher Affiliation | Academia | Anna van Elst Igor Colin Stephan Clémençon LTCI, Télécom Paris, Institut Polytechnique de Paris EMAIL |
| Pseudocode | Yes | Algorithm 1 GoRank: a synchronous gossip algorithm for ranking. and Algorithm 2 GoTrim: a synchronous gossip algorithm for estimating α-trimmed means. |
| Open Source Code | Yes | The code for our experiments is publicly available. (Footnote 1 points to github.com/anna-vanelst/robust-gossip.) |
| Open Datasets | Yes | Experiment (c) uses the Basel Luftklima dataset, corrupted with shift s = 100. This dataset includes temperature measurements from n = 105 sensors across Basel. and The code and dataset for our experiments is publicly available. (Footnote 2 points to github.com/anna-vanelst/robust-gossip.) |
| Dataset Splits | No | We conduct experiments on a dataset S = {1, . . . , n} with n = 500, distributed across nodes of a communication graph. and the dataset S is contaminated by replacing a fraction ε = 0.1 of the values with outliers. The paper does not specify traditional training/test/validation splits. |
| Hardware Specification | Yes | All experiments were run on a single CPU with 32 GB of memory for 8e4 iterations, with a total execution time of approximately two hours. |
| Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers. |
| Experiment Setup | Yes | Setup. We conduct experiments on a dataset S = {1, . . . , n} with n = 500, distributed across nodes of a communication graph. Our evaluation metric is the normalized absolute error between estimated and true ranks, i.e., for node k at iteration t, the error is defined as $\ell_k(t) = \lvert R_k(t) - r_k \rvert / n$. ... All experiments were run on a single CPU with 32 GB of memory for 8e4 iterations, with a total execution time of approximately two hours. (Section 3.2) and The experimental setup is identical to that of the previous section, with the key difference being the introduction of corrupted data. Specifically, the dataset S is contaminated by replacing a fraction ε = 0.1 of the values with outliers. We consider two types of corruption, each affecting εn randomly selected data points: (a) scaling, where a value x is changed to sx, and (b) shifting, where x becomes x + s. (Section 4.3) and Figures (b) and (c) demonstrate that, for α = 0.2 and ε = 1, GoTrim quickly improves on the naive corrupted mean and converges to the trimmed mean. (Figure 3 caption). |
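
The quantities quoted in the Experiment Setup row can be illustrated with a small centralized sketch. It is not the authors' code and it does not implement the gossip algorithms; it only computes the normalized rank error, the two corruption schemes (scaling and shifting), and a reference trimmed mean on a single machine. All function names are hypothetical, and the trimmed mean assumes the common convention of dropping ⌊αn⌋ values from each end, which may differ from the paper's exact definition.

```python
import random


def true_ranks(values):
    """Return 1-based ranks (1 = smallest). Assumes distinct values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def normalized_rank_error(estimated_rank, true_rank, n):
    """Normalized absolute rank error |R_k(t) - r_k| / n for one node."""
    return abs(estimated_rank - true_rank) / n


def contaminate(values, eps, s, mode, rng):
    """Corrupt int(eps * n) randomly chosen entries.

    mode == "scale": x becomes s * x; mode == "shift": x becomes x + s.
    """
    if mode not in ("scale", "shift"):
        raise ValueError("mode must be 'scale' or 'shift'")
    corrupted = list(values)
    n_outliers = int(eps * len(values))
    for i in rng.sample(range(len(values)), n_outliers):
        corrupted[i] = s * corrupted[i] if mode == "scale" else corrupted[i] + s
    return corrupted


def trimmed_mean(values, alpha):
    """Centralized alpha-trimmed mean.

    Assumption: drop floor(alpha * n) smallest and largest values,
    then average the rest. Requires alpha < 0.5.
    """
    n = len(values)
    m = int(alpha * n)
    kept = sorted(values)[m:n - m]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed, chosen arbitrarily
    n = 500
    data = list(range(1, n + 1))    # S = {1, ..., n}
    corrupted = contaminate(data, eps=0.1, s=100, mode="shift", rng=rng)
    print("naive corrupted mean:", sum(corrupted) / n)
    print("trimmed mean (alpha = 0.2):", trimmed_mean(corrupted, 0.2))
```

The values n = 500, ε = 0.1, s = 100 and α = 0.2 in the usage section are taken from the quoted setup; the seed and the pairing of shift s = 100 with the synthetic dataset are illustrative choices only.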