Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Minimax Rates in Permutation Estimation for Feature Matching

Authors: Olivier Collier, Arnak S. Dalalyan

JMLR 2016 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We also discuss the computational aspects of the estimators and provide empirical evidence of their consistency on synthetic data. We carried out a small experimental evaluation that confirms that in the heteroscedastic setting the LSL estimator is as good as the LSNS (pseudo-) estimator and that they outperform the two other estimators: the greedy estimator and the least sum of squares. We have implemented all the procedures in Matlab and carried out numerical experiments on synthetic data.
Researcher Affiliation	Academia	Olivier Collier EMAIL Imagine LIGM Université Paris EST Marne-la-Vallée, FRANCE; Arnak S. Dalalyan EMAIL Laboratoire de Statistique ENSAE CREST Malakoff, FRANCE
Pseudocode	No	The paper describes the estimation procedures using mathematical formulas (equations 8-12) and textual explanations, but it does not include any explicitly labeled pseudocode blocks or algorithms in a structured, code-like format.
Open Source Code	No	The paper states, "We have implemented all the procedures in Matlab and carried out numerical experiments on synthetic data," indicating that code was written for the experiments. However, it does not provide any specific links to a code repository, an explicit statement of code release, or mention of code in supplementary materials.
Open Datasets	No	We have implemented all the procedures in Matlab and carried out numerical experiments on synthetic data. We chose n = d = 200 and randomly generated an n × d matrix θ with i.i.d. entries uniformly distributed on [0, τ], with several values of τ varying between 1.4 and 3.5. Then, we randomly chose a permutation π (uniformly from S_n) and generated the sets {X_i} and {X^#_i} according to (2) with σ_i = σ^#_i = 1.
Dataset Splits	No	The paper uses synthetic data generated for each trial (e.g., "averaged over 500 independent trials"). It describes the parameters for generating this data but does not mention partitioning a fixed dataset into training, validation, or test sets in the conventional sense of dataset splits.
Hardware Specification	No	for a problem with n = 500 features, it takes about six seconds to compute a solution to (17) on a standard PC.
Software Dependencies	Yes	To simplify, we have used the general-purpose solver SeDuMi (Sturm, 1999) for solving linear programs. Jos F. Sturm. Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim. Methods Softw., 11/12(1-4):625–653, 1999.
Experiment Setup	Yes	We chose n = d = 200 and randomly generated an n × d matrix θ with i.i.d. entries uniformly distributed on [0, τ], with several values of τ varying between 1.4 and 3.5. Then, we randomly chose a permutation π (uniformly from S_n) and generated the sets {X_i} and {X^#_i} according to (2) with σ_i = σ^#_i = 1. The result, averaged over 500 independent trials.
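
The data generation quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' Matlab code, which is not available. Equation (2) is not reproduced on this page, so the sketch assumes an additive standard Gaussian noise model in which row i of the first set and row π(i) of the second set are noisy copies of the same row of θ. The function name `generate_trial` and its parameters are hypothetical.

```python
import random


def generate_trial(tau, n=200, d=200, sigma=1.0, sigma_sharp=1.0, seed=None):
    """Generate one synthetic trial following the quoted experiment setup.

    ASSUMPTION: model (2) is taken to be additive standard Gaussian noise,
    X_i = theta_i + sigma * xi_i and X#_{pi(i)} = theta_i + sigma_sharp * xi#_i.
    The equation itself is not shown on this page.
    """
    rng = random.Random(seed)

    # theta: n x d matrix with i.i.d. entries uniform on [0, tau].
    theta = [[rng.uniform(0.0, tau) for _ in range(d)] for _ in range(n)]

    # pi: permutation drawn uniformly from S_n (Fisher-Yates shuffle).
    pi = list(range(n))
    rng.shuffle(pi)

    # First feature set: noisy copy of each row of theta.
    X = [[t + sigma * rng.gauss(0.0, 1.0) for t in row] for row in theta]

    # Second feature set: independent noisy copy, stored at the permuted index.
    X_sharp = [None] * n
    for i, row in enumerate(theta):
        X_sharp[pi[i]] = [t + sigma_sharp * rng.gauss(0.0, 1.0) for t in row]

    return X, X_sharp, pi


if __name__ == "__main__":
    # One trial at the smallest tau mentioned in the setup.
    X, X_sharp, pi = generate_trial(tau=1.4, seed=0)
    print(len(X), len(X[0]), len(X_sharp))
```

Averaging an error measure over 500 calls with different seeds would mirror the "500 independent trials" in the quoted setup. The estimators themselves (greedy, least sum of squares, LSNS, LSL) are not sketched here.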