Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Optimal Estimator for Unlabeled Linear Regression
Authors: Hang Zhang, Ping Li
ICML 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments are also provided to corroborate the theoretical claims. Section 5 (Simulations): This section presents the numerical results. Since our estimator cannot guarantee the correct permutation matrix Π under the single observation model, our simulations focus on the multiple observations model, i.e., m > 1. |
| Researcher Affiliation | Industry | Hang Zhang, Ping Li; Cognitive Computing Lab, Baidu Research, 10900 NE 8th St., Bellevue, WA 98004, USA |
| Pseudocode | Yes | Algorithm 1: The one-step estimator proposed in this paper. Input: observation $Y$ and sensing matrix $X$. Output: pair $(\hat{\Pi}, \hat{B})$, which is written as $\hat{\Pi} = \arg\max_{\Pi \in \mathcal{P}_n} \ldots$ and $\hat{B} = X^{\dagger} \hat{\Pi} \ldots$, where $X^{\dagger} = (X^{\top} X)^{-1} X^{\top}$ is the pseudo-inverse of $X$ and $\mathcal{P}_n$ is the set of all possible permutation matrices. |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper describes generating synthetic data for simulations: 'We closely follow the experiment setting in Zhang et al. (2019b). We set the i-th column $B_{:,i}$ ($1 \le i \le \min(m, p)$) to be the i-th canonical basis, which has 1 on the i-th entry and 0 elsewhere.' It does not refer to a publicly available dataset for training. |
| Dataset Splits | No | The paper describes simulation experiments but does not provide explicit train/validation/test dataset splits or cross-validation details. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | For each $n$, we choose $p \in \{0.1n, 0.2n\}$ and $h \in \{n/10, n/4\}$. That is, when $n = 500$, we have $p \in \{50, 100\}$ and $h \in \{50, 125\}$; and when $n = 1000$, we have $p \in \{100, 200\}$ and $h \in \{100, 250\}$. For each chosen set of parameters $(n, p, m, h)$ and SNR value, we simulate the data 1000 times and report the success rate of exact recovery of Π using our proposed estimator in Algorithm 1. |
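
Because the paper releases no code, the simulation protocol quoted in the table can only be approximated. The following is a minimal sketch, not the authors' implementation. It assumes the model Y = ΠXB + W with i.i.d. standard Gaussian entries in X, a permutation that moves exactly h rows, and SNR defined as ‖B‖²_F / (mσ²). The objective of Algorithm 1 is garbled in the quoted text, so the sketch assumes a linear assignment objective ⟨Π, YYᵀXXᵀ⟩. All function names (`make_B`, `sample_permutation`, `estimate_B`, `one_step_estimator`, `success_rate`) are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def make_B(p, m):
    """B whose i-th column is the i-th canonical basis vector, i <= min(m, p)."""
    B = np.zeros((p, m))
    k = min(m, p)
    B[:k, :k] = np.eye(k)  # columns beyond min(m, p), if any, stay zero (assumption)
    return B


def sample_permutation(n, h, rng):
    """Permutation of range(n) that moves exactly h indices (for h >= 2)."""
    perm = np.arange(n)
    moved = rng.choice(n, size=h, replace=False)
    perm[moved] = np.roll(moved, 1)  # cyclic shift among the chosen indices
    return perm


def estimate_B(X, Y, perm):
    """Least-squares B for a given permutation: undo the shuffle, apply pinv(X)."""
    Y_aligned = np.empty_like(Y)
    Y_aligned[perm] = Y  # row perm[i] of the aligned matrix is Y[i]
    return np.linalg.pinv(X) @ Y_aligned


def one_step_estimator(X, Y):
    """Assumed objective: maximize <Pi, Y Y^T X X^T> over permutation matrices."""
    # score[i, j] = <Y[i], (X X^T Y)[j]>; avoids forming Y Y^T and X X^T separately
    score = Y @ (X @ (X.T @ Y)).T
    _, perm_hat = linear_sum_assignment(score, maximize=True)
    return perm_hat, estimate_B(X, Y, perm_hat)


def success_rate(n, p, m, h, snr, trials=1000, seed=0):
    """Fraction of trials in which the permutation is recovered exactly."""
    rng = np.random.default_rng(seed)
    B = make_B(p, m)
    sigma = np.sqrt(np.sum(B**2) / (m * snr))  # assumed SNR = ||B||_F^2 / (m sigma^2)
    hits = 0
    for _ in range(trials):
        X = rng.standard_normal((n, p))  # assumed Gaussian sensing matrix
        perm = sample_permutation(n, h, rng)
        Y = X[perm] @ B + sigma * rng.standard_normal((n, m))
        perm_hat, _ = one_step_estimator(X, Y)
        hits += int(np.array_equal(perm_hat, perm))
    return hits / trials
```

A call such as `success_rate(n=500, p=50, m=m, h=50, snr=snr)` would mirror one cell of the quoted grid. The values of `m` and `snr` are not given in the excerpt and would have to be taken from the paper. The score matrix is n × n, so each trial solves one dense assignment problem.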