Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
# Test-time Collective Prediction

Authors: Celestine Mendler-Dünner, Wenshuo Guo, Stephen Bates, Michael I. Jordan

NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the empirical side, we demonstrate the efficacy of our mechanism through extensive numerical experiments across different learning scenarios. In particular, we illustrate the mechanism's advantages over model averaging as well as model selection, and demonstrate that it consistently outperforms alternative non-uniform combination schemes that have access to additional validation data across a wide variety of models and datasets. |
| Researcher Affiliation | Academia | Celestine Mendler-Dünner, MPI for Intelligent Systems, Tübingen; Wenshuo Guo, University of California, Berkeley; Stephen Bates, University of California, Berkeley; Michael I. Jordan, University of California, Berkeley |
| Pseudocode | Yes | Algorithm 1: DeGroot Aggregation |
| Open Source Code | No | The paper does not contain an unambiguous statement that the authors are releasing the source code for the work described, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | We work with the abalone dataset [Nash et al., 1994]... Datasets have been downloaded from [Fan, 2011]. |
| Dataset Splits | No | The paper describes how individual agents use local data for validation (e.g., 'Construct local validation dataset Di(x ) using N-nearest neighbors of x in Di.') and compares against methods using 'additional validation data', but it does not provide specific, global train/validation/test dataset splits (e.g., percentages or exact counts) for the overall experiments needed for reproduction. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper mentions software like 'scikit-learn' and 'Python' but does not specify version numbers for these or other key software components used in the experiments. |
| Experiment Setup | Yes | Unless stated otherwise, we use K = 5 agents and let each agent fit a linear model to her local data... We use N = 5 for local cross-validation in DeGroot... For our first experiment... we train a lasso model on each agent with regularization parameter λk = λ that achieves a sparsity of 0.8... We choose N to be 1% of the data partition for all schemes (with a hard lower bound of 2). |
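To make the audited setup concrete, the sketch below illustrates a DeGroot-style aggregation scheme consistent with the quoted evidence: K = 5 agents each fit a linear model to a local partition, each agent validates every model on the N = 5 nearest neighbors of the test point in her own data, and the row-normalized trust scores are iterated to consensus weights. This is an illustrative reconstruction under stated assumptions, not the authors' exact Algorithm 1; the data, trust rule (inverse local MSE), and all function names here are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical synthetic setup mirroring the reported configuration:
# K = 5 agents, each fitting a linear model to her local data partition.
K, N = 5, 5  # number of agents; nearest neighbors for local validation
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=500)
parts = np.array_split(rng.permutation(500), K)
agents = [(X[idx], y[idx], LinearRegression().fit(X[idx], y[idx]))
          for idx in parts]

def degroot_predict(x, agents, n_iter=50):
    """Illustrative DeGroot-style aggregation at a single test point x.

    Each agent scores every model on her local validation set (the N nearest
    neighbors of x in her own data). Row-normalizing the scores gives a
    row-stochastic trust matrix W; repeated pooling w <- w @ W converges to
    consensus weights used to combine the agents' predictions.
    """
    x = np.asarray(x, dtype=float)
    W = np.zeros((K, K))
    for i, (Xi, yi, _) in enumerate(agents):
        nn = np.argsort(np.linalg.norm(Xi - x, axis=1))[:N]  # local val. set
        for j, (_, _, mj) in enumerate(agents):
            err = np.mean((mj.predict(Xi[nn]) - yi[nn]) ** 2)
            W[i, j] = 1.0 / (err + 1e-8)  # more trust for lower local error
    W /= W.sum(axis=1, keepdims=True)     # make rows sum to one
    w = np.full(K, 1.0 / K)               # start from uniform weights
    for _ in range(n_iter):
        w = w @ W                         # DeGroot opinion pooling
    preds = np.array([m.predict(x.reshape(1, -1))[0] for _, _, m in agents])
    return float(w @ preds)
```

Since each agent's linear model is a good local fit here, the consensus prediction at a test point should land near the true regression value, e.g. `degroot_predict(np.array([1.0, 0.0, 0.0]), agents)` close to 1.0.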