reproducibilityindex.ai

Test-time Collective Prediction

Authors: Celestine Mendler-Dünner, Wenshuo Guo, Stephen Bates, Michael Jordan

NeurIPS 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	On the empirical side, we demonstrate the efficacy of our mechanism through extensive numerical experiments across different learning scenarios. In particular, we illustrate the mechanism's advantages over model averaging as well as model selection, and demonstrate that it consistently outperforms alternative non-uniform combination schemes that have access to additional validation data across a wide variety of models and datasets.
Researcher Affiliation	Academia	Celestine Mendler-Dünner MPI for Intelligent Systems, Tübingen cmendler@tuebingen.mpg.de Wenshuo Guo University of California, Berkeley wguo@cs.berkeley.edu Stephen Bates University of California, Berkeley stephenbates@cs.berkeley.edu Michael I. Jordan University of California, Berkeley jordan@cs.berkeley.edu
Pseudocode	Yes	Algorithm 1 DeGroot Aggregation
Open Source Code	No	The paper does not contain an unambiguous statement that the authors are releasing the source code for the work described, nor does it provide a direct link to a code repository.
Open Datasets	Yes	We work with the abalone dataset [Nash et al., 1994]... Datasets have been downloaded from [Fan, 2011].
Dataset Splits	No	The paper describes how individual agents use local data for validation (e.g., 'Construct local validation dataset Di(x) using N-nearest neighbors of x in Di.') and compares against methods using 'additional validation data', but it does not provide specific, global train/validation/test dataset splits (e.g., percentages or exact counts) for the overall experiments needed for reproduction.
Hardware Specification	No	The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or memory specifications.
Software Dependencies	No	The paper mentions software like 'scikit-learn' and 'Python' but does not specify version numbers for these or other key software components used in the experiments.
Experiment Setup	Yes	Unless stated otherwise, we use K = 5 agents and let each agent fit a linear model to her local data... We use N = 5 for local cross-validation in DeGroot... For our first experiment... we train a lasso model on each agent with regularization parameter λk = λ that achieves a sparsity of 0.8... We choose N to be 1% of the data partition for all schemes (with a hard lower bound at 2).
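
The setup quoted in the table (K agents with local linear models, a local validation set built from the N nearest neighbors of the test point, and N set to 1% of the partition with a lower bound of 2) can be illustrated with a rough sketch. This is not the authors' code, which the table notes was not released. The function names, the synthetic data, the inverse-MSE trust weights, and the convergence tolerance are all assumptions made for illustration; only K = 5, the 1% rule, and the lower bound of 2 come from the quoted text.

```python
import numpy as np


def local_validation_size(partition_size, fraction=0.01, lower_bound=2):
    """N as a fraction of the agent's partition, with a hard lower bound."""
    return max(lower_bound, int(round(fraction * partition_size)))


def nearest_neighbors(X_local, x_query, n):
    """Indices of the n rows of X_local closest to x_query (Euclidean)."""
    dists = np.linalg.norm(X_local - x_query, axis=1)
    return np.argsort(dists)[:n]


def trust_matrix(agents_data, models, x_query, n):
    """Row i holds agent i's normalized trust in each agent's model.

    ASSUMPTION: trust is proportional to the inverse of the model's mean
    squared error on agent i's local neighborhood of x_query.
    """
    K = len(models)
    T = np.zeros((K, K))
    for i, (X_i, y_i) in enumerate(agents_data):
        idx = nearest_neighbors(X_i, x_query, n)
        for j, predict in enumerate(models):
            mse = np.mean((predict(X_i[idx]) - y_i[idx]) ** 2)
            T[i, j] = 1.0 / (mse + 1e-12)  # small constant avoids division by zero
        T[i] /= T[i].sum()  # each row sums to one
    return T


def degroot_consensus(T, initial_beliefs, tol=1e-10, max_iter=1000):
    """Repeatedly average beliefs with the trust matrix until they stop changing."""
    p = np.asarray(initial_beliefs, dtype=float)
    for _ in range(max_iter):
        p_next = T @ p
        if np.max(np.abs(p_next - p)) < tol:
            return p_next
        p = p_next
    return p


def demo(K=5, points_per_agent=200, dim=3, seed=0):
    """Hypothetical end-to-end run on synthetic data."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=dim)
    agents_data, models = [], []
    for _ in range(K):
        X = rng.normal(size=(points_per_agent, dim))
        y = X @ w_true + 0.1 * rng.normal(size=points_per_agent)
        w, *_ = np.linalg.lstsq(X, y, rcond=None)  # local linear model
        agents_data.append((X, y))
        models.append(lambda Z, w=w: Z @ w)
    x_query = rng.normal(size=dim)
    n = local_validation_size(points_per_agent)
    T = trust_matrix(agents_data, models, x_query, n)
    beliefs = [m(x_query[None, :])[0] for m in models]
    return degroot_consensus(T, beliefs)


if __name__ == "__main__":
    print(demo())
```

Because every row of the trust matrix is strictly positive and sums to one, the repeated averaging in this sketch settles on a single shared prediction across agents. A different trust weighting would only require changing `trust_matrix`.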