Test-time Collective Prediction

Authors: Celestine Mendler-Dünner, Wenshuo Guo, Stephen Bates, Michael Jordan

NeurIPS 2021

Reproducibility assessment (each entry gives a variable, its result, and the supporting LLM response):

Research Type: Experimental
"On the empirical side, we demonstrate the efficacy of our mechanism through extensive numerical experiments across different learning scenarios. In particular, we illustrate the mechanism's advantages over model averaging as well as model selection, and demonstrate that it consistently outperforms alternative non-uniform combination schemes that have access to additional validation data across a wide variety of models and datasets."

Researcher Affiliation: Academia
Celestine Mendler-Dünner, MPI for Intelligent Systems, Tübingen (cmendler@tuebingen.mpg.de); Wenshuo Guo, University of California, Berkeley (wguo@cs.berkeley.edu); Stephen Bates, University of California, Berkeley (stephenbates@cs.berkeley.edu); Michael I. Jordan, University of California, Berkeley (jordan@cs.berkeley.edu)

Pseudocode: Yes
"Algorithm 1: De Groot Aggregation"

Open Source Code: No
The paper neither contains an unambiguous statement that the authors are releasing the source code for this work nor provides a direct link to a code repository.

Open Datasets: Yes
"We work with the abalone dataset [Nash et al., 1994]... Datasets have been downloaded from [Fan, 2011]."

Dataset Splits: No
The paper describes how individual agents use local data for validation (e.g., "Construct local validation dataset Di(x) using N-nearest neighbors of x in Di.") and compares against methods that use "additional validation data", but it does not give the specific global train/validation/test splits (percentages or exact counts) needed to reproduce the overall experiments.

Hardware Specification: No
The paper does not report the hardware used to run the experiments, such as GPU/CPU models or memory specifications.

Software Dependencies: No
The paper mentions software such as scikit-learn and Python but does not specify version numbers for these or other key software components used in the experiments.

Experiment Setup: Yes
"Unless stated otherwise, we use K = 5 agents and let each agent fit a linear model to her local data... We use N = 5 for local cross-validation in De Groot... For our first experiment... we train a lasso model on each agent with regularization parameter λk = λ that achieves a sparsity of 0.8... We choose N to be 1% of the data partition for all schemes (with a hard lower bound at 2)."
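The quoted setup combines two ingredients: a local validation set built from the N nearest neighbors of the query point, and a DeGroot-style consensus over the agents' predictions. A minimal sketch of how those pieces could fit together is below. This is not the paper's Algorithm 1; the function names, the inverse-error trust weighting, and the fixed number of consensus rounds are all illustrative assumptions.

```python
import numpy as np

def local_knn_validation(X_local, x_query, N=5):
    """Indices of the N nearest neighbors of x_query in an agent's local data."""
    d = np.linalg.norm(X_local - x_query, axis=1)
    return np.argsort(d)[:N]

def degroot_aggregate(models, agents_data, x_query, N=5, n_rounds=50):
    """DeGroot-style aggregation sketch: each agent scores every model on its
    local KNN validation set, scores become row-stochastic trust weights, and
    predictions are iteratively averaged under that trust matrix."""
    K = len(models)
    W = np.zeros((K, K))
    for k, (X_k, y_k) in enumerate(agents_data):
        idx = local_knn_validation(X_k, x_query, N)
        for j, model in enumerate(models):
            err = np.mean((model(X_k[idx]) - y_k[idx]) ** 2)
            W[k, j] = 1.0 / (err + 1e-8)   # inverse-error trust (assumption)
    W /= W.sum(axis=1, keepdims=True)      # row-normalize to stochastic
    x = np.array([m(x_query[None, :])[0] for m in models])
    for _ in range(n_rounds):              # DeGroot consensus updates
        x = W @ x
    return x
```

Because each row of the trust matrix is stochastic, every consensus iterate stays a convex combination of the initial predictions, so the aggregate can never leave their range; with uniform trust the scheme reduces to plain model averaging.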