Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning

Authors: Alex Chan, Mihaela van der Schaar

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section we will use a series of experiments to make the following concrete points about our method: 1) In common scenarios, global ensembles do not work, and we must make instance-wise predictions (Section 4.1); 2) Doing so in even slightly high dimensions requires a representation learning step, and our proposed losses improve the quality of the learnt representation (Section 4.2); 3) We can make good predictions with surprisingly little information (Section 4.2); 4) This is useful beyond synthetic example setups in real-world case scenarios (Section 4.3); 5) Naturally, there are setups where we will underperform, which we should understand (Section 4.4); and 6) We are completely agnostic to the type of models used (throughout Section 4 we demonstrate this across a variety of models, from simple regressions to convolutional nets and differential equations).
Researcher Affiliation | Academia | Alex J. Chan, University of Cambridge, Cambridge, UK, ajc340@cam.ac.uk; Mihaela van der Schaar, University of Cambridge and Cambridge Centre for AI in Medicine, Cambridge, UK, mv472@cam.ac.uk
Pseudocode | Yes | Algorithm 1: Synthetic Model Combination. Result: Test predictions using mapping from data to model weights. Input: {(M_j, I_j)}_{j=1}^N and D_T. 1. Use information to produce density models; 2. Sample data from models and combine with test data; 3. Learn representation space; 4. Re-model densities in new space; 5. Calculate weights in new space; 6. Make predictions {ŷ_i}_{i=1}^M over test set. Return: {ŷ_i}_{i=1}^M. (A minimal Python sketch of this procedure is given after the table.)
Open Source Code | Yes | Finally, we provide practical demonstrations of both the success and failure cases of traditional methods and SMC, in synthetic examples as well as a real example of precision dosing, code for which is made available at https://github.com/XanderJC/synthetic-model-combination, along with the larger lab group codebase at https://github.com/vanderschaarlab/synthetic-model-combination.
Open Datasets | Yes | Using MNIST (d = 784), we construct a problem where ten different classifiers are trained to each individually identify a single digit effectively, while their performances on other digits are significantly lower; this is achieved by providing mostly only data of the respective single digit (a hedged sketch of such a per-digit split is given after the table). Also: We use simulated patients provided by the authors to evaluate the effectiveness of SMC in the accuracy of predicting the AUC across a number of settings when a number {0 (A priori), 1, 2, 3, 4} of concentration measurements are taken in a 36-hour period.
Dataset Splits | No | The paper mentions 'training domain' and 'test set' and discusses 'validation data' in relation to Bayesian Model Averaging (BMA) methods, but it does not provide specific percentages or counts for training, validation, or test splits for its own experiments, nor does it specify how the test set was created from the main dataset.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory used for running the experiments.
Software Dependencies | No | The paper refers to 'PyTorch' and 'TDMx software' in the references and text, respectively, but does not provide specific version numbers for any software dependencies or libraries used in their experimental setup.
Experiment Setup | No | The paper describes aspects of the experimental setup, such as the training data distributions for the regression example (e.g., one Gaussian centred at 5, the other centred at 15, each with standard deviation 3.5) and the training approach for MNIST (ten different classifiers are trained), along with the features used in the vancomycin study. However, it does not provide specific numerical hyperparameters such as learning rates, batch sizes, number of epochs, or optimizer settings for the neural networks or other models used in the experiments. (A sketch of the two-Gaussian regression setup is given after the table.)
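
Below is a minimal, hedged Python sketch of the Algorithm 1 loop quoted in the Pseudocode row. It is not the authors' implementation: a Gaussian KDE stands in for the density models, PCA stands in for the learned representation step (the paper learns this with its proposed losses), and each model M_j is assumed to expose a predict(X) method, with its information I_j given as a sample of the inputs it was trained on.

```python
# Sketch of the SMC loop under the assumptions stated above.
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.decomposition import PCA

def smc_predict(models, info_samples, X_test, n_components=2):
    # Steps 1-2: pool each model's information sample with the test inputs
    pooled = np.vstack(info_samples + [X_test])

    # Step 3: learn a representation space (PCA as a simple placeholder)
    rep = PCA(n_components=n_components).fit(pooled)

    # Step 4: re-model the per-model densities in the new space
    kdes = [gaussian_kde(rep.transform(I).T) for I in info_samples]

    # Step 5: instance-wise weights = normalised density of each test point
    # under each model's density estimate
    Z = rep.transform(X_test)
    scores = np.stack([kde(Z.T) for kde in kdes], axis=1)   # (n_test, n_models)
    weights = scores / scores.sum(axis=1, keepdims=True)

    # Step 6: weighted combination of the individual models' predictions
    preds = np.stack([m.predict(X_test) for m in models], axis=1)
    return (weights * preds).sum(axis=1)
```

For low-dimensional inputs (such as the 1-d regression example further below) the representation step can be skipped or n_components reduced accordingly.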
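The MNIST construction in the Open Datasets row (ten classifiers, each trained mostly on a single digit) could be set up along the following lines. The 95%/5% mix and subset size are illustrative assumptions; the excerpt only states that each classifier sees mostly data of its respective digit.

```python
# Hypothetical per-digit training split for the biased-classifier setup.
import numpy as np

def biased_digit_indices(labels, digit, n_per_model=5000, frac_other=0.05, seed=0):
    rng = np.random.default_rng(seed)
    own = np.flatnonzero(labels == digit)      # images of the target digit
    other = np.flatnonzero(labels != digit)    # everything else
    n_other = int(n_per_model * frac_other)
    n_own = n_per_model - n_other
    idx = np.concatenate([
        rng.choice(own, size=min(n_own, own.size), replace=False),
        rng.choice(other, size=n_other, replace=False),
    ])
    rng.shuffle(idx)
    return idx

# e.g. with torchvision: labels = mnist_train.targets.numpy()
# subsets = [biased_digit_indices(labels, d) for d in range(10)]
```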
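The regression example described in the Experiment Setup row (training inputs drawn from Gaussians centred at 5 and 15 with standard deviation 3.5) could be generated as follows. The ground-truth response f and the choice of plain linear regressions are placeholders, since the excerpt does not specify them.

```python
# Sketch of the two-Gaussian regression setup; f and the model class are
# placeholder assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

def f(x):
    # Placeholder ground-truth response (not specified in the paper excerpt)
    return np.sin(x / 3.0)

def make_training_set(centre, std=3.5, n=200):
    x = rng.normal(centre, std, size=(n, 1))
    y = f(x).ravel() + rng.normal(0.0, 0.1, size=n)
    return x, y

models, info_samples = [], []
for centre in (5.0, 15.0):
    x, y = make_training_set(centre)
    models.append(LinearRegression().fit(x, y))
    info_samples.append(x)   # per-model "information" for SMC's density step
```

The resulting (models, info_samples) pairs match the inputs expected by the smc_predict sketch above, with the representation step skipped for 1-d inputs.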