Greedy Convex Ensemble

Authors: Thanh Tan Nguyen, Nan Ye, Peter Bartlett

IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We consider learning a convex combination of basis models, and present some new theoretical and empirical results that demonstrate the effectiveness of a greedy approach. Theoretically, we first consider whether we can use linear, instead of convex, combinations, and obtain generalization results similar to existing ones for learning from a convex hull. We obtain a negative result that even the linear hull of very simple basis functions can have unbounded capacity, and is thus prone to overfitting; on the other hand, convex hulls are still rich but have bounded capacities. Secondly, we obtain a generalization bound for a general class of Lipschitz loss functions. Empirically, we first discuss how a convex combination can be greedily learned with early stopping, and how a convex combination can be non-greedily learned when the number of basis models is known a priori. Our experiments suggest that the greedy scheme is competitive with or better than several baselines, including boosting and random forests. The greedy algorithm requires little effort in hyperparameter tuning, and also seems able to adapt to the underlying complexity of the problem. Our code is available at https://github.com/tan1889/gce.
Researcher Affiliation | Academia | 1 Queensland University of Technology, 2 The University of Queensland, 3 UC Berkeley
Pseudocode | No | The paper describes algorithms (e.g., Frank-Wolfe) in text and equations but does not provide any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/tan1889/gce.
Open Datasets | Yes | Most of the datasets are from UCI ML Repository [Dheeru and Karra Taniskidou, 2017]. ... We used 12 datasets of various sizes and tasks (#instances and dimensions in brackets): diabetes (442/10), boston (506/13), ca_housing (20,640/8), msd (515,345/90) for regression; iris (150/4), wine (178/13), breast_cancer (569/30), digits (1,797/64), cifar10_f (60,000/342), mnist (70,000/784), covertype (581,012/54), kddcup99 (4,898,431/41) for classification.
Dataset Splits | Yes | Further details on the datasets, their training/validation/test splits, hyper-parameter tuning, and experimental setup are available at https://github.com/tan1889/gce.
Hardware Specification | No | The paper mentions 'GPU speed-up' but does not specify the GPU or CPU models, or any other hardware details, used to run the experiments.
Software Dependencies | No | The paper mentions software components such as 'Adam SGD', 'ReLU activation', 'MSE criterion', and 'cross entropy loss', but does not give version numbers for any software, libraries, or frameworks used.
Experiment Setup | Yes | GCE uses a fixed set of hyper-parameters without tuning: learning_rate = 0.001, regularization = 0. All three algorithms use ReLU activation, the MSE criterion for training on regression problems, and cross-entropy loss for classification. Training uses Adam SGD with the learning_rate reduced by 10 on plateau (training performance did not improve for 10 consecutive epochs) until reaching the minimum learning_rate of 10⁻⁵, at which point the optimizer is run for another 10 epochs and then returns the solution with the best performance across all training epochs.
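
The abstract quoted in the Research Type row hinges on a contrast between linear and convex hulls. The LaTeX note below is a standard illustration of why that contrast matters, not necessarily the paper's own construction: with threshold basis functions, unconstrained linear coefficients already interpolate any finite sample, while simplex weights keep the hull's capacity at the level of the basis class.

```latex
% Basis class: thresholds on the real line (a standard construction,
% assumed here for illustration).
\[
  H \;=\; \{\, h_b : h_b(x) = \mathbf{1}[x \ge b],\ b \in \mathbb{R} \,\}.
\]
% Linear hull: for any x_1 < ... < x_n and arbitrary targets y_1, ..., y_n,
% the telescoping combination interpolates exactly,
\[
  f(x) \;=\; y_1\, h_{x_1}(x) \;+\; \sum_{i=2}^{n} (y_i - y_{i-1})\, h_{x_i}(x),
  \qquad f(x_j) = y_j \ \text{for all } j,
\]
% since precisely the thresholds with b \le x_j are active at x_j. The
% coefficients are unconstrained, so span(H) fits any labeling of any
% finite sample: unbounded capacity, hence prone to overfitting.
% Convex hull: simplex weights keep outputs in [0, 1], and the Rademacher
% complexity of conv(H) equals that of H itself: rich but bounded.
\[
  f \;=\; \sum_i \alpha_i\, h_{b_i}, \quad \alpha_i \ge 0,\ \sum_i \alpha_i = 1,
  \qquad \mathfrak{R}_n(\mathrm{conv}(H)) = \mathfrak{R}_n(H).
\]
```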
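The abstract also describes greedily learning a convex combination with early stopping, and the Pseudocode row notes the paper gives this Frank-Wolfe procedure only in text and equations. Below is a minimal sketch of that greedy scheme under squared loss; the shallow decision tree is a stand-in basis learner (the paper's basis models are neural networks), and the function name and defaults are illustrative, not taken from the gce repository.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor  # stand-in basis learner; the paper uses neural nets

def greedy_convex_ensemble(X, y, X_val, y_val, max_models=100, patience=10):
    """Frank-Wolfe-style greedy learning of a convex combination under squared loss."""
    f_tr = np.zeros_like(y, dtype=float)     # current ensemble prediction on the training set
    f_va = np.zeros_like(y_val, dtype=float)
    models, weights = [], []
    best_val, since_best = np.inf, 0

    for t in range(max_models):
        # For MSE, the negative functional gradient is the residual y - f, so the
        # Frank-Wolfe direction is a basis model fit to that residual.
        h = DecisionTreeRegressor(max_depth=4).fit(X, y - f_tr)
        gamma = 2.0 / (t + 2)                # classic Frank-Wolfe step size
        f_tr = (1 - gamma) * f_tr + gamma * h.predict(X)
        f_va = (1 - gamma) * f_va + gamma * h.predict(X_val)

        weights = [w * (1 - gamma) for w in weights] + [gamma]  # always sums to 1
        models.append(h)

        val_mse = float(np.mean((y_val - f_va) ** 2))
        if val_mse < best_val:
            best_val, since_best = val_mse, 0
        else:
            since_best += 1
            if since_best >= patience:       # early stopping on validation MSE
                break
    return models, weights, best_val
```

Because each update is a convex combination of the previous ensemble and the new basis model, the iterate stays inside the convex hull of the basis class, which is the setting the paper's capacity bounds address; early stopping here simply halts the addition of basis models.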
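The schedule quoted in the Experiment Setup row (Adam, learning rate divided by 10 after 10 stagnant epochs, floor of 10⁻⁵, then 10 further epochs, returning the best epoch) maps naturally onto PyTorch's ReduceLROnPlateau. The loop below is an assumption about how that schedule could be wired up, not the authors' implementation; train_basis_model and its arguments are hypothetical.

```python
import torch
from torch import nn

def train_basis_model(model, loader, task="regression", lr=1e-3,
                      min_lr=1e-5, patience=10, extra_epochs=10, max_epochs=1000):
    """Adam with LR cut 10x on plateau, per the quoted setup (names are illustrative)."""
    criterion = nn.MSELoss() if task == "regression" else nn.CrossEntropyLoss()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    sched = torch.optim.lr_scheduler.ReduceLROnPlateau(
        opt, mode="min", factor=0.1, patience=patience, min_lr=min_lr)

    best_loss, best_state, epochs_at_floor = float("inf"), None, 0
    for _ in range(max_epochs):
        total, n = 0.0, 0
        for X, y in loader:
            opt.zero_grad()
            loss = criterion(model(X), y)
            loss.backward()
            opt.step()
            total += loss.item() * len(X)
            n += len(X)
        epoch_loss = total / n
        sched.step(epoch_loss)               # the quote monitors *training* performance
        if epoch_loss < best_loss:           # track the best epoch across all of training
            best_loss = epoch_loss
            best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
        # Once the LR floor is reached, run `extra_epochs` more epochs, then stop.
        # The small multiplier absorbs float rounding in the repeated 0.1x decay.
        if opt.param_groups[0]["lr"] <= min_lr * 1.0001:
            epochs_at_floor += 1
            if epochs_at_floor > extra_epochs:
                break
    model.load_state_dict(best_state)        # solution with best training performance
    return model, best_loss
```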