Greedy Convex Ensemble

Authors: Thanh Tan Nguyen, Nan Ye, Peter Bartlett

IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We consider learning a convex combination of basis models, and present some new theoretical and empirical results that demonstrate the effectiveness of a greedy approach. Theoretically, we first consider whether we can use linear, instead of convex, combinations, and obtain generalization results similar to existing ones for learning from a convex hull. We obtain a negative result that even the linear hull of very simple basis functions can have unbounded capacity, and is thus prone to overfitting; on the other hand, convex hulls are still rich but have bounded capacities. Secondly, we obtain a generalization bound for a general class of Lipschitz loss functions. Empirically, we first discuss how a convex combination can be greedily learned with early stopping, and how a convex combination can be non-greedily learned when the number of basis models is known a priori. Our experiments suggest that the greedy scheme is competitive with or better than several baselines, including boosting and random forests. The greedy algorithm requires little effort in hyperparameter tuning, and also seems able to adapt to the underlying complexity of the problem. Our code is available at https://github.com/tan1889/gce.
Researcher Affiliation | Academia | 1 Queensland University of Technology, 2 The University of Queensland, 3 UC Berkeley
Pseudocode | No | The paper describes algorithms (e.g., Frank-Wolfe) in text and equations but does not provide any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/tan1889/gce.
Open Datasets | Yes | Most of the datasets are from UCI ML Repository [Dheeru and Karra Taniskidou, 2017]. ... We used 12 datasets of various sizes and tasks (#instances and dimensions in brackets): diabetes (442/10), boston (506/13), ca_housing (20,640/8), msd (515,345/90) for regression; iris (150/4), wine (178/13), breast_cancer (569/30), digits (1,797/64), cifar10_f (60,000/342), mnist (70,000/784), covertype (581,012/54), kddcup99 (4,898,431/41) for classification.
Dataset Splits | Yes | Further details on the datasets, their training/validation/test splits, hyper-parameter tuning, and experimental setup are available at https://github.com/tan1889/gce.
Hardware Specification | No | The paper mentions 'GPU speed-up' but does not specify the GPU or CPU models, or any other hardware details, used to run the experiments.
Software Dependencies | No | The paper mentions software components such as 'Adam SGD', 'ReLU activation', 'MSE criterion', and 'cross entropy loss', but does not give version numbers for any software, libraries, or frameworks used.
Experiment Setup | Yes | GCE uses a fixed set of hyper-parameters without tuning: learning_rate = 0.001, regularization = 0. All three algorithms use ReLU activation, the MSE criterion for training on regression problems, and cross-entropy loss for classification. Training uses Adam SGD with the learning_rate reduced by 10 on plateau (training performance did not improve for 10 consecutive epochs) until reaching the minimum learning_rate of 10⁻⁵, at which point the optimizer is run for another 10 epochs and then returns the solution with the best performance across all training epochs.
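
The abstract quoted in the Research Type row hinges on a contrast between linear and convex hulls. The LaTeX note below is a standard illustration of why that contrast matters, not necessarily the paper's own construction: with threshold basis functions, unconstrained linear coefficients already interpolate any finite sample, while simplex weights keep the hull's capacity at the level of the basis class.

```latex
% Basis class: thresholds on the real line (a standard construction,
% assumed here for illustration).
\[
  H \;=\; \{\, h_b : h_b(x) = \mathbf{1}[x \ge b],\ b \in \mathbb{R} \,\}.
\]
% Linear hull: for any x_1 < ... < x_n and arbitrary targets y_1, ..., y_n,
% the telescoping combination interpolates exactly,
\[
  f(x) \;=\; y_1\, h_{x_1}(x) \;+\; \sum_{i=2}^{n} (y_i - y_{i-1})\, h_{x_i}(x),
  \qquad f(x_j) = y_j \ \text{for all } j,
\]
% since precisely the thresholds with b \le x_j are active at x_j. The
% coefficients are unconstrained, so span(H) fits any labeling of any
% finite sample: unbounded capacity, hence prone to overfitting.
% Convex hull: simplex weights keep outputs in [0, 1], and the Rademacher
% complexity of conv(H) equals that of H itself: rich but bounded.
\[
  f \;=\; \sum_i \alpha_i\, h_{b_i}, \quad \alpha_i \ge 0,\ \sum_i \alpha_i = 1,
  \qquad \mathfrak{R}_n(\mathrm{conv}(H)) = \mathfrak{R}_n(H).
\]
```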
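The abstract also describes greedily learning a convex combination with early stopping, and the Pseudocode row notes the paper gives this Frank-Wolfe procedure only in text and equations. Below is a minimal sketch of that greedy scheme under squared loss; the shallow decision tree is a stand-in basis learner (the paper's basis models are neural networks), and the function name and defaults are illustrative, not taken from the gce repository.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor  # stand-in basis learner; the paper uses neural nets

def greedy_convex_ensemble(X, y, X_val, y_val, max_models=100, patience=10):
    """Frank-Wolfe-style greedy learning of a convex combination under squared loss."""
    f_tr = np.zeros_like(y, dtype=float)     # current ensemble prediction on the training set
    f_va = np.zeros_like(y_val, dtype=float)
    models, weights = [], []
    best_val, since_best = np.inf, 0

    for t in range(max_models):
        # For MSE, the negative functional gradient is the residual y - f, so the
        # Frank-Wolfe direction is a basis model fit to that residual.
        h = DecisionTreeRegressor(max_depth=4).fit(X, y - f_tr)
        gamma = 2.0 / (t + 2)                # classic Frank-Wolfe step size
        f_tr = (1 - gamma) * f_tr + gamma * h.predict(X)
        f_va = (1 - gamma) * f_va + gamma * h.predict(X_val)

        weights = [w * (1 - gamma) for w in weights] + [gamma]  # always sums to 1
        models.append(h)

        val_mse = float(np.mean((y_val - f_va) ** 2))
        if val_mse < best_val:
            best_val, since_best = val_mse, 0
        else:
            since_best += 1
            if since_best >= patience:       # early stopping on validation MSE
                break
    return models, weights, best_val
```

Because each update is a convex combination of the previous ensemble and the new basis model, the iterate stays inside the convex hull of the basis class, which is the setting the paper's capacity bounds address; early stopping here simply halts the addition of basis models.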
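The schedule quoted in the Experiment Setup row (Adam, learning rate divided by 10 after 10 stagnant epochs, floor of 10⁻⁵, then 10 further epochs, returning the best epoch) maps naturally onto PyTorch's ReduceLROnPlateau. The loop below is an assumption about how that schedule could be wired up, not the authors' implementation; train_basis_model and its arguments are hypothetical.

```python
import torch
from torch import nn

def train_basis_model(model, loader, task="regression", lr=1e-3,
                      min_lr=1e-5, patience=10, extra_epochs=10, max_epochs=1000):
    """Adam with LR cut 10x on plateau, per the quoted setup (names are illustrative)."""
    criterion = nn.MSELoss() if task == "regression" else nn.CrossEntropyLoss()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    sched = torch.optim.lr_scheduler.ReduceLROnPlateau(
        opt, mode="min", factor=0.1, patience=patience, min_lr=min_lr)

    best_loss, best_state, epochs_at_floor = float("inf"), None, 0
    for _ in range(max_epochs):
        total, n = 0.0, 0
        for X, y in loader:
            opt.zero_grad()
            loss = criterion(model(X), y)
            loss.backward()
            opt.step()
            total += loss.item() * len(X)
            n += len(X)
        epoch_loss = total / n
        sched.step(epoch_loss)               # the quote monitors *training* performance
        if epoch_loss < best_loss:           # track the best epoch across all of training
            best_loss = epoch_loss
            best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
        # Once the LR floor is reached, run `extra_epochs` more epochs, then stop.
        # The small multiplier absorbs float rounding in the repeated 0.1x decay.
        if opt.param_groups[0]["lr"] <= min_lr * 1.0001:
            epochs_at_floor += 1
            if epochs_at_floor > extra_epochs:
                break
    model.load_state_dict(best_state)        # solution with best training performance
    return model, best_loss
```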