Bagging by Design (on the Suboptimality of Bagging)

Authors: Periklis Papakonstantinou, Jia Xu, Zhu Cao

AAAI 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our analytical results are backed up by experiments on classification and regression settings." "Empirical results: We provide empirical evidence (i) supporting the covariance assumption and (ii) comparing bagging to design-bagging in classification and regression settings."
Researcher Affiliation | Academia | Periklis A. Papakonstantinou, Jia Xu, and Zhu Cao, IIIS, Tsinghua University
Pseudocode | Yes | Algorithm 1: Blocks Generating Algorithm (BGA)
    Input: block size b, number of blocks m, universe size N
    Initialize m empty blocks.
    for i = 1 to b·m do
        choose L at random from the set of blocks with the current minimum number of elements
        S ← the set of elements of the universe not in L that appear least frequently
        L ← L ∪ {e}, where e ∈ S is chosen uniformly at random
    end for
    Output: m blocks, each with b distinct elements.
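The pseudocode can be turned into a short runnable sketch. This is our reading of Algorithm 1, assuming "appear least frequently" counts how often each universe element already occurs across all m blocks built so far; the function name and seeding are ours.

```python
import random
from collections import Counter

def blocks_generating_algorithm(b, m, N, seed=None):
    """Sketch of the Blocks Generating Algorithm (BGA): build m blocks of
    b distinct elements each from the universe {0, ..., N-1}."""
    rng = random.Random(seed)
    blocks = [set() for _ in range(m)]
    freq = Counter()  # occurrences of each element across all blocks so far
    for _ in range(b * m):
        # choose L at random among the blocks of current minimum size
        min_size = min(len(B) for B in blocks)
        L = rng.choice([B for B in blocks if len(B) == min_size])
        # S: elements not in L that appear least frequently overall
        candidates = [e for e in range(N) if e not in L]
        min_freq = min(freq[e] for e in candidates)
        S = [e for e in candidates if freq[e] == min_freq]
        e = rng.choice(S)
        L.add(e)  # sets keep the block's elements distinct
        freq[e] += 1
    return blocks
```

The min-size rule keeps block sizes within one of each other throughout, so every block ends with exactly b elements.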
Open Source Code | No | The paper does not provide an explicit statement about releasing source code, nor does it include a link to a code repository.
Open Datasets | Yes | "Our study is on various base learners and data sets from the UCI repository and on a real data set, MNIST." Details of the data sets (number of samples, features, and classes) can be obtained from the UCI repository and are listed in the full version.
Dataset Splits | Yes | The test set is 10% of the whole data set, selected uniformly at random, and the remaining samples are taken as the training set for each task. On Fisher's Iris data, we applied 10-fold cross-validation to evaluate a binary classification task...
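The 90/10 uniform random split can be sketched as follows; the function name and interface are our illustration, not the paper's code.

```python
import random

def train_test_split(samples, test_fraction=0.1, seed=None):
    """Select test_fraction of the samples uniformly at random as the
    test set; the remaining samples form the training set."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    n_test = round(len(samples) * test_fraction)
    test_set = [samples[i] for i in idx[:n_test]]
    train_set = [samples[i] for i in idx[n_test:]]
    return train_set, test_set
```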
Hardware Specification | No | The paper does not provide any specific hardware details, such as the CPU or GPU models used for running the experiments.
Software Dependencies | No | The paper mentions software used (e.g., "SVM in Matlab" and "Decision Tree C4.5") but does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | Bagging and design-bagging are performed on 30 bootstraps (m = 30) and combined by voting, with the number of samples in each bootstrap set to N/2 (cf. (Bühlmann and Yu 2002; Friedman and Hall 2007), justifying this choice), where N is the number of training samples. We repeated each experiment 1000 times for all classification tasks to remove random noise; for polynomial regression we repeated 450K times.
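The setup above, m = 30 bootstraps of size N/2 combined by majority vote, can be sketched in plain Python. The `fit` interface (a base learner that takes a training list and returns a predict function) and the 1-NN example learner are our illustrative assumptions, not the paper's.

```python
import random
from collections import Counter

def bagging_predict(train, x, fit, m=30, seed=None):
    """Predict a label for x by majority vote over m base learners,
    each trained on a bootstrap of N/2 samples drawn with replacement."""
    rng = random.Random(seed)
    N = len(train)
    votes = Counter()
    for _ in range(m):
        bootstrap = [rng.choice(train) for _ in range(N // 2)]
        votes[fit(bootstrap)(x)] += 1
    return votes.most_common(1)[0][0]

# Illustrative base learner: 1-nearest-neighbour on 1-D points (value, label).
def one_nn(train):
    def predict(x):
        return min(train, key=lambda p: abs(p[0] - x))[1]
    return predict

data = [(v, 0) for v in range(10)] + [(v, 1) for v in range(20, 30)]
label = bagging_predict(data, 3, one_nn, m=30, seed=1)
```

Design-bagging would replace the i.i.d. bootstrap draw with blocks produced by the BGA; the voting combination step is unchanged.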