Bagging by Design (on the Suboptimality of Bagging)
Authors: Periklis Papakonstantinou, Jia Xu, Zhu Cao
AAAI 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our analytical results are backed up by experiments on classification and regression settings. Empirical results: We provide empirical evidence (i) supporting the covariance assumption and (ii) comparing bagging to design-bagging in classification and regression settings. |
| Researcher Affiliation | Academia | Periklis A. Papakonstantinou, Jia Xu, and Zhu Cao; IIIS, Tsinghua University |
| Pseudocode | Yes | Algorithm 1: Blocks Generating Algorithm (BGA). Input: block size b, number of blocks m, universe size N. Initialize m empty blocks. For i = 1 to b·m do: choose L at random from the set of blocks with the current minimum number of elements; let S be the set of elements in the universe not in L that appear least frequently; L ← L ∪ {e}, where e ∈ S is chosen uniformly at random. End for. Output: m blocks, each with b distinct elements. (A Python sketch of this algorithm follows the table.) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code, nor does it include a link to a code repository. |
| Open Datasets | Yes | Our study is on various base learners and data sets from the UCI repository and on the real data set MNIST. The details of the data sets can be obtained from the UCI repository; the number of samples, features, and classes can be found in the full version. |
| Dataset Splits | Yes | For each task, the test set is 10% of the whole data set, selected uniformly at random, and the remaining samples are taken as the training set. On Fisher's Iris data, we applied 10-fold cross-validation to evaluate a binary classification task... |
| Hardware Specification | No | The paper does not provide any specific hardware details such as CPU or GPU models used for running the experiments. |
| Software Dependencies | No | The paper mentions software used (e.g., 'SVM in Matlab' and 'Decision Tree C4.5') but does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | Bagging and design-bagging are performed on 30 bootstraps (m = 30) combined by voting, and the number of samples in each bootstrap is set to N/2 (cf. Bühlmann and Yu 2002; Friedman and Hall 2007, justifying this choice), where N is the number of training samples. Each classification experiment is repeated 1000 times to remove random noise; polynomial regression is repeated 450K times. (A sketch of this ensemble setup follows the table.) |
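
The BGA pseudocode quoted in the table is compact, so here is a minimal Python sketch of one way to read it, assuming the universe is the integers 0..N-1; the function name `generate_blocks` and the set-based data layout are illustrative choices, not taken from the paper.

```python
import random

def generate_blocks(b, m, n):
    """Sketch of the Blocks Generating Algorithm (BGA) quoted above.
    b: block size, m: number of blocks, n: universe size (elements 0..n-1)."""
    blocks = [set() for _ in range(m)]           # initialize m empty blocks
    freq = [0] * n                               # how often each element has been placed so far
    for _ in range(b * m):
        # choose a block L uniformly among those with the current minimum number of elements
        min_size = min(len(blk) for blk in blocks)
        L = random.choice([blk for blk in blocks if len(blk) == min_size])
        # S: elements of the universe not in L that appear least frequently so far
        candidates = [e for e in range(n) if e not in L]
        min_freq = min(freq[e] for e in candidates)
        S = [e for e in candidates if freq[e] == min_freq]
        e = random.choice(S)                     # chosen uniformly at random from S
        L.add(e)                                 # L <- L ∪ {e}
        freq[e] += 1
    return blocks                                # m blocks, each with b distinct elements
```

Because every insertion goes into a block of currently minimum size, block sizes stay within one of each other, so after b·m insertions all m blocks contain exactly b distinct elements, matching the stated output.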
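The experiment-setup row describes m = 30 sub-training sets of size N/2 combined by majority voting. The following is a minimal sketch of how one such trial could be wired up, reusing `generate_blocks` from the sketch above; the scikit-learn decision tree, the helper names (`fit_ensemble`, `majority_vote`, `run_once`), and the integer-label assumption are stand-ins for illustration, not the authors' Matlab SVM / C4.5 pipeline.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_ensemble(X, y, index_sets):
    """Fit one base learner per index set (assumed stand-in: a decision tree)."""
    return [DecisionTreeClassifier().fit(X[idx], y[idx]) for idx in index_sets]

def majority_vote(models, X):
    """Combine the ensemble by majority voting (assumes integer-encoded labels)."""
    preds = np.stack([m.predict(X) for m in models]).astype(int)  # (n_models, n_test)
    return np.array([np.bincount(col).argmax() for col in preds.T])

def run_once(X_train, y_train, X_test, y_test, m=30):
    """One trial with m = 30 sub-training sets of size N/2, as in the quoted setup."""
    N = len(X_train)
    # plain bagging: bootstrap samples of size N/2 drawn with replacement
    bag_idx = [np.random.choice(N, N // 2, replace=True) for _ in range(m)]
    # design-bagging: balanced blocks of distinct indices from the BGA sketch above
    des_idx = [np.fromiter(blk, dtype=int) for blk in generate_blocks(N // 2, m, N)]
    acc = {}
    for name, idx_sets in (("bagging", bag_idx), ("design-bagging", des_idx)):
        pred = majority_vote(fit_ensemble(X_train, y_train, idx_sets), X_test)
        acc[name] = float(np.mean(pred == y_test))
    return acc
```

Repeating `run_once` over fresh 90%/10% train/test splits (1000 repetitions for the classification tasks, per the quoted setup) would correspond to the averaging the paper describes.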