Addressing Imbalance in Multi-Label Classification Using Structured Hellinger Forests

Authors: Zachary Daniels, Dimitris Metaxas

AAAI 2017

Reproducibility assessment (variable: result, followed by the LLM's supporting response):
Research Type: Experimental
  "We empirically validate our method on a number of benchmarks against standard and state-of-the-art multi-label classification algorithms with improved results." "Finally, we empirically evaluate our method against a number of existing methods on benchmark datasets and discuss some of the strengths and weaknesses of our model."
Researcher Affiliation: Academia
  "Zachary A. Daniels, Dimitris N. Metaxas, Department of Computer Science, Rutgers, The State University of New Jersey, zad7@cs.rutgers.edu, dnm@cs.rutgers.edu"
Pseudocode: No
  The paper does not contain any clearly labeled or formatted pseudocode or algorithm blocks.
Open Source Code: No
  The paper does not provide concrete access to source code, nor does it state that the code for the described methodology is open source or otherwise available.
Open Datasets: Yes
  "We conduct experiments over ten datasets drawn from a wide range of domains: CAL500 (Turnbull et al. 2008), Emotions (Trohidis et al. 2008), Medical (Pestian et al. 2007), the Enron Corpus (Klimt and Yang 2004), Scenes (Boutell et al. 2004), Yeast (Elisseeff and Weston 2001), Corel5k (Duygulu et al. 2002), RCV1 Subsets 1 and 2 (Lewis et al. 2004), TMC2007 (Srivastava and Zane-Ulman 2005), and Mediamill (Snoek et al. 2006). All datasets are provided by the MULAN project (Tsoumakas et al. 2011)."
Dataset Splits: No
  "For each dataset, we evenly and randomly split the dataset into training and test sets." The paper does not mention a separate validation split.
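The split the paper describes (an even, random partition into training and test sets, with no validation set) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the data here is synthetic and the variable names are assumptions.

```python
import numpy as np

# Synthetic stand-ins for a multi-label dataset: a feature matrix X
# and a binary label-indicator matrix Y (one column per label).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
Y = rng.integers(0, 2, size=(100, 5))

# Even, random train/test split: shuffle indices, then halve them.
n = X.shape[0]
perm = rng.permutation(n)
half = n // 2
train_idx, test_idx = perm[:half], perm[half:]

X_train, Y_train = X[train_idx], Y[train_idx]
X_test, Y_test = X[test_idx], Y[test_idx]

print(X_train.shape, X_test.shape)  # (50, 8) (50, 8)
```

Because the paper reports no validation set, any hyperparameter tuning would have to reuse the training half or a nested split, which a reimplementation should decide on explicitly.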
Hardware Specification: No
  The paper does not report the hardware (e.g., CPU/GPU models, memory, or cloud instance types) used to run its experiments.
Software Dependencies: No
  The paper mentions software such as the MULAN library implementations and ADAM, but does not specify version numbers for these or any other dependencies.
Experiment Setup: Yes
  "We always require tree leaves to have at least three training examples. When building a tree, we always sample 75% of the instances and features without replacement. We perform no pruning. We always use 50 trees in an ensemble. We use ADAM with a learning rate of 0.05 and the suggested parameters (Kingma and Ba 2015). We use batch sizes of min(1000, number of training instances at node)."
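For reproduction purposes, the reported hyperparameters can be collected into a single configuration. This is a hedged sketch: the key names and the `batch_size` helper are illustrative inventions, not part of any released implementation; only the values come from the paper.

```python
def batch_size(n_train_at_node, cap=1000):
    """Paper's rule: batch size = min(1000, training instances at node)."""
    return min(cap, n_train_at_node)

# Hyperparameters quoted from the paper's experiment setup.
FOREST_CONFIG = {
    "n_trees": 50,                  # 50 trees per ensemble
    "min_leaf_examples": 3,         # leaves need >= 3 training examples
    "instance_sample_frac": 0.75,   # instances sampled without replacement
    "feature_sample_frac": 0.75,    # features sampled without replacement
    "pruning": False,               # no pruning performed
    "optimizer": "ADAM",            # with suggested defaults (Kingma & Ba 2015)
    "learning_rate": 0.05,
}

print(batch_size(250))   # 250
print(batch_size(5000))  # 1000
```

Note that ADAM's remaining parameters (beta1, beta2, epsilon) are only pinned down by the reference to the "suggested parameters" of Kingma and Ba (2015), so a reimplementation would use that paper's defaults.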