Addressing Imbalance in Multi-Label Classification Using Structured Hellinger Forests
Authors: Zachary Daniels, Dimitris Metaxas
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate our method on a number of benchmarks against standard and state-of-the-art multi-label classification algorithms with improved results. Finally, we empirically evaluate our method against a number of existing methods on benchmark datasets and discuss some of the strengths and weaknesses of our model. |
| Researcher Affiliation | Academia | Zachary A. Daniels, Dimitris N. Metaxas Department of Computer Science Rutgers, The State University of New Jersey zad7@cs.rutgers.edu, dnm@cs.rutgers.edu |
| Pseudocode | No | The paper does not contain any clearly labeled or formatted pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code or explicitly state that the code for the described methodology is open-source or available. |
| Open Datasets | Yes | We conduct experiments over ten datasets drawn from a wide range of domains: CAL500 (Turnbull et al. 2008), Emotions (Trohidis et al. 2008), Medical (Pestian et al. 2007), the Enron Corpus (Klimt and Yang 2004), Scenes (Boutell et al. 2004), Yeast (Elisseeff and Weston 2001), Corel5k (Duygulu et al. 2002), RCV1 Subsets 1 and 2 (Lewis et al. 2004), TMC2007 (Srivastava and Zane-Ulman 2005), and Mediamill (Snoek et al. 2006). All datasets are provided by the MULAN project (Tsoumakas et al. 2011). |
| Dataset Splits | No | For each dataset, we evenly and randomly split the dataset into training and test sets. There is no explicit mention of a separate validation split. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running its experiments. |
| Software Dependencies | No | The paper mentions software like 'MULAN library implementations' and 'ADAM' but does not specify version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We always require tree leaves to have at least three training examples. When building a tree, we always sample 75% of the instances and features without replacement. We perform no pruning. We always use 50 trees in an ensemble. We use ADAM with a learning rate of 0.05 and the suggested parameters (Kingma and Ba 2015). We use batch sizes of min(1000, number of training instances at node). |
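The reported setup (even random train/test split, 75% instance/feature subsampling per tree, 50 trees, minimum leaf size of 3, ADAM with a 0.05 learning rate, and the batch-size rule) can be sketched as below. This is a minimal illustration of the stated hyperparameters only, not the paper's structured Hellinger forest; all function names are hypothetical.

```python
import random

def even_split(n_instances, seed=0):
    """Evenly and randomly split instance indices into train/test halves,
    as described in the paper's experimental protocol."""
    rng = random.Random(seed)
    idx = list(range(n_instances))
    rng.shuffle(idx)
    half = n_instances // 2
    return idx[:half], idx[half:]

def sample_for_tree(train_idx, n_features, rng, frac=0.75):
    """Sample 75% of training instances and features without replacement
    for one tree in the ensemble."""
    n_i = int(frac * len(train_idx))
    n_f = int(frac * n_features)
    return rng.sample(train_idx, n_i), rng.sample(range(n_features), n_f)

# Reported hyperparameters (values quoted from the paper's text).
N_TREES = 50        # trees per ensemble
MIN_LEAF = 3        # at least three training examples per leaf
ADAM_LR = 0.05      # ADAM with suggested default parameters otherwise

def batch_size(n_node_instances):
    """Batch size rule: min(1000, number of training instances at node)."""
    return min(1000, n_node_instances)

# Example: 1000 instances, 64 features (dataset sizes here are illustrative).
train_idx, test_idx = even_split(1000)
rng = random.Random(1)
bags = [sample_for_tree(train_idx, n_features=64, rng=rng)
        for _ in range(N_TREES)]
```

For instance, with 1000 instances the split yields 500 training and 500 test indices, and each tree's bag holds 375 instances and 48 features.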