Hierarchical Multi-Label Classification Networks

Authors: Jonatas Wehrmann, Ricardo Cerri, Rodrigo Barros

Venue: ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate its performance in 21 datasets from four distinct domains, and we compare it against the current HMC state-of-the-art approaches. Results show that HMCN substantially outperforms all baselines with statistical significance, arising as the novel state-of-the-art for HMC.
Researcher Affiliation | Academia | School of Technology, Pontifícia Universidade Católica do Rio Grande do Sul; Universidade Federal de São Carlos.
Pseudocode | No | The paper describes the architecture and mathematical formulations, but does not provide structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | All algorithms are executed over 21 freely-available datasets related to either protein function prediction (Vens et al., 2008), annotation of medical or microalgae images (Dimitrovski et al., 2011), or text classification (Lewis et al., 2004).
Dataset Splits | Yes | Table 1 presents the characteristics of the employed datasets.
Hardware Specification | No | The paper mentions: 'We also gratefully acknowledge the support of NVIDIA Corporation with the donation of the GPUs that were used for running the experiments.' However, it does not specify the exact GPU model or any other hardware details.
Software Dependencies | No | The paper mentions using the 'Adam optimizer' but does not specify any software libraries (e.g., TensorFlow, PyTorch) or their version numbers.
Experiment Setup | Yes | For training our networks, we use the Adam optimizer with learning rate of 1 × 10^-3 and remaining parameters as suggested in (Kingma & Ba, 2014). For the HMCN-F version, the fully-connected layers comprise 384 ReLU neurons, followed by batch normalization, residual connections, and dropout of 60%. Dropout is important given that these models could easily overfit the small training sets.
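
The experiment setup quoted above is concrete enough to sketch in code. Below is a minimal, hedged reconstruction of one HMCN-F-style fully-connected block and the stated optimizer settings, assuming PyTorch (the paper does not name its framework); the layer ordering, the residual projection, and the input/output sizes (77 features, 499 classes) are illustrative assumptions rather than the authors' implementation.

import torch
import torch.nn as nn


class DenseResidualBlock(nn.Module):
    """Hypothetical block: Linear -> ReLU -> BatchNorm -> Dropout, with a skip connection."""

    def __init__(self, in_features: int, hidden: int = 384, dropout: float = 0.6):
        super().__init__()
        self.fc = nn.Linear(in_features, hidden)  # 384 ReLU neurons
        self.bn = nn.BatchNorm1d(hidden)          # batch normalization
        self.drop = nn.Dropout(dropout)           # dropout of 60%
        # Project the input when its width differs from the hidden size so the
        # residual addition is dimensionally valid (an assumption; the paper
        # only states that residual connections are used).
        self.skip = nn.Linear(in_features, hidden) if in_features != hidden else nn.Identity()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.drop(self.bn(torch.relu(self.fc(x))))
        return out + self.skip(x)


if __name__ == "__main__":
    model = nn.Sequential(
        DenseResidualBlock(in_features=77),   # input width is dataset-dependent (77 is illustrative)
        DenseResidualBlock(in_features=384),
        nn.Linear(384, 499),                  # one output per class (499 is illustrative)
        nn.Sigmoid(),                         # multi-label probabilities
    )
    # Adam with learning rate 1e-3 and the remaining parameters left at their
    # defaults, as stated in the experiment setup.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    x = torch.randn(8, 77)
    print(model(x).shape)  # torch.Size([8, 499])

This sketch only covers the dense-block hyperparameters quoted in the table; it omits the hierarchical global/local output structure that defines HMCN-F itself.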