Hyperbolic Embedding Inference for Structured Multi-Label Prediction

Authors: Bo Xiong, Michael Cochez, Mojtaba Nayyeri, Steffen Staab

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on 12 datasets show 1) significant improvements in mean average precision; 2) lower number of constraint violations; 3) an order of magnitude fewer dimensions than baselines.
Researcher Affiliation | Collaboration | Bo Xiong (University of Stuttgart, Stuttgart, Germany); Michael Cochez (Vrije Universiteit Amsterdam; Discovery Lab, Elsevier, Amsterdam, The Netherlands); Mojtaba Nayyeri (University of Stuttgart, Stuttgart, Germany); Steffen Staab (University of Stuttgart, Stuttgart, Germany; University of Southampton)
Pseudocode | No | The paper describes algorithms and functions in text and mathematical formulas but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is openly available at https://github.com/xiongbo010/HMI
Open Datasets | Yes | We consider 12 datasets that have been used for evaluating multi-label prediction methods [11, 8, 10]. These consist of 8 functional genomic datasets [28], 3 image annotation datasets [29, 30], and 1 text classification dataset [31].
Dataset Splits | Yes | Similar to MBM [11] and its baselines, we sample 30% of the implication and exclusion constraints for training the model. We employ an early-stopping strategy with patience 20 to save training time. (A constraint-sampling sketch follows the table.)
Hardware Specification | Yes | We train the models on NVIDIA A100 with 40GB memory.
Software Dependencies | No | We implement HMI, HLR and HMC-HLR using PyTorch [34] and train the models on NVIDIA A100 with 40GB memory. We train HMI, HLR and HMI+HLR using Riemannian Adam [35] optimizer implemented by the Geoopt library [36].
Experiment Setup | Yes | We train HMI, HLR and HMI+HLR using Riemannian Adam [35] optimizer implemented by the Geoopt library [36] with a batch size of 4. We set the dropout rate to 0.6 suggested by [14] to avoid the case that the model overfits the small training sets. We employ an early-stopping strategy with patience 20 to save training time. The learning rate is searched from {1e-4, 5e-4, 1e-3, 5e-3, 1e-2}. The penalty weight of the violation is searched from {1e-5, 5e-4, 1e-4, 5e-3, 1e-2} and we also show its impact in an ablation. The best dimension per dataset is searched from {32, 64, 128, 256}. (A training-configuration sketch follows the table.)
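
To make the Dataset Splits row concrete, below is a minimal sketch of sampling 30% of the implication and exclusion constraints for training. The helper sample_constraints and the toy constraint lists are hypothetical illustrations, not code from the HMI repository.

# Hypothetical sketch: keep 30% of the implication/exclusion constraints for training,
# as described in the quoted setup. Not taken from the authors' code.
import random

def sample_constraints(implications, exclusions, fraction=0.3, seed=0):
    """Randomly retain `fraction` of each constraint set for training."""
    rng = random.Random(seed)
    keep_imp = rng.sample(implications, max(1, int(len(implications) * fraction)))
    keep_exc = rng.sample(exclusions, max(1, int(len(exclusions) * fraction)))
    return keep_imp, keep_exc

# Toy constraints: (i, j) means "label i implies label j" / "label i excludes label j".
implications = [(0, 1), (2, 1), (3, 2), (4, 3), (5, 4)]
exclusions = [(0, 5), (2, 4), (1, 3)]
train_implications, train_exclusions = sample_constraints(implications, exclusions)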
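
The Experiment Setup row can likewise be illustrated with a hedged sketch of the quoted configuration: the Riemannian Adam optimizer from the Geoopt library, batch size 4, dropout 0.6, early stopping with patience 20, and the quoted search grids for learning rate, violation penalty, and embedding dimension. The Poincaré-ball embedding initialization and the placeholder loop are assumptions for illustration, not the authors' implementation.

# Hedged sketch of the quoted training configuration; grids and settings are taken from
# the paper's text, while the model and its initialization are placeholders.
import itertools
import torch
import geoopt

ball = geoopt.PoincareBall()  # hyperbolic (Poincare ball) manifold from Geoopt

learning_rates = [1e-4, 5e-4, 1e-3, 5e-3, 1e-2]     # learning-rate search grid
violation_weights = [1e-5, 5e-4, 1e-4, 5e-3, 1e-2]  # penalty-weight search grid
dimensions = [32, 64, 128, 256]                     # embedding-dimension search grid

batch_size = 4                     # batch size quoted from the setup
dropout = torch.nn.Dropout(p=0.6)  # dropout rate quoted as 0.6
early_stopping_patience = 20       # early stopping with patience 20

for lr, penalty, dim in itertools.product(learning_rates, violation_weights, dimensions):
    # Placeholder: 50 label embeddings as points on the Poincare ball, kept near the origin.
    label_embeddings = geoopt.ManifoldParameter(torch.randn(50, dim) * 1e-2, manifold=ball)
    optimizer = geoopt.optim.RiemannianAdam([label_embeddings], lr=lr)
    # A real run would minimize the task loss plus `penalty` times the constraint-violation
    # term, train with `batch_size`, and stop early after `early_stopping_patience` epochs
    # without validation improvement.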