Language Models as Hierarchy Encoders
Authors: Yuan He, Moy Yuan, Jiaoyan Chen, Ian Horrocks
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate HiTs against pre-trained LMs, standard fine-tuned LMs, and several hyperbolic embedding baselines, focusing on their capabilities in simulating transitive inference, predicting subsumptions, and transferring knowledge across hierarchies. The results demonstrate that HiTs consistently outperform all baselines in these tasks, underscoring the effectiveness and transferability of our re-trained hierarchy encoders. In evaluating HiTs, we compare their performance against pre-trained LMs, standard fine-tuned LMs, and previous hyperbolic embedding models in the Multi-hop Inference and Mixed-hop Prediction tasks. Our experiments utilise datasets derived from WordNet [14] and SNOMED CT [15], and transfer evaluation datasets from Schema.org [16], Food Ontology (FoodOn) [17], and Disease Ontology (DOID) [18]. Table 2: Multi-hop Inference and Mixed-hop Prediction test results on WordNet. |
| Researcher Affiliation | Academia | Yuan He, University of Oxford, yuan.he@cs.ox.ac.uk; Zhangdie Yuan, University of Cambridge, zy317@cam.ac.uk; Jiaoyan Chen, The University of Manchester, jiaoyan.chen@manchester.ac.uk; Ian Horrocks, University of Oxford, ian.horrocks@cs.ox.ac.uk |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | See GitHub repository: https://github.com/KRR-Oxford/HierarchyTransformers; Datasets on Zenodo: https://zenodo.org/doi/10.5281/zenodo.10511042 or the Huggingface Hub: https://huggingface.co/Hierarchy-Transformers; and HiT models also on the Huggingface Hub. |
| Open Datasets | Yes | See GitHub repository: https://github.com/KRR-Oxford/HierarchyTransformers; Datasets on Zenodo: https://zenodo.org/doi/10.5281/zenodo.10511042 or the Huggingface Hub: https://huggingface.co/Hierarchy-Transformers; and HiT models also on the Huggingface Hub. Our experiments utilise datasets derived from WordNet [14] and SNOMED CT [15], and transfer evaluation datasets from Schema.org [16], Food Ontology (FoodOn) [17], and Disease Ontology (DOID) [18]. |
| Dataset Splits | Yes | Multi-hop Inference: This task, following the setting in [13], aims to evaluate the model's ability in deducing indirect, multi-hop subsumptions T from direct, one-hop subsumptions E, so as to simulate transitive inference. We split T for validation and testing, denoted as T_val and T_test, respectively. ... Mixed-hop Prediction: This task aims to evaluate the model's capability in determining the existence of subsumption relationships between arbitrary entity pairs, where the entities are not necessarily seen during training. We propose a challenging setting where models are trained on incomplete direct subsumptions and examined on a mix of hold-out, unseen direct and indirect (mixed-hop) subsumptions. We split E into training, validation, and test sets, denoted as E_train, E_val, and E_test, respectively. The final training, validation, and test sets for this task are E_train, E_val ∪ T_val, and E_test ∪ T_test, respectively, where T_val and T_test are re-used from the previous task. ... On WordNet (Noun), FoodOn, and DOID, we adopt a consistent splitting ratio for the validation and testing sets. Specifically, we allocate two separate 5% portions of the indirect subsumptions T to form T_val and T_test, respectively. Similarly, two distinct 5% portions of the direct subsumptions E are used as E_val and E_test. Table 1 presents the statistics of the extracted hierarchies and the resulting datasets for the Multi-hop Inference and Mixed-hop Prediction tasks. (A minimal split sketch appears below the table.) |
| Hardware Specification | Yes | All our experiments were conducted on a single Quadro RTX 8000 GPU. |
| Software Dependencies | No | The code implementation of this work primarily depends on DeepOnto [34] for processing hierarchies and constructing datasets, Geoopt [35] for the Poincaré ball, Sentence-Transformers [21] and Huggingface-Transformers [36] for training and evaluation of LMs. |
| Experiment Setup | Yes | In the hierarchy re-training of our HiT models, we configured the hyperbolic clustering loss margin (α in Equation 3) at 5.0 and the hyperbolic centripetal loss margin (β in Equation 4) at 0.1. An exception was made for all-mpnet-base-v2 with hard negatives, where α was adjusted to 3.0, based on validation. The models were trained for 20 epochs, with a training batch size of 256, 500 warm-up steps, and an initial learning rate of 1e-5, using the AdamW optimiser [37]. (A hedged sketch of the two margin losses appears below the table.) |
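
The split protocol quoted under Dataset Splits can be summarised in a few lines of Python. This is a minimal sketch under the stated ratios, assuming plain edge lists for the direct subsumptions E and the indirect subsumptions T; negative sampling, entity verbalisation, and any hierarchy-specific handling are omitted, and the released Zenodo/Huggingface datasets remain the authoritative splits.

```python
import random

def make_splits(direct_edges, indirect_edges, ratio=0.05, seed=0):
    """direct_edges: one-hop subsumptions E; indirect_edges: multi-hop subsumptions T."""
    rng = random.Random(seed)

    def carve(edges):
        # Shuffle, then carve off two disjoint `ratio` portions (val and test).
        edges = list(edges)
        rng.shuffle(edges)
        n = int(len(edges) * ratio)
        return edges[2 * n:], edges[:n], edges[n:2 * n]   # remainder, val, test

    e_train, e_val, e_test = carve(direct_edges)
    _, t_val, t_test = carve(indirect_edges)

    # Multi-hop Inference: train on all one-hop edges E, evaluate on hold-out multi-hop edges.
    multi_hop_inference = {
        "train": list(direct_edges),
        "val": t_val,
        "test": t_test,
    }
    # Mixed-hop Prediction: train on incomplete E, evaluate on unseen direct + indirect edges.
    mixed_hop_prediction = {
        "train": e_train,
        "val": e_val + t_val,
        "test": e_test + t_test,
    }
    return multi_hop_inference, mixed_hop_prediction
```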
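For the Experiment Setup row, the two losses referenced as Equations 3 and 4 are margin-based losses on the Poincaré ball: a clustering loss that pulls child-parent pairs closer than negative pairs by a margin α, and a centripetal loss that pushes parents closer to the manifold origin than their children by a margin β. The sketch below is a hedged re-statement using geoopt, not the authors' released implementation; the tensor names, batch reduction, and curvature are assumptions.

```python
import torch
import geoopt

ball = geoopt.PoincareBall(c=1.0)  # curvature value is an assumption

def hit_losses(child, parent, negative, alpha=5.0, beta=0.1):
    """child, parent, negative: (batch, dim) points already on the Poincaré ball."""
    # Hyperbolic clustering loss: related pairs should be closer (in hyperbolic
    # distance) than unrelated pairs by at least the margin alpha.
    d_pos = ball.dist(child, parent)
    d_neg = ball.dist(child, negative)
    clustering = torch.clamp(d_pos - d_neg + alpha, min=0.0).mean()

    # Hyperbolic centripetal loss: a parent should lie closer to the origin
    # than its child by at least the margin beta.
    centripetal = torch.clamp(ball.dist0(parent) - ball.dist0(child) + beta, min=0.0).mean()

    return clustering + centripetal
```

With the quoted hyperparameters, this combined loss would use α = 5.0 and β = 0.1 (α = 3.0 for all-mpnet-base-v2 with hard negatives) and be optimised with AdamW at a 1e-5 learning rate.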