Language Models as Hierarchy Encoders
Authors: Yuan He, Moy Yuan, Jiaoyan Chen, Ian Horrocks
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate HiTs against pre-trained LMs, standard fine-tuned LMs, and several hyperbolic embedding baselines, focusing on their capabilities in simulating transitive inference, predicting subsumptions, and transferring knowledge across hierarchies. The results demonstrate that HiTs consistently outperform all baselines in these tasks, underscoring the effectiveness and transferability of our re-trained hierarchy encoders. In evaluating HiTs, we compare their performance against pre-trained LMs, standard fine-tuned LMs, and previous hyperbolic embedding models in the Multi-hop Inference and Mixed-hop Prediction tasks. Our experiments utilise datasets derived from WordNet [14] and SNOMED CT [15], and transfer evaluation datasets from Schema.org [16], Food Ontology (FoodOn) [17], and Disease Ontology (DOID) [18]. Table 2: Multi-hop Inference and Mixed-hop Prediction test results on WordNet. |
| Researcher Affiliation | Academia | Yuan He, University of Oxford, yuan.he@cs.ox.ac.uk; Zhangdie Yuan, University of Cambridge, zy317@cam.ac.uk; Jiaoyan Chen, The University of Manchester, jiaoyan.chen@manchester.ac.uk; Ian Horrocks, University of Oxford, ian.horrocks@cs.ox.ac.uk |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | See GitHub repository: https://github.com/KRR-Oxford/HierarchyTransformers; Datasets on Zenodo: https://zenodo.org/doi/10.5281/zenodo.10511042 or the Huggingface Hub: https://huggingface.co/Hierarchy-Transformers; and HiT models also on the Huggingface Hub. |
| Open Datasets | Yes | See GitHub repository: https://github.com/KRR-Oxford/HierarchyTransformers; Datasets on Zenodo: https://zenodo.org/doi/10.5281/zenodo.10511042 or the Huggingface Hub: https://huggingface.co/Hierarchy-Transformers; and HiT models also on the Huggingface Hub. Our experiments utilise datasets derived from WordNet [14] and SNOMED CT [15], and transfer evaluation datasets from Schema.org [16], Food Ontology (FoodOn) [17], and Disease Ontology (DOID) [18]. |
| Dataset Splits | Yes | Multi-hop Inference: This task, following the setting in [13], aims to evaluate the model's ability in deducing indirect, multi-hop subsumptions T from direct, one-hop subsumptions E, so as to simulate transitive inference. We split T for validation and testing, denoted as T_val and T_test, respectively. ... Mixed-hop Prediction: This task aims to evaluate the model's capability in determining the existence of subsumption relationships between arbitrary entity pairs, where the entities are not necessarily seen during training. We propose a challenging setting where models are trained on incomplete direct subsumptions and examined on a mix of hold-out, unseen direct and indirect (mixed-hop) subsumptions. We split E into training, validation, and test sets, denoted as E_train, E_val, and E_test, respectively. The final training, validation, and test sets for this task are E_train, E_val ∪ T_val, and E_test ∪ T_test, respectively, where T_val and T_test are re-used from the previous task. ... On WordNet (Noun), FoodOn, and DOID, we adopt a consistent splitting ratio for the validation and testing sets. Specifically, we allocate two separate 5% portions of the indirect subsumptions T to form T_val and T_test, respectively. Similarly, two distinct 5% portions of the direct subsumptions E are used as E_val and E_test. Table 1 presents the statistics of the extracted hierarchies and the resulting datasets for the Multi-hop Inference and Mixed-hop Prediction tasks. (A minimal split sketch appears below the table.) |
| Hardware Specification | Yes | All our experiments were conducted on a single Quadro RTX 8000 GPU. |
| Software Dependencies | No | The code implementation of this work primarily depends on DeepOnto [34] for processing hierarchies and constructing datasets, Geoopt [35] for the Poincaré ball, Sentence-Transformers [21] and Huggingface-Transformers [36] for training and evaluation of LMs. |
| Experiment Setup | Yes | In the hierarchy re-training of our HiT models, we configured the hyperbolic clustering loss margin (α in Equation 3) at 5.0 and the hyperbolic centripetal loss margin (β in Equation 4) at 0.1. An exception was made for all-mpnet-base-v2 with hard negatives, where α was adjusted to 3.0, based on validation. The models were trained for 20 epochs, with a training batch size of 256, 500 warm-up steps, and an initial learning rate of 1e-5, using the AdamW optimiser [37]. (A hedged sketch of the two margin losses appears below the table.) |
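
The split protocol quoted under Dataset Splits can be summarised in a few lines of Python. This is a minimal sketch under the stated ratios, assuming plain edge lists for the direct subsumptions E and the indirect subsumptions T; negative sampling, entity verbalisation, and any hierarchy-specific handling are omitted, and the released Zenodo/Huggingface datasets remain the authoritative splits.

```python
import random

def make_splits(direct_edges, indirect_edges, ratio=0.05, seed=0):
    """direct_edges: one-hop subsumptions E; indirect_edges: multi-hop subsumptions T."""
    rng = random.Random(seed)

    def carve(edges):
        # Shuffle, then carve off two disjoint `ratio` portions (val and test).
        edges = list(edges)
        rng.shuffle(edges)
        n = int(len(edges) * ratio)
        return edges[2 * n:], edges[:n], edges[n:2 * n]   # remainder, val, test

    e_train, e_val, e_test = carve(direct_edges)
    _, t_val, t_test = carve(indirect_edges)

    # Multi-hop Inference: train on all one-hop edges E, evaluate on hold-out multi-hop edges.
    multi_hop_inference = {
        "train": list(direct_edges),
        "val": t_val,
        "test": t_test,
    }
    # Mixed-hop Prediction: train on incomplete E, evaluate on unseen direct + indirect edges.
    mixed_hop_prediction = {
        "train": e_train,
        "val": e_val + t_val,
        "test": e_test + t_test,
    }
    return multi_hop_inference, mixed_hop_prediction
```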
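For the Experiment Setup row, the two losses referenced as Equations 3 and 4 are margin-based losses on the Poincaré ball: a clustering loss that pulls child-parent pairs closer than negative pairs by a margin α, and a centripetal loss that pushes parents closer to the manifold origin than their children by a margin β. The sketch below is a hedged re-statement using geoopt, not the authors' released implementation; the tensor names, batch reduction, and curvature are assumptions.

```python
import torch
import geoopt

ball = geoopt.PoincareBall(c=1.0)  # curvature value is an assumption

def hit_losses(child, parent, negative, alpha=5.0, beta=0.1):
    """child, parent, negative: (batch, dim) points already on the Poincaré ball."""
    # Hyperbolic clustering loss: related pairs should be closer (in hyperbolic
    # distance) than unrelated pairs by at least the margin alpha.
    d_pos = ball.dist(child, parent)
    d_neg = ball.dist(child, negative)
    clustering = torch.clamp(d_pos - d_neg + alpha, min=0.0).mean()

    # Hyperbolic centripetal loss: a parent should lie closer to the origin
    # than its child by at least the margin beta.
    centripetal = torch.clamp(ball.dist0(parent) - ball.dist0(child) + beta, min=0.0).mean()

    return clustering + centripetal
```

With the quoted hyperparameters, this combined loss would use α = 5.0 and β = 0.1 (α = 3.0 for all-mpnet-base-v2 with hard negatives) and be optimised with AdamW at a 1e-5 learning rate.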