Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective

Authors: Yang Chen, Cong Fang, Zhouchen Lin, Bing Liu

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments to back up the validity of our hypergraph formulation for relational learning in PTMs. In the first experiment, on synthetic relational learning, we create synthetic entities whose relations compose weighted graphs, showing the power of MM (masked modeling) for learning the synthetic relations. In the second experiment, we examine real-world relational learning in LLMs by evaluating their relational subgraphs and measuring how well the evaluated subgraphs align with the real world. Our results show that the evaluated relations do align with the real world to some degree and that more powerful models exhibit better alignment. (See the synthetic-data sketch after the table.)
Researcher Affiliation | Academia | 1 National Key Lab of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; 2 Institute for Artificial Intelligence, Peking University; 3 Pazhou Laboratory (Huangpu), Guangzhou, China; 4 Department of Computer Science, University of Illinois Chicago.
Pseudocode | Yes | Algorithm 1: Hypergraph Estimation from Datasets
Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We use subgraphs extracted from ConceptNet (Speer et al., 2017) as baselines for the real-world relation graphs. (See the ConceptNet sketch after the table.)
Dataset Splits | Yes | For each graph, we generate 100,000 samples, with 80,000 for training, 10,000 for validation, and 10,000 for testing.
Hardware Specification | Yes | All models are trained on two NVIDIA GeForce RTX 3090 GPUs.
Software Dependencies | No | The paper mentions using Hugging Face's implementation and AdamW, but it does not specify version numbers for these or other software dependencies.
Experiment Setup | Yes | For the masking strategy, we mask one token in each sample uniformly at random. We train the model with AdamW, using an initial learning rate of 2e-5, weight decay 0.01, and a cosine scheduler; the other AdamW hyperparameters are the defaults of the Hugging Face Trainer arguments. We pre-train the model for 100 epochs with a per-device training batch size of 256. (See the training-setup sketch after the table.)
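
The Research Type and Dataset Splits rows describe synthetic entities whose relations form weighted graphs, with 100,000 samples per graph split 80k/10k/10k. The paper does not release its generation code, so the following is only a minimal sketch of that setup; the entity count, random seed, and sampling-by-edge-weight scheme are assumptions.

```python
import numpy as np

# Minimal sketch (assumed details): synthetic entities whose pairwise relations
# form a weighted, undirected graph; entity pairs are sampled with probability
# proportional to edge weight, then split 80k/10k/10k as reported in the paper.
rng = np.random.default_rng(0)

num_entities = 50                               # assumed number of synthetic entities
weights = np.triu(rng.random((num_entities, num_entities)), k=1)
weights = weights + weights.T                   # symmetric weights, zero diagonal
probs = weights / weights.sum()                 # edge weights as a sampling distribution

flat = rng.choice(num_entities * num_entities, size=100_000, p=probs.ravel())
samples = np.column_stack(np.unravel_index(flat, probs.shape))  # (head, tail) entity pairs

train, val, test = samples[:80_000], samples[80_000:90_000], samples[90_000:]
```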
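The Open Datasets row states that real-world relation graphs are baselined against subgraphs extracted from ConceptNet. The paper does not describe its extraction procedure, so the snippet below is only a hedged illustration using ConceptNet 5's public REST API (api.conceptnet.io); the `concept_edges` helper and its query parameters are assumptions, not the authors' pipeline.

```python
import requests

def concept_edges(concept: str, limit: int = 50):
    """Fetch (start, relation, end, weight) edges around one English concept."""
    url = f"https://api.conceptnet.io/c/en/{concept}"
    data = requests.get(url, params={"limit": limit}, timeout=10).json()
    # Each returned edge carries labeled start/end nodes, a relation, and a weight.
    return [
        (e["start"]["label"], e["rel"]["label"], e["end"]["label"], e["weight"])
        for e in data.get("edges", [])
    ]

# Print a few edges around a seed concept to inspect the local relational subgraph.
print(concept_edges("dog")[:5])
```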
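The Experiment Setup row lists the reported hyperparameters: one token masked per sample uniformly at random, AdamW with learning rate 2e-5 and weight decay 0.01, a cosine scheduler, Hugging Face Trainer defaults otherwise, 100 epochs, and a per-device batch size of 256. Below is a minimal sketch of such a configuration with the `transformers` library; `output_dir` and the `mask_one_token` collator are illustrative names, not taken from the paper.

```python
import random
import torch
from transformers import TrainingArguments

# Reported settings: AdamW (the Trainer default), lr 2e-5, weight decay 0.01,
# cosine schedule, 100 epochs, per-device batch size 256; other AdamW
# hyperparameters stay at the Hugging Face Trainer defaults.
args = TrainingArguments(
    output_dir="mm_pretrain",            # illustrative output path
    learning_rate=2e-5,
    weight_decay=0.01,
    lr_scheduler_type="cosine",
    num_train_epochs=100,
    per_device_train_batch_size=256,
)

def mask_one_token(batch, mask_token_id, ignore_index=-100):
    """Mask exactly one token per sample, chosen uniformly at random."""
    input_ids = batch["input_ids"].clone()
    labels = torch.full_like(input_ids, ignore_index)   # loss only on the masked position
    for i in range(input_ids.size(0)):
        pos = random.randrange(input_ids.size(1))       # uniform position within the sample
        labels[i, pos] = input_ids[i, pos]
        input_ids[i, pos] = mask_token_id
    return {"input_ids": input_ids,
            "attention_mask": batch["attention_mask"],
            "labels": labels}
```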