Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective
Authors: Yang Chen, Cong Fang, Zhouchen Lin, Bing Liu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments to back up the validity of our hypergraph formulation for relational learning in PTMs. In the first experiment of synthetic relational learning, we create synthetic entities whose relations compose weighted graphs, showing the power of MM for learning the synthetic relations. In the second experiment, we examine real-world relational learning of LLMs by evaluating their relational subgraphs and measuring how well the evaluated subgraphs align with the real world. Our results show that the evaluated relations do align with the real world to some degree and more powerful models exhibit better alignment. |
| Researcher Affiliation | Academia | (1) National Key Lab of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; (2) Institute for Artificial Intelligence, Peking University; (3) Pazhou Laboratory (Huangpu), Guangzhou, China; (4) Department of Computer Science, University of Illinois Chicago. |
| Pseudocode | Yes | Algorithm 1 Hypergraph Estimation from Datasets |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the methodology described in this paper is publicly available. |
| Open Datasets | Yes | We use subgraphs extracted from ConceptNet (Speer et al., 2017) as baselines for the real-world relation graphs. |
| Dataset Splits | Yes | For each graph, we generate 100,000 samples, with 80,000 samples for training, 10,000 samples for validation, and 10,000 samples for testing (a minimal split sketch is given after the table). |
| Hardware Specification | Yes | All the models are trained on two NVIDIA GeForce RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions using Hugging Face's implementation and AdamW, but it does not specify version numbers for these or other software dependencies. |
| Experiment Setup | Yes | For the masking strategy, we mask one of the tokens in a sample uniformly at random. We train the model with AdamW, using an initial learning rate of 2×10⁻⁵, weight decay 0.01, and a cosine scheduler. The other AdamW hyperparameters are the same as the defaults of the Hugging Face Trainer arguments. We pre-train the model for 100 epochs with a per-device training batch size of 256 (a hedged configuration sketch is given after the table). |
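The 80k/10k/10k split quoted in the Dataset Splits row is easy to reproduce. The sketch below is a minimal illustration, not the authors' code; the `split_samples` helper and the fixed random seed are assumptions.

```python
import random

def split_samples(samples, n_train=80_000, n_val=10_000, n_test=10_000, seed=0):
    """Shuffle and split a list of samples into train/validation/test sets.

    Assumes exactly n_train + n_val + n_test samples, matching the
    100,000 samples per graph reported in the paper.
    """
    assert len(samples) == n_train + n_val + n_test
    rng = random.Random(seed)          # seed is an assumption for reproducibility
    shuffled = samples[:]              # copy so the original order is preserved
    rng.shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```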
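The hyperparameters quoted in the Experiment Setup row map directly onto Hugging Face `TrainingArguments`. The sketch below shows one plausible configuration under those stated values; `output_dir`, the masking helper, and everything not quoted from the paper are assumptions rather than the authors' actual setup.

```python
import random
from transformers import TrainingArguments

# Values quoted from the paper; unquoted settings fall back to Trainer defaults (AdamW).
training_args = TrainingArguments(
    output_dir="./ptm-relational",       # placeholder path, not from the paper
    learning_rate=2e-5,                  # initial learning rate 2×10⁻⁵
    weight_decay=0.01,
    lr_scheduler_type="cosine",          # cosine scheduler
    num_train_epochs=100,
    per_device_train_batch_size=256,
)

def mask_one_token(token_ids, mask_token_id):
    """Replace a single, uniformly chosen token with the mask token.

    Mirrors only the stated strategy of masking one token per sample
    uniformly at random; the function name and signature are assumed.
    """
    masked = list(token_ids)
    pos = random.randrange(len(masked))
    masked[pos] = mask_token_id
    return masked, pos
```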