MGAD: Learning Descriptional Representation Distilled from Distributional Semantics for Unseen Entities

Authors: Yuanzheng Wang, Xueqi Cheng, Yixing Fan, Xiaofei Zhu, Huasheng Liang, Qiang Yan, Jiafeng Guo

IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on four benchmark datasets show that our approach improves the performance over all baseline methods.
Researcher Affiliation | Collaboration | 1 CAS Key Lab of Network Data Science and Technology, ICT, CAS, Beijing, China; 2 University of Chinese Academy of Sciences, Beijing, China; 3 College of Computer Science and Engineering, Chongqing University of Technology; 4 WeChat, Tencent, Guangzhou, China
Pseudocode | No | The paper describes the model architecture and loss functions, but it does not provide any pseudocode or algorithm blocks.
Open Source Code | Yes | Our data, code and models are available at https://github.com/dalek-who/MGAD-entity-linking
Open Datasets | Yes | To evaluate the performance of our model, we choose four widely used entity linking datasets: AIDA [Hoffart et al., 2011], ACE [Ratinov et al., 2011], AQUAINT [Milne and Witten, 2008] and MSNBC [Cucerzan, 2007].
Dataset Splits | Yes | For further improving the matching ability, we infer the entity embeddings of E_seen, and fine-tune the mention encoder of MGAD on AIDA-train with entity embeddings fixed (a parameter-freezing sketch follows this table).
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU models, CPU types, or cloud computing instances) used for running the experiments.
Software Dependencies | No | The paper mentions using RoBERTa and BERT-based models and the Adam optimizer, but it does not provide specific version numbers for any software libraries, frameworks, or languages (e.g., PyTorch 1.9, Python 3.8).
Experiment Setup | Yes | The best weights of the four losses are α1 = 0.338, α2 = 0.002, α3 = 0.33, α4 = 0.33, where the weights sum to 1; α2 is much smaller since the value of L_ea is 10^2 times greater than the others. Temperature τ = 1 in Eq. (8) and (12), and τ = 2 in Eq. (10). The optimizer is Adam [Kingma and Ba, 2015] with learning rate 2 × 10^-5 and weight decay 0.01, using a linear-warmup learning rate scheduler with the first 10% of steps as warmup.
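
The training configuration quoted under Experiment Setup maps onto a short sketch. This is a minimal illustration under assumptions, not the authors' implementation: the loss variables, the `model` object, and the use of the Hugging Face `transformers` warmup helper are placeholders chosen here for clarity.

```python
# Sketch of the reported training setup: four weighted losses, Adam with
# lr 2e-5 and weight decay 0.01, and a linear-warmup schedule over the
# first 10% of steps. Names below are placeholders, not MGAD identifiers.
import torch
from transformers import get_linear_schedule_with_warmup

ALPHAS = (0.338, 0.002, 0.33, 0.33)  # reported loss weights; they sum to 1

def combined_loss(l1, l2, l3, l4):
    # Weighted sum of the four training losses; the small weight on l2
    # compensates for L_ea being roughly 10^2 times larger than the others.
    a1, a2, a3, a4 = ALPHAS
    return a1 * l1 + a2 * l2 + a3 * l3 + a4 * l4

def build_optimizer(model, total_steps):
    # Adam with the reported learning rate and weight decay
    optimizer = torch.optim.Adam(model.parameters(), lr=2e-5, weight_decay=0.01)
    # Linear warmup over the first 10% of training steps, then linear decay
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=int(0.1 * total_steps),
        num_training_steps=total_steps,
    )
    return optimizer, scheduler
```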
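
The fine-tuning step quoted under Dataset Splits, where the mention encoder is updated on AIDA-train while the inferred entity embeddings stay fixed, could look like the following. The `entity_embeddings` attribute and the module layout are hypothetical and not taken from the released repository.

```python
# Freeze pre-computed entity embeddings so only the mention encoder
# (and any other unfrozen parameters) is updated during fine-tuning.
import torch

def freeze_entity_embeddings(model: torch.nn.Module):
    # Keep the inferred entity embeddings fixed
    for param in model.entity_embeddings.parameters():
        param.requires_grad = False

def trainable_parameters(model: torch.nn.Module):
    # Parameters that remain trainable after freezing
    return [p for p in model.parameters() if p.requires_grad]
```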