Community Question Answering Entity Linking via Leveraging Auxiliary Data

Authors: Yuhan Li, Wei Shen, Jianbo Gao, Yadong Wang

IJCAI 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We validate the superiority of our framework through extensive experiments over a newly released CQAEL data set against state-of-the-art entity linking methods. |
| Researcher Affiliation | Academia | Yuhan Li, Wei Shen, Jianbo Gao and Yadong Wang. TMCC, TKLNDST, College of Computer Science, Nankai University, Tianjin, China |
| Pseudocode | No | The paper describes methods in text and provides mathematical equations, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release the data set and codes to facilitate the research towards this new task [4]. [4] https://github.com/yh Leeee/CQA Entity Linking |
| Open Datasets | Yes | We create a new data set named QuoraEL to support the study of the CQAEL task. We release the data set and codes to facilitate the research towards this new task [4]. [4] https://github.com/yh Leeee/CQA Entity Linking |
| Dataset Splits | Yes | We use 5-fold cross-validation and split the CQA texts into training (70%), validation (10%), and testing (20%). |
| Hardware Specification | Yes | All experiments are implemented by MindSpore Framework [6] with two NVIDIA Geforce GTX 3090 (24GB) GPUs. |
| Software Dependencies | No | The paper mentions the MindSpore framework, the 'xlnet-base-cased' and 'longformer-base-4096' models, and the AdamW optimizer, but does not provide version numbers for MindSpore or for general software dependencies (like Python, PyTorch/TensorFlow, etc.) beyond the specific model variants. |
| Experiment Setup | Yes | For training, we adopt AdamW [Loshchilov and Hutter, 2018] optimizer with a warmup rate 0.1, an initial learning rate 1e-5, and a mini-batch size 2. Dropout with a probability of 0.1 is used to alleviate over-fitting. For the base module, the maximum sequence length is set to 128. For the auxiliary data module, maximum lengths of the candidate entity description and each text are set to 128 and 64, respectively. The hyperparameter k is set to 3, whose impact to the performance will be studied later. |
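
The reported splits and training settings can be collected into a small, framework-free Python sketch. It is not the authors' MindSpore code: the names `TrainConfig`, `lr_at_step`, and `fold_split` are hypothetical. The linear decay after warmup, the unshuffled contiguous folds, and taking the validation set from the start of the non-test items are assumptions that the quoted text does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values quoted in the Experiment Setup row; field names are illustrative.
    learning_rate: float = 1e-5
    warmup_rate: float = 0.1
    batch_size: int = 2
    dropout: float = 0.1
    base_max_len: int = 128          # base module sequence length
    entity_desc_max_len: int = 128   # candidate entity description
    aux_text_max_len: int = 64       # each auxiliary text
    k: int = 3


def lr_at_step(step: int, total_steps: int, cfg: TrainConfig) -> float:
    """Linear warmup over the first warmup_rate fraction of steps.

    The linear decay to zero afterwards is an assumption; the source
    only states the warmup rate and the initial learning rate.
    """
    warmup_steps = max(1, int(cfg.warmup_rate * total_steps))
    if step < warmup_steps:
        return cfg.learning_rate * step / warmup_steps
    remaining = max(0, total_steps - step)
    return cfg.learning_rate * remaining / max(1, total_steps - warmup_steps)


def fold_split(n_items: int, fold: int, n_folds: int = 5):
    """Return (train, val, test) index lists for one fold, roughly 70/10/20.

    Each fold holds out a contiguous 1/n_folds block for testing; a tenth
    of all items is then taken from the remainder for validation.
    No shuffling is done here (an assumption made for determinism).
    """
    indices = list(range(n_items))
    fold_size = n_items // n_folds
    start = fold * fold_size
    end = n_items if fold == n_folds - 1 else start + fold_size
    test = indices[start:end]
    rest = indices[:start] + indices[end:]
    n_val = n_items // 10
    return rest[n_val:], rest[:n_val], test
```

With 100 items, `fold_split(100, 0)` yields 70 training, 10 validation, and 20 test indices, and the five test blocks together cover every item exactly once.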