Question Calibration and Multi-Hop Modeling for Temporal Question Answering

Authors: Chao Xue, Di Liang, Pengfei Wang, Jing Zhang

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Each entry below gives a reproducibility variable, the assessed result, and the LLM response supporting that assessment.
Research Type: Experimental. Empirical results verify that the proposed model outperforms state-of-the-art models on the benchmark datasets. Notably, on the complex questions of the CRONQUESTIONS dataset, the Hits@1 and Hits@10 results of QC-MHM improve on the best-performing baseline by 5.1% and 1.2% absolute, respectively.
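For reference, Hits@k is the fraction of questions whose gold answer appears among the model's top-k ranked predictions. A minimal sketch of the metric (function and variable names are illustrative, not from the paper):

```python
def hits_at_k(ranked_answers, gold_answers, k):
    """Fraction of questions whose gold answer appears in the top-k predictions.

    ranked_answers: one ranked list of candidate answers per question.
    gold_answers: one set of correct answers per question.
    """
    hits = sum(
        1 for preds, gold in zip(ranked_answers, gold_answers)
        if any(p in gold for p in preds[:k])
    )
    return hits / len(ranked_answers)

# Toy example: 2 of 3 questions have a gold answer in the top-2 list.
preds = [["1995", "1996"], ["Obama", "Biden"], ["2001", "2003"]]
gold = [{"1996"}, {"Trump"}, {"2001"}]
print(hits_at_k(preds, gold, k=2))  # ~0.667
```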
Researcher Affiliation: Academia. Chao Xue* (1), Di Liang* (2), Pengfei Wang (3), Jing Zhang (1). (1) School of Software, Beihang University, Beijing, China; (2) School of Computer Science, Fudan University, Shanghai, China; (3) School of Software, Zhejiang University, Hangzhou, China.
Pseudocode: No. The paper describes the steps of its proposed modules in narrative text and mathematical formulas but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code: No. The paper contains no explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets: Yes. "We employ two temporal KGQA benchmarks, CRONQUESTIONS (Saxena, Chakrabarti, and Talukdar 2021) and TIMEQUESTIONS (Jia et al. 2021)."
Dataset Splits: No. The paper categorizes the CRONQUESTIONS dataset into question and answer types (e.g., Complex, Simple Entity, Simple Time) and mentions 'train', 'validation', and 'test' in the context of the model's internal learning process, but it does not give the percentages or absolute counts of the training, validation, and test splits used in the experiments.
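For context, benchmarks of this kind are typically distributed with predefined split files, so the split sizes can be recovered from the data release even when the paper omits them. A minimal loading sketch under that assumption (directory layout, file names, and JSON format are hypothetical; the actual CRONQUESTIONS release may differ):

```python
import json
from pathlib import Path

# Hypothetical layout; the real dataset's file names/formats may differ.
DATA_DIR = Path("cronquestions")

def load_split(name):
    """Load one split ('train', 'valid', or 'test') as a list of question records."""
    with open(DATA_DIR / f"{name}.json") as f:
        return json.load(f)

splits = {name: load_split(name) for name in ("train", "valid", "test")}
for name, questions in splits.items():
    print(f"{name}: {len(questions)} questions")
```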
Hardware Specification: No. The paper does not specify the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies: No. The paper cites several software components and models, including Sentence-BERT, TComplEx, BERT, RoBERTa, KnowBERT, and T5, but it gives no version numbers for these components or for any other software dependencies required for reproducibility.
Experiment Setup: No. The paper describes the model architecture and loss functions, including a weight coefficient λ in the loss, but it provides no concrete values for essential setup details such as learning rate, batch size, number of epochs, optimizer, or dropout rate.
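To illustrate the role of the unreported λ, a generic sketch of a weighted two-term training loss is shown below. The specific loss terms and the value of λ are assumptions for illustration, not the paper's actual objective:

```python
import torch

def combined_loss(answer_logits, answer_labels,
                  aux_logits, aux_labels, lam=0.5):
    """Weighted sum of a main answer-prediction loss and an auxiliary loss.

    lam plays the role of the paper's weight coefficient λ; its actual
    value is not reported, so 0.5 here is purely illustrative.
    """
    ce = torch.nn.functional.cross_entropy
    main = ce(answer_logits, answer_labels)  # main QA objective
    aux = ce(aux_logits, aux_labels)         # auxiliary objective
    return main + lam * aux
```

Without the concrete λ (and the optimizer, learning rate, batch size, and epoch count), reproducing the reported numbers would require a hyperparameter search.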