Question Calibration and Multi-Hop Modeling for Temporal Question Answering

Authors: Chao Xue, Di Liang, Pengfei Wang, Jing Zhang

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Each entry below gives a reproducibility variable, the assessed result, and the LLM response supporting that assessment.
Research Type: Experimental. Empirical results verify that the proposed model outperforms state-of-the-art models on the benchmark datasets. Notably, on the complex questions of the CRONQUESTIONS dataset, the Hits@1 and Hits@10 results of QC-MHM improve on the best-performing baseline by 5.1% and 1.2% absolute, respectively.
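For reference, Hits@k is the fraction of questions whose gold answer appears among the model's top-k ranked predictions. A minimal sketch of the metric (function and variable names are illustrative, not from the paper):

```python
def hits_at_k(ranked_answers, gold_answers, k):
    """Fraction of questions whose gold answer appears in the top-k predictions.

    ranked_answers: one ranked list of candidate answers per question.
    gold_answers: one set of correct answers per question.
    """
    hits = sum(
        1 for preds, gold in zip(ranked_answers, gold_answers)
        if any(p in gold for p in preds[:k])
    )
    return hits / len(ranked_answers)

# Toy example: 2 of 3 questions have a gold answer in the top-2 list.
preds = [["1995", "1996"], ["Obama", "Biden"], ["2001", "2003"]]
gold = [{"1996"}, {"Trump"}, {"2001"}]
print(hits_at_k(preds, gold, k=2))  # ~0.667
```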
Researcher Affiliation: Academia. Chao Xue* (1), Di Liang* (2), Pengfei Wang (3), Jing Zhang (1). (1) School of Software, Beihang University, Beijing, China; (2) School of Computer Science, Fudan University, Shanghai, China; (3) School of Software, Zhejiang University, Hangzhou, China.
Pseudocode: No. The paper describes the steps of its proposed modules in narrative text and mathematical formulas but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code: No. The paper contains no explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets: Yes. "We employ two temporal KGQA benchmarks, CRONQUESTIONS (Saxena, Chakrabarti, and Talukdar 2021) and TIMEQUESTIONS (Jia et al. 2021)."
Dataset Splits: No. The paper categorizes the CRONQUESTIONS dataset into question and answer types (e.g., Complex, Simple Entity, Simple Time) and mentions 'train', 'validation', and 'test' in the context of the model's internal learning process, but it does not give the percentages or absolute counts of the training, validation, and test splits used in the experiments.
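For context, benchmarks of this kind are typically distributed with predefined split files, so the split sizes can be recovered from the data release even when the paper omits them. A minimal loading sketch under that assumption (directory layout, file names, and JSON format are hypothetical; the actual CRONQUESTIONS release may differ):

```python
import json
from pathlib import Path

# Hypothetical layout; the real dataset's file names/formats may differ.
DATA_DIR = Path("cronquestions")

def load_split(name):
    """Load one split ('train', 'valid', or 'test') as a list of question records."""
    with open(DATA_DIR / f"{name}.json") as f:
        return json.load(f)

splits = {name: load_split(name) for name in ("train", "valid", "test")}
for name, questions in splits.items():
    print(f"{name}: {len(questions)} questions")
```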
Hardware Specification: No. The paper does not specify the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies: No. The paper cites several software components and models, including Sentence-BERT, TComplEx, BERT, RoBERTa, KnowBERT, and T5, but it gives no version numbers for these components or for any other software dependencies required for reproducibility.
Experiment Setup: No. The paper describes the model architecture and loss functions, including a weight coefficient λ in the loss, but it provides no concrete values for essential setup details such as learning rate, batch size, number of epochs, optimizer, or dropout rate.
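To illustrate the role of the unreported λ, a generic sketch of a weighted two-term training loss is shown below. The specific loss terms and the value of λ are assumptions for illustration, not the paper's actual objective:

```python
import torch

def combined_loss(answer_logits, answer_labels,
                  aux_logits, aux_labels, lam=0.5):
    """Weighted sum of a main answer-prediction loss and an auxiliary loss.

    lam plays the role of the paper's weight coefficient λ; its actual
    value is not reported, so 0.5 here is purely illustrative.
    """
    ce = torch.nn.functional.cross_entropy
    main = ce(answer_logits, answer_labels)  # main QA objective
    aux = ce(aux_logits, aux_labels)         # auxiliary objective
    return main + lam * aux
```

Without the concrete λ (and the optimizer, learning rate, batch size, and epoch count), reproducing the reported numbers would require a hyperparameter search.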