Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
TempoQR: Temporal Question Reasoning over Knowledge Graphs
Authors: Costas Mavromatis, Prasanna Lakkur Subramanyam, Vassilis N. Ioannidis, Adesoji Adeshina, Phillip R Howard, Tetiana Grinberg, Nagib Hakim, George Karypis (pp. 5825-5833)
AAAI 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that TempoQR improves accuracy by 25-45 percentage points on complex temporal questions over state-of-the-art approaches and it generalizes better to unseen question types. |
| Researcher Affiliation | Collaboration | (1) University of Minnesota, (2) University of Massachusetts Amherst, (3) Amazon Web Services, (4) Intel Labs |
| Pseudocode | No | The paper describes its methods using prose and mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | For reproducibility, our code is available at: https://github.com/cmavro/TempoQR. |
| Open Datasets | Yes | CronQuestions (Saxena, Chakrabarti, and Talukdar 2021) is a temporal QA benchmark based on the Wikidata TKG proposed in (Lacroix, Obozinski, and Usunier 2020). |
| Dataset Splits | Yes | CronQuestions consists of 410k unique question-answer pairs, 350k of which are for training and 30k for validation and for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions software like PyTorch, TComplEx, BERT, and RoBERTa, but does not specify their version numbers, which is required for reproducible software dependencies. |
| Experiment Setup | Yes | We learn TKG embeddings with the TComplEx method, where we set their dimensions D = 512. During QA, the pre-trained LM's parameters and the TKG embeddings are not updated. We set the number of transformer layers of the encoder f(·) to l = 6 with 8 heads per layer. We also observed the same performance when setting l = 3 with 4 heads per layer. The model's parameters are updated with Adam (Kingma and Ba 2014) with a learning rate of 0.0002. The model is trained for 20 maximum epochs and the final parameters are determined based on the best validation performance. |
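
The values quoted in the Dataset Splits and Experiment Setup rows can be gathered into one configuration object, which may help when attempting a reproduction. The following is a minimal sketch, not the authors' code: the class name, field names, and the `select_best_epoch` helper are hypothetical, and only the numeric values come from the rows above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TempoQRSetup:
    """Hyperparameters quoted in the table; all field names are hypothetical."""
    tkg_embedding_dim: int = 512        # D, dimension of the TComplEx embeddings
    encoder_layers: int = 6             # l, transformer layers in the encoder f
    attention_heads: int = 8            # heads per transformer layer
    learning_rate: float = 0.0002       # Adam learning rate
    max_epochs: int = 20                # upper bound on training epochs
    freeze_lm: bool = True              # pre-trained LM is not updated during QA
    freeze_tkg_embeddings: bool = True  # TKG embeddings are not updated during QA


# Approximate CronQuestions split sizes as reported (350k / 30k / 30k).
SPLITS = {"train": 350_000, "valid": 30_000, "test": 30_000}


def select_best_epoch(valid_scores):
    """Return the 1-based epoch with the highest validation score.

    Hypothetical helper illustrating "final parameters are determined
    based on the best validation performance"; ties keep the earliest epoch.
    """
    if not valid_scores:
        raise ValueError("need at least one validation score")
    best_index = max(range(len(valid_scores)), key=lambda i: (valid_scores[i], -i))
    return best_index + 1


default_setup = TempoQRSetup()
# The smaller encoder that the paper reports as performing the same.
small_setup = replace(default_setup, encoder_layers=3, attention_heads=4)
```

Keeping the configuration frozen makes the two reported encoder variants explicit: `small_setup` differs from `default_setup` only in layer and head count.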