Hybrid Autoregressive Inference for Scalable Multi-Hop Explanation Regeneration

Authors: Marco Valentino, Mokanarangan Thayaparan, Deborah Ferreira, André Freitas

AAAI 2022, pp. 11403-11411

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate that the hybrid framework significantly outperforms previous sparse models, achieving performance comparable with that of state-of-the-art cross-encoders while being 50 times faster and scalable to corpora of millions of facts.
Researcher Affiliation | Academia | Marco Valentino (1,2), Mokanarangan Thayaparan (1,2), Deborah Ferreira (1), André Freitas (1,2) — 1: University of Manchester, United Kingdom; 2: Idiap Research Institute, Switzerland
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | Implementation and pre-trained models adopted for the experiments are available online: https://github.com/ai-systems/hybrid_autoregressive_inference
Open Datasets | Yes | We perform an extensive evaluation on the WorldTree corpus adopting the dataset released for the shared task on multi-hop explanation regeneration (Jansen and Ustalov 2019).
Dataset Splits | Yes | We adopt explanations and hypotheses in the training-set (≈1,000) for training the dense encoder and computing the explanatory power for unseen hypotheses at inference time. We perform an extensive evaluation on the WorldTree corpus adopting the dataset released for the shared task on multi-hop explanation regeneration (Jansen and Ustalov 2019)... The WorldTree corpus provides a held-out test-set consisting of 1,240 science questions... The studies are performed on the dev-set since the explanations on the test-set are masked.
Hardware Specification | Yes | To this end, we run SCAR on 1 × 16GB Nvidia Tesla P100 GPU and compare the inference time with that of dense models executed on the same infrastructure (Cartuyvels, Spinks, and Moens 2020).
Software Dependencies | No | The paper mentions software such as Sentence-BERT, BM25, and FAISS, and specific models such as bert-base-uncased, but does not provide version numbers for these dependencies (e.g., PyTorch or specific library versions). A hedged sketch of how these components typically combine is shown after the table.
Experiment Setup | Yes | The best results on explanation regeneration are obtained when running SCAR for 4 inference steps (additional details in Ablation Studies)... We found that the best results are obtained using 5 negative examples for each positive tuple. See the inference-loop sketch after the table.
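
The dependency row above names BM25, Sentence-BERT, and FAISS as the paper's building blocks for hybrid sparse/dense retrieval. As a rough illustration of how such components fit together, here is a minimal sketch of a hybrid fact scorer. The example facts, the checkpoint name `all-MiniLM-L6-v2`, the simple linear interpolation, and the weight `lam` are illustrative assumptions, not the paper's actual configuration (SCAR additionally uses an explanatory-power model that this sketch omits):

```python
# Minimal sketch of a hybrid sparse + dense fact scorer.
# Assumptions (not from the paper): rank_bm25 for BM25, an illustrative
# Sentence-BERT checkpoint, and linear interpolation with weight `lam`.
import numpy as np
import faiss
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

facts = [
    "friction causes the temperature of an object to increase",
    "rubbing two objects together causes friction",
    "temperature is a measure of heat energy",
]

# Sparse index: BM25 over whitespace-tokenized facts.
bm25 = BM25Okapi([f.split() for f in facts])

# Dense index: Sentence-BERT embeddings in a FAISS inner-product index
# (cosine similarity, since the embeddings are L2-normalized).
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative checkpoint
emb = encoder.encode(facts, normalize_embeddings=True)
index = faiss.IndexFlatIP(emb.shape[1])
index.add(np.asarray(emb, dtype="float32"))

def hybrid_scores(query: str, lam: float = 0.5) -> np.ndarray:
    """Interpolate normalized BM25 scores with dense cosine similarities."""
    sparse = np.asarray(bm25.get_scores(query.split()))
    sparse = sparse / (sparse.max() + 1e-9)  # scale sparse scores to [0, 1]
    q = encoder.encode([query], normalize_embeddings=True).astype("float32")
    dense, ids = index.search(q, len(facts))
    dense_full = np.zeros(len(facts))
    dense_full[ids[0]] = dense[0]
    return lam * sparse + (1 - lam) * dense_full

print(hybrid_scores("what causes an object to heat up when rubbed?"))
```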
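
The experiment-setup row quotes two concrete hyperparameters: 4 autoregressive inference steps and 5 negative examples per positive tuple. The sketch below shows the general shape of such an autoregressive retrieval loop, where each step conditions the query on the partial explanation built so far; `score_facts` is a hypothetical stand-in for any scorer (e.g., the hybrid one above), and the released SCAR implementation may differ in its details:

```python
# Hedged sketch of a k-step autoregressive retrieval loop (k=4 per the
# paper's quoted setup) plus the 5-negatives-per-positive sampling scheme
# mentioned for training. Both functions are illustrative, not the
# paper's actual code.
import random
from typing import Callable, List, Sequence

def autoregressive_explain(
    hypothesis: str,
    facts: List[str],
    score_facts: Callable[[str], Sequence[float]],
    steps: int = 4,  # best-performing number of inference steps in the paper
) -> List[str]:
    query, selected = hypothesis, []
    for _ in range(steps):
        scores = score_facts(query)
        # Pick the highest-scoring fact not yet selected.
        best = max(
            (i for i in range(len(facts)) if facts[i] not in selected),
            key=lambda i: scores[i],
        )
        selected.append(facts[best])
        # Condition the next retrieval step on the partial explanation.
        query = query + " " + facts[best]
    return selected

def sample_negatives(positive_id: int, n_facts: int, k: int = 5) -> List[int]:
    """Sample k negative fact ids per positive tuple (k=5 in the paper)."""
    candidates = [i for i in range(n_facts) if i != positive_id]
    return random.sample(candidates, k)
```

For example, `autoregressive_explain(hypothesis, facts, hybrid_scores)` would chain the scorer sketched earlier through four retrieval steps, appending each selected fact to the query before the next step.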