Translucent Answer Predictions in Multi-Hop Reading Comprehension
Authors: G P Shrivatsa Bhargav, Michael Glass, Dinesh Garg, Shirish Shevade, Saswati Dana, Dinesh Khandelwal, L Venkata Subramaniam, Alfio Gliozzo
AAAI 2020, pp. 7700-7707
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | TAP offers state-of-the-art performance on the HotpotQA (Yang et al. 2018) dataset, an apt dataset for the multi-hop RCQA task, as it occupies Rank-1 on its leaderboard (https://hotpotqa.github.io/) at the time of submission. and (Section 6, Experiments) HotpotQA is a large scale QA dataset focusing on explainability and multi-hop reasoning. ... We evaluate TAP on the hidden test set for the distractor setting of HotpotQA by submitting our system for evaluation. We also use the publicly available development set to explore the impact of decisions in our architecture. and Table 2: Performance of TAP (ours) in comparison with the next closest and closest published models on the HotpotQA leaderboard. |
| Researcher Affiliation | Collaboration | (1) IBM Research AI, (2) Dept. of CSA, IISc, Bangalore. {mrglass, gliozzo}@us.ibm.com, {bhargavs, shirish}@iisc.ac.in, {garg.dinesh, sadana04, dikhand1, lvsubram}@in.ibm.com |
| Pseudocode | No | The paper provides architectural diagrams and descriptive text for its components, but no explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | The TAP code repository can be found at https://github.com/IBM/translucent-answer-prediction. |
| Open Datasets | Yes | HotpotQA (Yang et al. 2018) dataset, an apt dataset for the multi-hop RCQA task, as it occupies Rank-1 on its leaderboard (https://hotpotqa.github.io/) at the time of submission. and HotpotQA is a large scale QA dataset focusing on explainability and multi-hop reasoning. This dataset comes with human annotated sentence level binary labels indicating which sentences are supporting facts for answering a given question. |
| Dataset Splits | Yes | HotpotQA (Yang et al. 2018) dataset and Table 1 shows some statistics on the training and development sets. and We also use the publicly available development set to explore the impact of decisions in our architecture. (See the loading sketch after this table.) |
| Hardware Specification | Yes | Training LoGIX took approximately 24 hours on 8 P100 GPUs. In the joint setting this training was done for each of the five folds. The Answer Predictor takes under 10 hours to train on 4 P100 GPUs. |
| Software Dependencies | No | The paper mentions 'PyTorch was used to develop TAP' but does not provide a specific version for PyTorch or any other software dependency. |
| Experiment Setup | Yes | We use pre-trained BERT-Large models. In the Global Layer of LoGIX, there are two transformer layers. For both networks we use the ADAM (Kingma and Ba 2015) optimizer with a maximum learning rate of 3 × 10^-5 and a triangular learning schedule, warming up over the first 10% of training instances. Questions are truncated to 35 tokens and passages are truncated to 512 tokens. The total length of the passage set is limited to 2048 tokens, with the longest passages truncated to fit. We trained LoGIX for 4 epochs with a batch size of 8 and the Answer Predictor also for 4 epochs with a batch size of 16. (See the configuration sketch after this table.) |
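
The training and development splits cited in the Dataset Splits row are publicly downloadable. The sketch below is a convenience for reproducers, not part of the paper (which predates this tooling); it assumes the Hugging Face `datasets` hub's `hotpot_qa` configuration and its field schema.

```python
# Hedged sketch: fetch the public HotpotQA distractor-setting splits via
# the Hugging Face `datasets` hub. This is not the authors' pipeline;
# field names follow the hub's `hotpot_qa` schema.
from datasets import load_dataset

hotpot = load_dataset("hotpot_qa", "distractor")
train, dev = hotpot["train"], hotpot["validation"]
print(len(train), len(dev))  # roughly 90k train / 7.4k dev questions

example = train[0]
print(example["question"])
# Sentence-level supporting-fact labels: paragraph titles paired with
# sentence indices within those paragraphs.
print(example["supporting_facts"])
```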
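
The optimization recipe in the Experiment Setup row maps onto standard PyTorch components. Below is a minimal sketch under stated assumptions: the tiny `torch.nn.Linear` model and `steps_per_epoch` are placeholders, and the triangular schedule is approximated with `transformers.get_linear_schedule_with_warmup` (linear warmup, then linear decay), which matches the paper's description but is not confirmed to be the authors' exact implementation.

```python
# Minimal sketch of the reported optimization setup; not the authors'
# code. The tiny Linear model and steps_per_epoch are placeholders.
import torch
from transformers import get_linear_schedule_with_warmup

EPOCHS = 4              # both LoGIX and the Answer Predictor train 4 epochs
BATCH_SIZE = 8          # LoGIX batch size (the Answer Predictor uses 16)
PEAK_LR = 3e-5          # maximum learning rate reported in the paper
MAX_QUESTION_TOKENS = 35    # questions truncated to 35 tokens
MAX_PASSAGE_TOKENS = 512    # individual passages truncated to 512 tokens
MAX_TOTAL_TOKENS = 2048     # total passage-set token budget

steps_per_epoch = 1000  # placeholder; depends on dataset size / batch size
total_steps = EPOCHS * steps_per_epoch

model = torch.nn.Linear(8, 2)  # stand-in for the BERT-Large based network
optimizer = torch.optim.Adam(model.parameters(), lr=PEAK_LR)

# Triangular schedule: linear warmup over the first 10% of training
# instances, then linear decay to zero.
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * total_steps),
    num_training_steps=total_steps,
)

for step in range(total_steps):
    # ... forward / backward pass on a batch would go here ...
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```

Swapping the placeholder model for a pre-trained BERT-Large encoder and computing `steps_per_epoch` from the actual dataset size recovers the reported LoGIX configuration; per the paper, the Answer Predictor setup differs only in batch size (16).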