QASA: Advanced Question Answering on Scientific Articles
Authors: Yoonjoo Lee, Kyungjae Lee, Sunghyun Park, Dasol Hwang, Jaehyeon Kim, Hong-In Lee, Moontae Lee
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results show that QASA's full-stack inference outperforms the state-of-the-art INSTRUCTGPT by a big margin. |
| Researcher Affiliation | Collaboration | 1 KAIST (work done at LG AI Research); 2 LG AI Research; 3 Yonsei University; 4 University of Illinois Chicago. |
| Pseudocode | No | The paper describes its methodology narratively, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures. |
| Open Source Code | No | The paper states, 'The dataset is available at https://github.com/lgresearch/QASA.', but it does not explicitly state that the source code for the QASA approach or its underlying methodology is publicly released or available. |
| Open Datasets | Yes | The dataset is available at https://github.com/lgresearch/QASA. ... we adopt S2ORC (Lo et al., 2020), a collection of machine-readable full text for open-access papers, and the arXiv paper collection. ... we exploit public and synthetic data for the purpose of each subtask. Table 2 shows a summary of used public data. (Table 2: Associative Selection uses QASPER, ASQA; Rationale Generation uses QASPER; Answer Composition uses ASQA, ELI5.) |
| Dataset Splits | No | The paper mentions selecting the best checkpoint based on the 'validation set' in Appendix C ('We trained all models until 5 epochs and selected the best checkpoint with average R-2 scores of answer composition on validation set.'), but it does not provide specific details on the dataset splits (percentages or counts) for its own QASA benchmark or for how the public datasets were partitioned for training, validation, and testing. |
| Hardware Specification | Yes | All of our experiments were conducted using 16 A100 GPUs. |
| Software Dependencies | No | The paper mentions using various large language models (T5, T0, FLAN-T5, GALACTICA, INSTRUCTGPT) and fine-tuning, but does not provide specific version numbers for any software dependencies, libraries, or programming languages used in the implementation. |
| Experiment Setup | Yes | To simplify all experiments, we fixed the initial learning rate to 1e-5. We trained all models until 5 epochs and selected the best checkpoint with average R-2 scores of answer composition on validation set. |
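
The Experiment Setup row quotes a fixed initial learning rate of 1e-5, training for up to 5 epochs, and selection of the best checkpoint by average ROUGE-2 (R-2) of answer composition on the validation set. The sketch below is a minimal, hypothetical illustration of that checkpoint-selection loop, not the authors' code: the model, the data, and the simplified bigram-overlap stand-in for ROUGE-2 are all placeholders.

```python
# Hypothetical sketch of the quoted setup: fixed LR 1e-5, up to 5 epochs,
# best checkpoint chosen by average R-2 on the validation set.
# Model, data, and the toy ROUGE-2 below are placeholders, not the paper's code.
from collections import Counter

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def rouge2_f1(candidate: str, reference: str) -> float:
    """Toy bigram-overlap F1 standing in for a proper ROUGE-2 implementation."""
    def bigrams(text):
        toks = text.split()
        return Counter(zip(toks, toks[1:]))

    cand, ref = bigrams(candidate), bigrams(reference)
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    p, r = overlap / sum(cand.values()), overlap / sum(ref.values())
    return 2 * p * r / (p + r)


# Placeholder model and data; a real run would fine-tune a seq2seq LM such as T5.
model = nn.Linear(16, 16)
train_loader = DataLoader(
    TensorDataset(torch.randn(64, 16), torch.randn(64, 16)), batch_size=8
)
val_pairs = [("generated answer text", "reference answer text")]  # (prediction, gold)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # fixed initial LR from the paper
loss_fn = nn.MSELoss()

best_score, best_state = float("-inf"), None
for epoch in range(5):  # "trained all models until 5 epochs"
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

    # Checkpoint selection: average R-2 of answer composition on the validation set.
    model.eval()
    score = sum(rouge2_f1(pred, gold) for pred, gold in val_pairs) / len(val_pairs)
    if score > best_score:
        best_score = score
        best_state = {k: v.clone() for k, v in model.state_dict().items()}

print(f"best validation R-2: {best_score:.4f}")
```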