Ask the Right Questions: Active Question Reformulation with Reinforcement Learning
Authors: Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Wojciech Gajewski, Andrea Gesmundo, Neil Houlsby, Wei Wang.
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate on SearchQA, a dataset of complex questions extracted from Jeopardy!. The agent outperforms a state-of-the-art base model, playing the role of the environment, and other benchmarks. |
| Researcher Affiliation | Industry | Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Wojciech Gajewski, Andrea Gesmundo, Neil Houlsby, Wei Wang Google {cbuck,jbulian,massi,wgaj,agesmundo,neilhoulsby,wangwe}@google.com |
| Pseudocode | No | The paper describes the models and training procedures using text and mathematical equations, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper links to a third-party tool (sentencepiece) that was used (https://github.com/google/sentencepiece), but it does not provide any link or statement about the availability of the source code for the methodology developed in this paper. |
| Open Datasets | Yes | We evaluate on a dataset of Jeopardy! questions, SearchQA (Dunn et al., 2017). These questions are hard to answer by design because they use convoluted language, e.g., "Travel doesn't seem to be an issue for this sorcerer & onetime surgeon; astral projection & teleportation are no prob" (answer: Doctor Strange). Thus SearchQA tests the ability of AQA to reformulate questions such that the QA system has the best chance of returning the correct answer. The United Nations Parallel Corpus v1.0 (Ziemski et al., 2016). This dataset contains 11.4M sentences which are fully aligned across six UN languages: Arabic, English, Spanish, French, Russian, and Chinese. From all bilingual pairs, we produce a multilingual training corpus of 30 language pairs. This yields 340M training examples which we use to train the zero-shot neural MT system (Johnson et al., 2016). extracted from the Paralex database of question paraphrases (Fader et al., 2013). We also investigate the general paraphrasing abilities of our model, focusing on the relation between paraphrasing quality and QA quality. To tease apart the relationship between paraphrasing and reformulation for QA we evaluated 3 variants of the reformulator: Base-NMT This is the model used to initialize RL training of the agent. Trained first on the multilingual U.N. corpus, then on the Paralex corpus, as detailed in Section 5.2. Base-NMT-No Paralex This is the model above trained solely on the multilingual U.N. corpus, without the Paralex monolingual corpus. Base-NMT+Quora This is the same as Base-NMT, additionally trained on the Quora dataset which contains 150k duplicate questions. Following Prakash et al. (2016), we evaluate all models on the MSCOCO (Lin et al., 2014) validation set (VAL2014). |
| Dataset Splits | Yes | We train our model on the pre-defined training split, perform model selection and tuning on the validation split and report results on the validation and test splits. The training, validation and test sets contain 99,820, 13,393 and 27,248 examples, respectively. |
| Hardware Specification | No | The paper mentions running experiments "on GPUs" and "on CPU" but does not provide any specific models, types, or detailed specifications for the hardware used. |
| Software Dependencies | No | The paper mentions using specific optimizers (Adam, SGD) and a tokenization tool (sentencepiece, with a GitHub link provided), but it does not provide version numbers for any key software components or libraries (e.g., Python, TensorFlow, PyTorch). |
| Experiment Setup | Yes | We trained with the Adam optimizer for 4500 steps, using learning rate 0.001, batch size 60. The model converged after training on 400M instances using the Adam optimizer with a learning rate of 0.001 and batch size of 128. After pre-training the reformulator, we switch the optimizer from Adam to SGD and train for 100k RL steps of batch size 64 with a low learning rate of 0.001. We use an entropy regularization weight of λ = 0.001. |
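The quoted RL stage (SGD, learning rate 0.001, batch size 64, entropy regularization weight λ = 0.001) can be sketched as a policy-gradient loop. The sketch below is a toy illustration, not the paper's implementation: the hyperparameter values are taken from the quote, but the bandit-style policy, the action space, and the reward function are stand-ins for the seq2seq reformulator and the QA environment's answer-quality reward.

```python
import math
import random

# Hyperparameters quoted from the paper's RL fine-tuning stage.
LEARNING_RATE = 0.001   # SGD learning rate
BATCH_SIZE = 64         # RL batch size
ENTROPY_WEIGHT = 0.001  # lambda, entropy-regularization weight

random.seed(0)
VOCAB = 5  # toy action space standing in for the reformulator's output space

# Toy policy: one logit per action (the real agent is a seq2seq reformulator).
logits = [0.0] * VOCAB

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def reward(action):
    # Stand-in for the QA environment's answer-quality reward.
    return 1.0 if action == 3 else 0.0

for _ in range(300):
    probs = softmax(logits)
    actions = random.choices(range(VOCAB), weights=probs, k=BATCH_SIZE)
    rewards = [reward(a) for a in actions]
    baseline = sum(rewards) / BATCH_SIZE  # simple variance-reduction baseline

    # REINFORCE gradient of expected reward w.r.t. the logits.
    grad = [0.0] * VOCAB
    for a, r in zip(actions, rewards):
        adv = r - baseline
        for k in range(VOCAB):
            grad[k] += adv * ((1.0 if k == a else 0.0) - probs[k])
    grad = [g / BATCH_SIZE for g in grad]

    # Entropy bonus H = -sum_i p_i log p_i; its gradient w.r.t. logit k is
    # -p_k * (log p_k - sum_i p_i log p_i).
    mean_logp = sum(p * math.log(p) for p in probs)
    for k in range(VOCAB):
        grad[k] += ENTROPY_WEIGHT * (-probs[k] * (math.log(probs[k]) - mean_logp))

    # SGD ascent step on the entropy-regularized objective.
    logits = [x + LEARNING_RATE * g for x, g in zip(logits, grad)]

final_probs = softmax(logits)
best_action = max(range(VOCAB), key=lambda k: final_probs[k])
```

With the paper's small learning rate the policy shifts only gradually toward the rewarded action, while the entropy term (negligible near a uniform policy) discourages premature collapse onto a single output.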