What Do You Mean ‘Why?’: Resolving Sluices in Conversations
Authors: Victor Petrén Bach Hansen, Anders Søgaard (pp. 7887–7894)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present a crowd-sourced dataset containing annotations of sluices from over 4,000 dialogues collected from conversational QA datasets, as well as a series of strong baseline architectures. We conduct a series of baseline experiments on this task, using both encoder-decoder frameworks, as well as language modelling objectives, and show through human evaluation of the predicted resolutions that these baselines are quite strong and at times even rival the quality of human annotators. |
| Researcher Affiliation | Collaboration | Victor Petrén Bach Hansen,1,2 Anders Søgaard1,3 1Department of Computer Science, University of Copenhagen, Denmark 2Topdanmark A/S, Denmark 3Google Research, Berlin {victor.petren, soegaard}@di.ku.dk |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release the raw annotated version of the conversational sluicing corpus, as well as our cleaned version which we report our results on, including the splits used.3 3https://github.com/vpetren/conv_sluice_resolution |
| Open Datasets | Yes | we crawl existing conversational QA datasets, namely QuAC1 and CoQA,2 for question-answer contexts with one-word follow-up questions. 1https://quac.ai/ 2https://stanfordnlp.github.io/coqa/ |
| Dataset Splits | Yes | In our experiments, we use the splits outlined in Table 1 (also made publicly available). Split (Why / Where / Who / What / When / Total): train 851 / 714 / 513 / 302 / 702 / 3082; val 84 / 71 / 54 / 39 / 52 / 300; test 229 / 183 / 97 / 83 / 201 / 793; Total 1164 / 968 / 664 / 424 / 955 / 4175 |
| Hardware Specification | Yes | Unlike the LSTMseq2seq and Transformer, we do not fine-tune the GPT-2 model until convergence, but instead we ran it for 18 hours on an Nvidia Titan X GPU. |
| Software Dependencies | No | The paper mentions software like GloVE, Adam optimizer, LSTM, Transformer, and GPT-2, and points to PyTorch implementations, but does not provide specific version numbers for these software components or libraries. |
| Experiment Setup | Yes | For both the encoder and decoder, we use a standard two-layer LSTM (Hochreiter and Schmidhuber 1997), with a hidden state size of 512, regularized using a dropout rate of 0.5. We initialize the embedding matrix with 300-dimensional GloVe (Pennington, Socher, and Manning 2014), which remains fixed during training. We optimize the end-to-end network using Adam (Kingma and Ba 2014), with the default learning rate of 0.001. As our conversational sluicing resolution corpus is small in comparison to the corpora used in the experiments by Vaswani et al. (2017), we limit the number of encoder/decoder layers to 3 (compared to 6 in their work), after observing improvements on our validation data. |
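The quoted setup pins down the LSTM baseline's hyperparameters fairly completely. As a minimal PyTorch sketch (not the authors' released code, which is at the GitHub link above), the described configuration — two-layer encoder and decoder LSTMs, hidden size 512, dropout 0.5, frozen 300-dimensional embeddings, Adam at lr 0.001 — could look like this; the random embedding initialization stands in for the GloVe vectors, and the class name is hypothetical:

```python
# Hypothetical sketch of the paper's LSTM seq2seq baseline configuration.
# Random embeddings stand in for the (frozen) 300-d GloVe vectors.
import torch
import torch.nn as nn

class Seq2SeqBaseline(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden=512,
                 layers=2, dropout=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.embed.weight.requires_grad = False  # embeddings stay fixed
        self.encoder = nn.LSTM(emb_dim, hidden, num_layers=layers,
                               dropout=dropout, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hidden, num_layers=layers,
                               dropout=dropout, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src, tgt):
        # Encode the dialogue context; seed the decoder with the final state.
        _, state = self.encoder(self.embed(src))
        dec_out, _ = self.decoder(self.embed(tgt), state)
        return self.out(dec_out)  # per-token vocabulary logits

model = Seq2SeqBaseline(vocab_size=10000)
# Adam with the default learning rate; frozen embeddings are excluded.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=0.001)

src = torch.randint(0, 10000, (4, 12))  # batch of 4 contexts, length 12
tgt = torch.randint(0, 10000, (4, 8))   # target resolutions, length 8
logits = model(src, tgt)                # shape: (4, 8, 10000)
```

This omits attention, teacher forcing, and decoding details, which the paper does not fully specify in the quoted passage.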