A Controllable Model of Grounded Response Generation
Authors: Zeqiu Wu, Michel Galley, Chris Brockett, Yizhe Zhang, Xiang Gao, Chris Quirk, Rik Koncel-Kedziorski, Jianfeng Gao, Hannaneh Hajishirzi, Mari Ostendorf, Bill Dolan
AAAI 2021, pp. 14085-14093 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Quantitative and qualitative results show that, using this framework, a transformer-based model with a novel inductive attention mechanism, trained on a conversation-like Reddit dataset, outperforms strong generation baselines. We show through qualitative and quantitative evaluations that CGRG outperforms strong baselines when: a) the control phrases are provided by a (simulated) user, and b) the control phrases are automatically extracted by a control phrase prediction model. |
| Researcher Affiliation | Collaboration | (1) University of Washington, Seattle, WA, USA; (2) Microsoft Research, Redmond, WA, USA; (3) Allen Institute for AI, Seattle, WA, USA |
| Pseudocode | No | The paper describes mathematical formulas and procedural steps in prose but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | To further facilitate reproducibility, we release our data preparation and modeling code at https://github.com/ellenmellon/CGRG. |
| Open Datasets | Yes | We start with the grounded Reddit conversation dataset described in Qin et al. (2019). This dataset is a filtered version of the public dataset released by Qin et al. (2019). |
| Dataset Splits | Yes | The number of utterances of train, dev and test are 390K, 6.7K and 21K, respectively. |
| Hardware Specification | Yes | Each training process is run on 2 Tesla K-80 nodes. |
| Software Dependencies | No | The paper mentions using "GPT-2" and "DialoGPT" models/architectures, but does not provide specific version numbers for any underlying software dependencies (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | We set the maximum number of sentences in GC to 20 and the maximum number of phrases in C to 10; the type embedding then assigns 0 to X, 1-20 to GC, 21-30 to C, and 31 to R tokens. We use the small version of GPT-2 with 117M parameters, with the maximum length of the input or target response sequence set to 512. We use batch size 32. The learning rate (1e-5) and warmup steps (1600) are tuned on dev-set perplexity, with all other parameters kept the same as DialoGPT. |
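For readers reconstructing the setup, the following is a minimal sketch of the type-embedding scheme and hyperparameters quoted in the Experiment Setup row. It is not the authors' released code (see the CGRG repository for that); the function and variable names here, such as `segment_type_ids` and `TRAIN_CONFIG`, are hypothetical illustrations under the assumption that the input has already been segmented into dialogue context X, grounding sentences GC, control phrases C, and response R.

```python
# Sketch (not the authors' implementation) of the token-type id assignment:
# 0 for X, 1-20 for GC sentences, 21-30 for C phrases, 31 for R tokens.

MAX_GC_SENTENCES = 20  # paper: at most 20 grounding sentences in GC
MAX_C_PHRASES = 10     # paper: at most 10 control phrases in C


def segment_type_ids(x_len, gc_lens, c_lens, r_len):
    """Return one type id per token for the concatenated X | GC | C | R input.

    x_len:   number of dialogue-context (X) tokens
    gc_lens: token count of each grounding sentence in GC
    c_lens:  token count of each control phrase in C
    r_len:   number of response (R) tokens
    """
    assert len(gc_lens) <= MAX_GC_SENTENCES and len(c_lens) <= MAX_C_PHRASES
    type_ids = [0] * x_len                                   # X tokens -> 0
    for i, n in enumerate(gc_lens, start=1):                 # GC sentence i -> id i (1-20)
        type_ids += [i] * n
    for j, n in enumerate(c_lens, start=MAX_GC_SENTENCES + 1):
        type_ids += [j] * n                                  # C phrase j -> id 21-30
    type_ids += [MAX_GC_SENTENCES + MAX_C_PHRASES + 1] * r_len  # R tokens -> 31
    return type_ids


# Hyperparameters quoted in the row above (GPT-2 small, 117M parameters).
TRAIN_CONFIG = {
    "max_sequence_length": 512,
    "batch_size": 32,
    "learning_rate": 1e-5,
    "warmup_steps": 1600,
}

# Example: 5 context tokens, two grounding sentences (8 and 6 tokens),
# one control phrase (3 tokens), 7 response tokens.
ids = segment_type_ids(5, [8, 6], [3], 7)
assert ids[:5] == [0] * 5 and ids[-1] == 31
```

The per-segment ids simply mirror the embedding layout described in the paper; they would be fed to the model as token-type embeddings alongside the usual token and position embeddings.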