A Controllable Model of Grounded Response Generation

Authors: Zeqiu Wu, Michel Galley, Chris Brockett, Yizhe Zhang, Xiang Gao, Chris Quirk, Rik Koncel-Kedziorski, Jianfeng Gao, Hannaneh Hajishirzi, Mari Ostendorf, Bill Dolan

AAAI 2021, pp. 14085-14093 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Quantitative and qualitative results show that, using this framework, a transformer-based model with a novel inductive attention mechanism, trained on a conversation-like Reddit dataset, outperforms strong generation baselines. We show through qualitative and quantitative evaluations that CGRG outperforms strong baselines both when a) the control phrases are provided by a (simulated) user and b) when they are automatically extracted by a control phrase prediction model.
Researcher Affiliation | Collaboration | (1) University of Washington, Seattle, WA, USA; (2) Microsoft Research, Redmond, WA, USA; (3) Allen Institute for AI, Seattle, WA, USA
Pseudocode | No | The paper describes mathematical formulas and procedural steps in prose but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "To further facilitate reproducibility, we release our data preparation and modeling code at https://github.com/ellenmellon/CGRG."
Open Datasets | Yes | We start with the grounded Reddit conversation dataset described in Qin et al. (2019). This dataset is a filtered version of the public dataset of Qin et al. (2019).
Dataset Splits | Yes | The numbers of utterances in the train, dev, and test sets are 390K, 6.7K, and 21K, respectively.
Hardware Specification | Yes | Each training process is run on 2 Tesla K-80 nodes.
Software Dependencies | No | The paper mentions using the GPT-2 and DialoGPT models/architectures, but does not provide specific version numbers for any underlying software dependencies (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | We set the maximum number of sentences in GC to 20 and the maximum number of phrases in C to 10, so the token type embeddings are 0 for X tokens, 1-20 for GC, 21-30 for C, and 31 for R tokens. We use the small version of GPT-2 with 117M parameters and a maximum input or target response sequence length of 512. We use batch size 32. The learning rate (1e-5) and warmup steps (1600) are tuned on dev-set perplexity, with all other parameters the same as DialoGPT.
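For reference, below is a minimal sketch of how the type-embedding scheme and hyperparameters described in the row above could be wired up. The helper name, segment variables, and config keys are illustrative assumptions, not the authors' released implementation (see https://github.com/ellenmellon/CGRG for that).

```python
# Hedged sketch of the token type ID layout described in the experiment setup:
# X = dialogue context, GC = grounding sentences (up to 20), C = control phrases
# (up to 10), R = response. Function and variable names are assumptions.

MAX_GC_SENTENCES = 20   # GC segments use type IDs 1-20
MAX_C_PHRASES = 10      # C segments use type IDs 21-30
CONTEXT_TYPE_ID = 0     # X (dialogue context) tokens
RESPONSE_TYPE_ID = 31   # R (response) tokens
MAX_SEQ_LEN = 512       # maximum input / target response length

def assign_type_ids(x_tokens, gc_sentences, c_phrases, r_tokens):
    """Return one type ID per token for the concatenated input X + GC + C + R."""
    type_ids = [CONTEXT_TYPE_ID] * len(x_tokens)
    for i, sent in enumerate(gc_sentences[:MAX_GC_SENTENCES]):
        type_ids += [1 + i] * len(sent)                       # GC: IDs 1-20
    for j, phrase in enumerate(c_phrases[:MAX_C_PHRASES]):
        type_ids += [1 + MAX_GC_SENTENCES + j] * len(phrase)  # C: IDs 21-30
    type_ids += [RESPONSE_TYPE_ID] * len(r_tokens)            # R: ID 31
    return type_ids[:MAX_SEQ_LEN]

# Training hyperparameters reported above; the key names follow common
# fine-tuning scripts and are assumptions, not the authors' exact flags.
TRAIN_CONFIG = {
    "model": "gpt2",        # GPT-2 small, 117M parameters
    "batch_size": 32,
    "learning_rate": 1e-5,  # tuned on dev-set perplexity
    "warmup_steps": 1600,   # tuned on dev-set perplexity
    "max_seq_len": MAX_SEQ_LEN,
}
```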