reproducibilityindex.ai

A Controllable Model of Grounded Response Generation

Authors: Zeqiu Wu, Michel Galley, Chris Brockett, Yizhe Zhang, Xiang Gao, Chris Quirk, Rik Koncel-Kedziorski, Jianfeng Gao, Hannaneh Hajishirzi, Mari Ostendorf, Bill Dolan

Pages: 14085-14093

AAAI 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Quantitative and qualitative results show that, using this framework, a transformer based model with a novel inductive attention mechanism, trained on a conversation-like Reddit dataset, outperforms strong generation baselines. We show through qualitative and quantitative evaluations that CGRG outperforms strong baselines where: a) the control phrases are provided by a (simulated) user, and b) automatically extracted by a control phrase prediction model.
Researcher Affiliation	Collaboration	(1) University of Washington, Seattle, WA, USA; (2) Microsoft Research, Redmond, WA, USA; (3) Allen Institute for AI, Seattle, WA, USA
Pseudocode	No	The paper describes mathematical formulas and procedural steps in prose but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	To further facilitate reproducibility, we release our data preparation and modeling code at https://github.com/ellenmellon/CGRG.
Open Datasets	Yes	We start with the grounded Reddit conversation dataset described in Qin et al. (2019). This dataset is a filtered version of Qin et al. (2019)'s public dataset.
Dataset Splits	Yes	The number of utterances of train, dev and test are 390K, 6.7K and 21K, respectively.
Hardware Specification	Yes	Each training process is run on 2 Tesla K-80 nodes.
Software Dependencies	No	The paper mentions using "GPT-2" and "DialoGPT" models/architectures, but does not provide specific version numbers for any underlying software dependencies (e.g., Python, PyTorch, TensorFlow, etc.).
Experiment Setup	Yes	We set the maximum number of sentences in GC to be 20 and maximum number of phrases in C to be 10, then we have 0 for X; 1-20 for GC; 21-30 for C and 31 for R tokens as type embedding. We use the small version of GPT-2 with 117M parameters, with the maximum length of the input or target response sequence to be 512. We use batch size 32. Learning rate (1e-5) and warmup steps (1600) are tuned on the dev set perplexity, with all other parameters being the same as DialoGPT.
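
The type-embedding scheme quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. It is not taken from the released CGRG code: the function name `build_type_ids`, the constant names, the `HYPERPARAMS` dictionary, and the assumed segment order (X, then GC, then C, then R) are all hypothetical. Only the numeric values (type IDs 0, 1-20, 21-30, 31; batch size 32; learning rate 1e-5; 1600 warmup steps; maximum length 512) come from the quoted text.

```python
from typing import List

# Values quoted in the Experiment Setup row; the names are illustrative only.
MAX_GROUNDING_SENTENCES = 20   # sentences in GC
MAX_CONTROL_PHRASES = 10       # phrases in C
CONTEXT_TYPE = 0               # X tokens
GROUNDING_TYPE_START = 1       # GC sentences use 1..20
CONTROL_TYPE_START = 21        # C phrases use 21..30
RESPONSE_TYPE = 31             # R tokens

# Hyperparameters as reported; the dictionary layout is an assumption.
HYPERPARAMS = {
    "batch_size": 32,
    "learning_rate": 1e-5,
    "warmup_steps": 1600,
    "max_sequence_length": 512,
}


def build_type_ids(
    context_len: int,
    grounding_lens: List[int],
    control_lens: List[int],
    response_len: int,
) -> List[int]:
    """Return one type ID per token, assuming the order X, GC, C, R.

    grounding_lens[i] is the token count of the i-th grounding sentence;
    control_lens[j] is the token count of the j-th control phrase.
    """
    if len(grounding_lens) > MAX_GROUNDING_SENTENCES:
        raise ValueError("too many grounding sentences")
    if len(control_lens) > MAX_CONTROL_PHRASES:
        raise ValueError("too many control phrases")

    type_ids = [CONTEXT_TYPE] * context_len
    # Each grounding sentence gets its own ID in 1..20.
    for i, n_tokens in enumerate(grounding_lens):
        type_ids += [GROUNDING_TYPE_START + i] * n_tokens
    # Each control phrase gets its own ID in 21..30.
    for j, n_tokens in enumerate(control_lens):
        type_ids += [CONTROL_TYPE_START + j] * n_tokens
    type_ids += [RESPONSE_TYPE] * response_len
    return type_ids


if __name__ == "__main__":
    # 3 context tokens, two grounding sentences (2 and 1 tokens),
    # one control phrase (1 token), 2 response tokens.
    print(build_type_ids(3, [2, 1], [1], 2))
```

Giving each grounding sentence and control phrase its own ID, rather than one ID per segment kind, is what lets the 32 type embeddings (0 through 31) distinguish individual sentences and phrases within the input.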