Sketch and Customize: A Counterfactual Story Generator
Authors: Changying Hao, Liang Pang, Yanyan Lan, Yan Wang, Jiafeng Guo, Xueqi Cheng
AAAI 2021, pages 12955-12962 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that the proposed model generates much better endings, as compared with the traditional sequence-to-sequence model. |
| Researcher Affiliation | Collaboration | (1) CAS Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences; (2) University of Chinese Academy of Sciences; (3) Tencent AI Lab |
| Pseudocode | No | The paper describes the model's steps in prose and diagrams but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | Yes | The source code and all of the experiments can be found at https://github.com/ying-A/SandC. |
| Open Datasets | Yes | We use the large version of the Time Travel dataset proposed by Qin et al. (2019), which is built on top of the ROCStories (Mostafazadeh et al. 2016) corpus. |
| Dataset Splits | Yes | The Time Travel dataset contains 28,363 original and counterfactual five-sentence story pairs for training. The development and test sets each contain 1,871 original stories; each original story in these sets has a counterfactual condition and three rewritten counterfactual endings. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments, only mentioning the types of models used (BERT, GPT2). |
| Software Dependencies | No | The paper mentions using BERT, GPT2, and Basic Tokenizer, but does not provide specific version numbers for these software components or any other libraries. |
| Experiment Setup | Yes | The max sequence length is set to 300 with the padding token included. For the sketch model, we use the base uncased version of BERT. The hidden size for the fully connected layer is 768. For the customize model, we use the medium version of GPT2. We use Adam optimization for both models with initial learning rates set as 5e-5 and 1.5e-4 separately. The warmup strategy is applied with warmup-steps set to 2000. The batch sizes for the two stages are set to 8. We train the two-stage models for 5 and 10 epochs respectively and select the best models on the validation set. During the inference in the customize stage, we use top-k sampling with the temperature set to 0.7 and k set to 40. |
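
The hyperparameters quoted in the Experiment Setup row can be gathered into a single configuration sketch. The snippet below is a minimal, hypothetical illustration using the Hugging Face `transformers` and PyTorch APIs; the model identifiers (`bert-base-uncased`, `gpt2-medium`), the choice of `torch.optim.Adam`, and the `generate` call are assumptions inferred from the reported values, not the authors' released code (see the linked repository for the actual implementation).

```python
# Minimal sketch of the reported setup (assumed Hugging Face transformers + PyTorch,
# not the authors' code). Numeric values come from the "Experiment Setup" row above.
import torch
from transformers import (BertModel, BertTokenizer,
                          GPT2LMHeadModel, GPT2Tokenizer)

MAX_SEQ_LEN = 300     # max sequence length, padding token included
WARMUP_STEPS = 2000   # warmup strategy applied in both stages
BATCH_SIZE = 8        # batch size for both stages

# Stage 1: sketch model -- BERT base uncased, 768-dim hidden layer, lr 5e-5, 5 epochs
sketch_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
sketch_model = BertModel.from_pretrained("bert-base-uncased")
sketch_optimizer = torch.optim.Adam(sketch_model.parameters(), lr=5e-5)

# Stage 2: customize model -- GPT2 medium, lr 1.5e-4, 10 epochs
customize_tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
customize_model = GPT2LMHeadModel.from_pretrained("gpt2-medium")
customize_optimizer = torch.optim.Adam(customize_model.parameters(), lr=1.5e-4)

# Inference in the customize stage: top-k sampling with k = 40 and temperature = 0.7
def generate_ending(prompt: str) -> str:
    inputs = customize_tokenizer(prompt, return_tensors="pt")
    output_ids = customize_model.generate(
        **inputs,
        do_sample=True,
        top_k=40,
        temperature=0.7,
        max_length=MAX_SEQ_LEN,
        pad_token_id=customize_tokenizer.eos_token_id,
    )
    return customize_tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

The warmup schedule over 2,000 steps, the fully connected layer on top of BERT, and the training loops for the 5- and 10-epoch stages are omitted here for brevity; model selection is done on the validation set, as stated in the table.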