NAREOR: The Narrative Reordering Problem

Authors: Varun Gangal, Steven Y. Feng, Malihe Alikhani, Teruko Mitamura, Eduard Hovy

AAAI 2022, pp. 10645-10653

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present a dataset, NAREORC, with human rewritings of stories within ROCStories in non-linear orders, and conduct a detailed analysis of it. Further, we propose novel task-specific training methods with suitable evaluation metrics. We perform experiments on NAREORC using state-of-the-art models such as BART and T5 and conduct extensive automatic and human evaluations.
Researcher Affiliation | Academia | 1: Language Technologies Institute, Carnegie Mellon University; 2: School of Computing and Information, University of Pittsburgh. {vgangal,syfeng,teruko,hovy}@cs.cmu.edu, malihe@pitt.edu
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code+data at github.com/vgtomahawk/NAREORCamReady.
Open Datasets | Yes | We set aside 600, 200, and 200 stories from the train, dev, and test splits of ROCStories. These act as NAREORC's train_Sup, dev_Sup, and test_Sup splits, for which we collect human references. Remaining stories in each ROCStories split are retained as train_Unsup, dev_Unsup, and test_Unsup of size 95161, 1671, and 1671.
Dataset Splits | Yes | Same evidence as above: 600/200/200 supervised stories set aside from the ROCStories train/dev/test splits (train_Sup, dev_Sup, test_Sup, with human references), with the remaining 95161/1671/1671 stories retained as train_Unsup, dev_Unsup, and test_Unsup. (A split sketch follows the table.)
Hardware Specification | Yes | We used 16GB V100 GPUs for fine-tuning.
Software Dependencies | No | We use Hugging Face's implementations of their base versions. We fine-tuned the base models from Hugging Face.
Experiment Setup | Yes | We used a learning rate of 1e-5. Models were trained for 10 epochs with a batch size of 8. The initial warm-up steps are 1000. We also used gradient accumulation steps of 4. For inference, we generate output up to length 200. (A fine-tuning sketch using these values follows the table.)
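
The split scheme quoted in the Open Datasets and Dataset Splits rows is concrete enough to sketch. Below is a minimal, hypothetical Python reconstruction; the quoted text does not say how the 600/200/200 supervised stories were chosen, so the random selection and seed are assumptions, and the authoritative split files live in the linked repository.

```python
# Hypothetical reconstruction of the NAREORC split scheme: set aside
# 600/200/200 stories from the ROCStories train/dev/test splits as the
# supervised (human-annotated) portions; everything left over becomes
# the unsupervised portion. The random selection and seed are assumptions;
# the quoted text only gives the split sizes.
import random

def make_nareorc_splits(rocstories, seed=0):
    """rocstories: dict mapping 'train'/'dev'/'test' to lists of stories."""
    rng = random.Random(seed)
    n_sup = {"train": 600, "dev": 200, "test": 200}
    splits = {}
    for name, stories in rocstories.items():
        stories = list(stories)  # copy so shuffling is non-destructive
        rng.shuffle(stories)
        splits[f"{name}_Sup"] = stories[:n_sup[name]]
        splits[f"{name}_Unsup"] = stories[n_sup[name]:]
    return splits

# On the full ROCStories corpus, the unsupervised splits should come out
# to 95161 / 1671 / 1671 stories, matching the sizes quoted above.
```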
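
The Experiment Setup row pins down enough hyperparameters for a fine-tuning sketch. The following is a minimal, hypothetical Hugging Face transformers setup using those values; only the hyperparameters come from the paper, while the checkpoint choice (facebook/bart-base), the output path, and the one-pair toy dataset are illustrative assumptions.

```python
# Minimal fine-tuning sketch using the hyperparameters quoted above
# (lr 1e-5, 10 epochs, batch size 8, 1000 warm-up steps, gradient
# accumulation of 4, generation length up to 200). Everything else,
# including the checkpoint and the toy training pair, is assumed.
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "facebook/bart-base"  # paper fine-tunes base BART and T5
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Toy stand-in for a NAREOR training pair (reordered story -> rewrite);
# the real data comes from the NAREORC splits.
src = "2. He trained hard. 1. Ken entered a marathon."
tgt = "Ken trained hard for the marathon he had entered."
example = tokenizer(src, truncation=True)
example["labels"] = tokenizer(tgt, truncation=True)["input_ids"]

args = Seq2SeqTrainingArguments(
    output_dir="nareor-bart-base",  # hypothetical output path
    learning_rate=1e-5,
    num_train_epochs=10,
    per_device_train_batch_size=8,
    warmup_steps=1000,
    gradient_accumulation_steps=4,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=[example],  # toy one-example dataset
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()

# Inference: generate output up to length 200, as reported.
inputs = tokenizer(src, return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_length=200)[0],
                       skip_special_tokens=True))
```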