Sequential Copying Networks
Authors: Qingyu Zhou, Nan Yang, Furu Wei, Ming Zhou
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on abstractive sentence summarization and question generation tasks show that the proposed SeqCopyNet can copy meaningful spans and outperforms the baseline models. |
| Researcher Affiliation | Collaboration | Harbin Institute of Technology, Harbin, China; Microsoft Research, Beijing, China |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: "We release the preprocessing script and this test set at http://res.qyzhou.me". This link provides a preprocessing script and a test set, but the paper does not state that the source code for the SeqCopyNet model itself is released. |
| Open Datasets | Yes | We conduct abstractive sentence summarization experiment on English Gigaword dataset, as mentioned in Rush, Chopra, and Weston (2015). ... We use the Stanford Question Answering Dataset (SQuAD) (Rajpurkar et al. 2016) as our training data. |
| Dataset Splits | Yes | We modify the script released by Rush, Chopra, and Weston (2015) to pre-process and extract the training and development datasets. We obtain the test set used by Rush, Chopra, and Weston (2015). ... Following (Zhou et al. 2017a), we acquired their training, development and test sets, which contain 86,635, 8,965 and 8,964 triples respectively. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | Yes | We also use the same Stanford CoreNLP v3.7.0 (Manning et al. 2014) to annotate POS and NER tags in sentences with its default configuration and pre-trained models. (See the annotation sketch below the table.) |
| Experiment Setup | Yes | We set the word embedding size to 300 and all GRU hidden state sizes to 512. We use dropout (Srivastava et al. 2014) with probability p = 0.4. We initialize model parameters randomly using a Gaussian distribution with Xavier scheme (Glorot and Bengio 2010). We use Adam (Kingma and Ba 2015) as our optimizing algorithm. For the hyperparameters of Adam optimizer, we set the learning rate α = 0.001, two momentum parameters β1 = 0.9 and β2 = 0.999 respectively, and ϵ = 10⁻⁸. ... We also apply gradient clipping (Pascanu, Mikolov, and Bengio 2013) with range [−5, 5] during training. ... we use mini-batch size 64 by grid search. (See the training-setup sketch below the table.) |
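
For the Software Dependencies entry, the paper only reports using Stanford CoreNLP v3.7.0 with its default configuration and pre-trained models to obtain POS and NER tags. The sketch below is an assumption-laden illustration of one way such annotation can be run: the `stanza` Python client, the example sentence, and the annotator list are not from the paper and need not match the authors' actual pipeline (which may have used the Java toolkit directly).

```python
# Illustrative sketch: annotate POS and NER tags with a local Stanford
# CoreNLP server via the stanza client. Requires a CoreNLP installation
# with $CORENLP_HOME pointing at it. All specifics here are assumptions,
# not the authors' pipeline.
from stanza.server import CoreNLPClient

text = "Police arrested five anti-nuclear protesters on Thursday."  # placeholder sentence

with CoreNLPClient(annotators=["tokenize", "ssplit", "pos", "ner"],
                   timeout=30000, memory="4G") as client:
    ann = client.annotate(text)
    for sentence in ann.sentence:
        for token in sentence.token:
            print(token.word, token.pos, token.ner)
```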
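
The Experiment Setup entry amounts to a training configuration. Below is a minimal sketch, assuming a PyTorch implementation, of how the reported hyperparameters could be wired together; the paper does not release training code, so the encoder structure, vocabulary size, and all names are illustrative placeholders, and the decoder and sequential copying mechanism are omitted.

```python
# Minimal sketch wiring together the reported hyperparameters: embedding
# size 300, GRU hidden size 512, dropout p=0.4, Xavier initialization,
# Adam (lr=0.001, betas=(0.9, 0.999), eps=1e-8), element-wise gradient
# clipping to [-5, 5], mini-batch size 64. Model details are assumptions.
import torch
import torch.nn as nn

VOCAB_SIZE = 50000          # placeholder; not taken from the paper
EMB_SIZE, HIDDEN_SIZE = 300, 512
DROPOUT_P, BATCH_SIZE = 0.4, 64


class Encoder(nn.Module):
    """Bidirectional GRU sentence encoder with the reported sizes."""

    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(VOCAB_SIZE, EMB_SIZE)
        self.dropout = nn.Dropout(DROPOUT_P)
        self.gru = nn.GRU(EMB_SIZE, HIDDEN_SIZE,
                          batch_first=True, bidirectional=True)

    def forward(self, token_ids):
        embedded = self.dropout(self.embedding(token_ids))
        return self.gru(embedded)


model = Encoder()

# Xavier (Glorot) initialization for all weight matrices.
for param in model.parameters():
    if param.dim() > 1:
        nn.init.xavier_uniform_(param)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                             betas=(0.9, 0.999), eps=1e-8)


def training_step(token_ids, compute_loss):
    """One optimization step with gradients clipped to the range [-5, 5]."""
    optimizer.zero_grad()
    outputs, _ = model(token_ids)
    loss = compute_loss(outputs)   # stand-in for the full seq2seq loss
    loss.backward()
    torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=5.0)
    optimizer.step()
    return loss.item()
```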