MacNet: Transferring Knowledge from Machine Comprehension to Sequence-to-Sequence Models

Authors: Boyuan Pan, Yazheng Yang, Hao Li, Zhou Zhao, Yueting Zhuang, Deng Cai, Xiaofei He

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on neural machine translation (NMT) and abstractive text summarization show that our proposed framework can significantly improve the performance of the baseline models, and our method for the abstractive text summarization achieves the state-of-the-art results on the Gigaword dataset.
Researcher Affiliation | Collaboration | State Key Lab of CAD&CG, Zhejiang University; College of Computer Science, Zhejiang University; Alibaba-Zhejiang University Joint Institute of Frontier Technologies; Fabu Inc., Hangzhou, China. {panby, yazheng_yang, haolics, zhaozhou, yzhuang, dcai}@zju.edu.cn; xiaofeihe@fabu.ai
Pseudocode | No | The paper describes the framework using mathematical equations and diagrams, but does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions using external scripts for data preprocessing ('scripts supplied by See et al. [2017]' and 'script released by Rush et al. [2015]') but does not provide any statement or link indicating that the authors' own MacNet code is open-sourced.
Open Datasets | Yes | We use the Stanford Question Answering Dataset (SQuAD) [Rajpurkar et al., 2016] as our training set, which has 100,000+ questions posed by crowd workers on 536 Wikipedia articles. The SQuAD dataset is available at: https://rajpurkar.github.io/SQuAD-explorer/
Dataset Splits | Yes | For the CNN/Daily Mail dataset... which contains 287k training pairs, 13k validation pairs and 11k test pairs. For the English Gigaword dataset... and obtain 3.8M training pairs and a 189k development set for testing.
Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models, memory, or cloud resources used for running the experiments. It only notes: 'The experiments are supported by Chengwei Yao in the Experiment Center of the College of Computer Science and Technology, Zhejiang University.'
Software Dependencies | No | The paper mentions software components and optimizers such as GloVe, AdaDelta, BPE, and Adagrad, but does not give version numbers for any of them, nor for the underlying programming language or deep learning framework.
Experiment Setup | Yes | For the NMT systems... We train 4-layer LSTMs of 1024 units with a bidirectional encoder; the embedding dimension is 1024. The model is trained with stochastic gradient descent with a learning rate that begins at 1. We train for 340K steps; after 170K steps, we start halving the learning rate every 17K steps. The batch size is set to 128 and the dropout rate is 0.2. For the focal loss, γ is set to 5.
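
The Experiment Setup row above quotes the paper's NMT training configuration. As a rough aid to reproduction, the following is a minimal PyTorch-style sketch of the two pieces that are easiest to get wrong: a token-level focal loss with γ = 5 and the quoted SGD schedule (learning rate starting at 1, halved every 17K steps after step 170K). The function names, the ignore_index handling, and the exact step at which the first halving occurs are illustrative assumptions, not the authors' (unreleased) code.

```python
import torch
import torch.nn.functional as F


def focal_loss(logits, targets, gamma=5.0, ignore_index=-100):
    """Token-level focal loss, FL(p_t) = -(1 - p_t)^gamma * log(p_t).

    logits:  (num_tokens, vocab_size) decoder outputs before softmax
    targets: (num_tokens,) gold token ids, padding marked by ignore_index

    This is a generic focal loss (Lin et al., 2017) with gamma = 5 as quoted
    above; how the paper integrates it into the seq2seq objective may differ.
    """
    mask = targets.ne(ignore_index)
    safe_targets = targets.masked_fill(~mask, 0)       # avoid gathering at pad ids
    log_probs = F.log_softmax(logits, dim=-1)
    log_pt = log_probs.gather(1, safe_targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    loss = -((1.0 - pt) ** gamma) * log_pt
    return (loss * mask.float()).sum() / mask.float().sum().clamp(min=1.0)


def learning_rate(step, base_lr=1.0, decay_start=170_000, decay_every=17_000):
    """SGD learning rate: constant at base_lr until 170K steps, then halved
    every 17K steps (first halving assumed to happen at step 170K)."""
    if step < decay_start:
        return base_lr
    n_halvings = 1 + (step - decay_start) // decay_every
    return base_lr * (0.5 ** n_halvings)
```

The quote is ambiguous about whether the first halving happens exactly at step 170K or 17K steps later; the sketch assumes the former, and the batch size of 128 and dropout of 0.2 would be applied in the usual places (data loader and LSTM layers) outside this snippet.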