Lattice-Based Recurrent Neural Network Encoders for Neural Machine Translation
Authors: Jinsong Su, Zhixing Tan, Deyi Xiong, Rongrong Ji, Xiaodong Shi, Yang Liu
AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiment results on Chinese-English translation demonstrate the superiorities of the proposed encoders over the conventional encoder. |
| Researcher Affiliation | Academia | Xiamen University, Xiamen, China; Soochow University, Suzhou, China; Tsinghua University, Beijing, China |
| Pseudocode | No | The paper presents mathematical equations for the GRU and its variants, along with architectural diagrams, but does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions 'the toolkit released by Stanford to train word segmenters' and provides a URL (http://nlp.stanford.edu/software/segmenter.html#Download). However, this is a third-party tool used by the authors, not a release of the code for the proposed encoders. |
| Open Datasets | Yes | Our training data consists of 1.25M sentence pairs extracted from LDC2002E18, LDC2003E07, LDC2003E14, Hansards portion of LDC2004T07, LDC2004T08 and LDC2005T06, with 27.9M Chinese words and 34.5M English words. |
| Dataset Splits | Yes | We chose the NIST 2005 dataset as the validation set and the NIST 2002, 2003, 2004, 2006, and 2008 datasets as test sets. |
| Hardware Specification | Yes | We used a single GPU device Titan X to train models. |
| Software Dependencies | No | The paper mentions 'Rmsprop (Graves 2013)', the 'multi-bleu.perl script', and 'the toolkit released by Stanford', but does not specify version numbers for any software components, libraries, or dependencies used in the experiments. |
| Experiment Setup | Yes | During this procedure, we set the following hyper-parameters: word embedding dimension as 320, hidden layer size as 512, learning rate as 5 × 10⁻⁴, batch size as 80, gradient norm as 1.0. |
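
The Pseudocode row above notes that the paper presents the GRU and its lattice variants only as mathematical equations. For orientation, the following is a minimal NumPy sketch of the standard GRU step that those lattice encoders extend; the variable names and gate convention are assumptions for illustration, not the authors' notation or released code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    """One standard GRU update. The paper's lattice encoders generalize this
    step so that a lattice node can combine several predecessor states."""
    z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev + p["bz"])              # update gate
    r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev + p["br"])              # reset gate
    h_tilde = np.tanh(p["Wh"] @ x_t + p["Uh"] @ (r * h_prev) + p["bh"])  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde                              # new hidden state
```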
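
The Hardware Specification and Experiment Setup rows quote the reported training settings (a single Titan X GPU, Rmsprop, and the hyper-parameters listed above). A hypothetical configuration dictionary collecting those reported values is sketched below; the key names are illustrative assumptions, since the authors' training scripts are not released.

```python
# Reported training settings gathered in one place (key names are assumptions).
train_config = {
    "word_embedding_dim": 320,
    "hidden_layer_size": 512,
    "optimizer": "rmsprop",          # Rmsprop (Graves 2013), as stated in the paper
    "learning_rate": 5e-4,
    "batch_size": 80,
    "max_gradient_norm": 1.0,
    "device": "NVIDIA Titan X (single GPU)",
    "validation_set": "NIST 2005",
    "test_sets": ["NIST 2002", "NIST 2003", "NIST 2004", "NIST 2006", "NIST 2008"],
}
```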