Translation-Based Matching Adversarial Network for Cross-Lingual Natural Language Inference

Authors: Kunxun Qi, Jianfeng Du (pp. 8632-8639)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the XNLI benchmark demonstrate that three popular neural models enhanced by the proposed framework significantly outperform the original models.
Researcher Affiliation | Collaboration | Guangzhou Key Laboratory of Multilingual Intelligent Processing, Guangdong University of Foreign Studies, Guangzhou 510420, China; Platform and Content Group, Tencent, Shenzhen 518000, China
Pseudocode | Yes | Algorithm 1: The training phase of TMAN. (See the illustrative training-step sketch after the table.)
Open Source Code | Yes | The code of our implementations is available at https://github.com/qikunxun/TMAN/.
Open Datasets | Yes | We conducted experiments on the XNLI (Conneau et al. 2018) benchmark (http://www.nyu.edu/projects/bowman/xnli/), which extends the well-known MultiNLI (Williams, Nangia, and Bowman 2018) benchmark to 15 languages with human-annotated development and test sets.
Dataset Splits | Yes | For each language, the training set comprises 393K annotated sentence pairs, whereas the development set and the test set each comprise 7,500 annotated sentence pairs. ... the early stopping strategy applied according to the performance on the development set. (See the dataset-inspection sketch after the table.)
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used to run its experiments.
Software Dependencies | No | The paper mentions optimizers like Adam and AdaDelta, and general techniques like dropout, but does not specify version numbers for any software dependencies or libraries (e.g., Python, TensorFlow, PyTorch versions) used in the experiments.
Experiment Setup | Yes | For TMAN-XLM, the lexical encoder was initialized by the pre-trained XLM model with 12 transformer layers, which outputs 1024D token embeddings. The transformer encoder was built with 8 heads. We applied dropout (Srivastava et al. 2014) to each layer by setting the dropout rate as 0.1. TMAN-XLM was trained by Adam (Kingma and Ba 2015) with initial learning rate 5e-6, mini-batch size 16, and two training epochs. For TMAN-BERT, the lexical encoder was initialized by the pre-trained multilingual BERT model with 12 transformer layers, which outputs 768D token embeddings. The transformer encoder was built with 12 heads. TMAN-BERT was trained by Adam with the warmup mechanism (Devlin et al. 2019) and two training epochs, where the initial learning rate was set as 5e-5, the warmup proportion as 10%, the mini-batch size as 32, and the dropout rate as 0.1. For TMAN-ESIM, the input word vectors were initialized by 300D fastText (Bojanowski et al. 2017) word embeddings that were aligned by the MUSE (Lample et al. 2018) algorithm. The bi-directional output dimension of all BiLSTM (Hochreiter and Schmidhuber 1997) networks was set as 200. The dimension of inference representations was also set as 200. We applied dropout to each layer by setting the dropout rate as 0.3. TMAN-ESIM was trained by AdaDelta (Zeiler 2012) with initial learning rate 3.0, mini-batch size 512, at most 30 training epochs, and the early stopping strategy applied according to the performance on the development set. For all the above models, we set the hyper-parameter λ as 0.1 according to the performance on the development set. Every sentence is truncated to 128 tokens. (See the configuration sketch after the table.)
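
To make the "Pseudocode" row concrete, below is a minimal, hypothetical PyTorch sketch of one training step in the spirit of Algorithm 1 (the training phase of TMAN). It assumes the objective combines a standard NLI classification loss with an adversarial language-identification term weighted by λ (0.1 in the setup above); the toy encoder, the module names, and the entropy-based adversarial surrogate are illustrative assumptions, not the authors' implementation (their code is at the GitHub link in the table).

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMatchingModel(nn.Module):
    # Stand-in for the lexical encoder + matching layers; purely illustrative.
    def __init__(self, dim=32, n_labels=3, n_langs=2):
        super().__init__()
        self.encoder = nn.Linear(dim, dim)
        self.classifier = nn.Linear(dim, n_labels)     # entailment / neutral / contradiction
        self.discriminator = nn.Linear(dim, n_langs)   # guesses which language the pair came from

    def forward(self, x):
        h = torch.tanh(self.encoder(x))
        return self.classifier(h), self.discriminator(h)

def training_step(model, optimizer, x, nli_labels, lam=0.1):
    # Assumed joint objective: NLI loss + lam * adversarial term (lam = 0.1 above).
    logits, lang_logits = model(x)
    nli_loss = F.cross_entropy(logits, nli_labels)
    # Encourage language-invariant representations by pushing the language
    # discriminator towards maximum entropy (a common adversarial surrogate;
    # the paper's Algorithm 1 defines the actual procedure).
    probs = F.softmax(lang_logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
    loss = nli_loss - lam * entropy
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    model = ToyMatchingModel()
    opt = torch.optim.Adam(model.parameters(), lr=5e-6)   # learning rate quoted for TMAN-XLM
    x = torch.randn(16, 32)                               # mini-batch size 16, as for TMAN-XLM
    y = torch.randint(0, 3, (16,))
    print(training_step(model, opt, x, y))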
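
For the "Open Datasets" and "Dataset Splits" rows, the following sketch shows one way to inspect the XNLI splits. It assumes the Hugging Face datasets hub copy of XNLI, which is not what the paper uses (the paper points to the NYU download page quoted in the table), so the split names and per-split counts printed here need not match the figures quoted from the paper.

from datasets import load_dataset   # pip install datasets (no version is specified by the paper)

def summarize_xnli(lang="en"):
    # Hub configs are language codes ("en", "fr", ...); each example is a
    # premise/hypothesis pair with an NLI label.
    ds = load_dataset("xnli", lang)
    for split, data in ds.items():
        print(f"{lang} {split}: {len(data)} sentence pairs")

if __name__ == "__main__":
    summarize_xnli("en")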
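
Finally, the hyper-parameters quoted in the "Experiment Setup" row are collected below as plain Python dictionaries for quick reference. The grouping and key names are illustrative choices made here; values not quoted in the row are deliberately left out rather than guessed.

# Hyper-parameters as quoted in the "Experiment Setup" row (nothing else is assumed).
TMAN_XLM = {
    "lexical_encoder": "pre-trained XLM, 12 transformer layers, 1024D token embeddings",
    "transformer_heads": 8,
    "dropout": 0.1,
    "optimizer": "Adam",
    "learning_rate": 5e-6,
    "batch_size": 16,
    "epochs": 2,
}

TMAN_BERT = {
    "lexical_encoder": "pre-trained multilingual BERT, 12 transformer layers, 768D token embeddings",
    "transformer_heads": 12,
    "dropout": 0.1,
    "optimizer": "Adam with warmup (10% proportion)",
    "learning_rate": 5e-5,
    "batch_size": 32,
    "epochs": 2,
}

TMAN_ESIM = {
    "word_embeddings": "300D fastText, aligned with MUSE",
    "bilstm_output_dim": 200,
    "inference_repr_dim": 200,
    "dropout": 0.3,
    "optimizer": "AdaDelta",
    "learning_rate": 3.0,
    "batch_size": 512,
    "max_epochs": 30,
    "early_stopping": "on development-set performance",
}

# Shared across all three models, per the quoted setup.
SHARED = {"lambda": 0.1, "max_sentence_length_tokens": 128}

if __name__ == "__main__":
    for name, cfg in [("TMAN-XLM", TMAN_XLM), ("TMAN-BERT", TMAN_BERT), ("TMAN-ESIM", TMAN_ESIM)]:
        print(name, {**cfg, **SHARED})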