reproducibilityindex.ai

BattRAE: Bidimensional Attention-Based Recursive Autoencoders for Learning Bilingual Phrase Embeddings

Authors: Biao Zhang, Deyi Xiong, Jinsong Su

AAAI 2017

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To evaluate the effectiveness of BattRAE, we incorporate this semantic similarity as an additional feature into a state-of-the-art SMT system. Extensive experiments on NIST Chinese-English test sets show that our model achieves a substantial improvement of up to 1.63 BLEU points on average over the baseline.
Researcher Affiliation	Academia	Xiamen University, Xiamen, China 361005; Soochow University, Suzhou, China 215006
Pseudocode	No	The paper describes algorithms and procedures in narrative text and mathematical equations but does not present any formal pseudocode or algorithm blocks.
Open Source Code	Yes	Source code is available at https://github.com/DeepLearnXMU/BattRAE.
Open Datasets	Yes	Our parallel corpus consists of 1.25M sentence pairs extracted from LDC corpora, with 27.9M Chinese words and 34.5M English words respectively. We trained a 5-gram language model on the Xinhua portion of the GIGAWORD corpus (247.6M English words) using SRILM Toolkit with modified Kneser-Ney Smoothing.
Dataset Splits	Yes	We used the NIST MT05 data set as the development set, and the NIST MT06/MT08 datasets as the test sets. [...] From these pairs, we further extracted 34K bilingual phrases as our development data to optimize all hyper-parameters using random search (Bergstra and Bengio 2012).
Hardware Specification	No	The paper does not specify any hardware details like CPU models, GPU types, or memory used for the experiments.
Software Dependencies	No	The paper mentions using 'SRILM Toolkit', 'Word2Vec', and 'libLBFGS' but does not provide specific version numbers for any of these software dependencies.
Experiment Setup	Yes	Finally, we set ds=dt=da=dsem=50, α=0.125 (such that, β=0.875), λL=1e-5, λrec=λatt=1e-4 and λsem=1e-3 according to experiments on the development data. Additionally, we set the maximum number of iterations in the L-BFGS algorithm to 100.
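
For anyone attempting a rerun, the settings quoted in the Experiment Setup row can be collected into a single configuration object. The following is a minimal sketch only: the class name BattRAEConfig and every field name are hypothetical and are not taken from the BattRAE repository, and the relation β = 1 − α is inferred from the quoted values (α=0.125, β=0.875) rather than confirmed against the code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BattRAEConfig:
    """Hypothetical container for the reported settings (names are illustrative)."""

    d_src: int = 50             # ds: source-side dimension
    d_tgt: int = 50             # dt: target-side dimension
    d_att: int = 50             # da: attention-space dimension
    d_sem: int = 50             # dsem: semantic-space dimension
    alpha: float = 0.125        # α as reported
    lambda_L: float = 1e-5      # λL as reported
    lambda_rec: float = 1e-4    # λrec as reported
    lambda_att: float = 1e-4    # λatt as reported
    lambda_sem: float = 1e-3    # λsem as reported
    max_lbfgs_iters: int = 100  # iteration cap for L-BFGS

    @property
    def beta(self) -> float:
        # Assumption: β = 1 - α, which matches the quoted β=0.875.
        return 1.0 - self.alpha


config = BattRAEConfig()
print(config.beta)
```

Keeping β derived from α, rather than stored separately, prevents the two values from drifting apart if α is changed during a hyper-parameter search.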