Minimum Divergence vs. Maximum Margin: an Empirical Comparison on Seq2Seq Models
Authors: Huan Zhang, Hai Zhao
ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results show that our new training criterion can usually work better than existing methods, on both the tasks of machine translation and sentence summarization. |
| Researcher Affiliation | Academia | Huan Zhang, Hai Zhao; Department of Computer Science and Engineering, Shanghai Jiao Tong University; zhanghuan0468@gmail.com, zhaohai@cs.sjtu.edu.cn |
| Pseudocode | Yes | Algorithm 1 The sampling approach to constructing the approximated n-best list |
| Open Source Code | No | The paper does not provide an explicit statement or a link to open-source code for the methodology it describes. |
| Open Datasets | Yes | We use the IWSLT 2014 German-English translation dataset, with the same splits as Ranzato et al. (2016) and Wiseman & Rush (2016)... We use the Gigaword corpus with the same preprocessing steps as in Rush et al. (2015). |
| Dataset Splits | Yes | We use the IWSLT 2014 German-English translation dataset, with the same splits as Ranzato et al. (2016) and Wiseman & Rush (2016), which contains about 153K training sentence pairs, 7K validation sentence pairs and 7K test sentence pairs... During training, we use the first 2K sequences of the dev corpus as validation set |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used for running its experiments (e.g., GPU models, CPU types, or cloud instance specifications). |
| Software Dependencies | No | The paper mentions model components and training tools such as LSTM, RNN, GRU, and the Adam optimizer, but does not identify specific software libraries or version numbers for these or any other dependencies. |
| Experiment Setup | Yes | The encoder is a single-layer bidirectional LSTM with 256 hidden units for either direction, and the decoder LSTM also has 256 hidden units. The size of word embedding for both encoder and decoder is 256. We use a dropout rate... of 0.2... The batch size is set to 32 and the training set is shuffled at each new epoch. All models are trained with the Adam optimizer... The MLE baseline is trained with a learning rate of 3.0 × 10⁻⁴. The model is trained for 20 epochs... α for MRT... is set to 5.0 × 10⁻³ and the sample size is 100. α and τ for Hellinger loss... α = 5.0 × 10⁻⁴, τ = 0.5. |
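
For reference, the reported hyperparameters map onto a standard attentional seq2seq training setup. Below is a minimal sketch of the MLE baseline configuration, assuming PyTorch; the vocabulary sizes, the omitted attention mechanism, and all module and variable names are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch of the reported training configuration, assuming PyTorch.
# Only the sizes, dropout, batch size, optimizer, learning rate, and epoch count
# come from the quoted setup; everything else (vocab sizes, names, the missing
# attention/bridge between encoder and decoder) is an illustrative assumption.
import torch
import torch.nn as nn

EMB_SIZE = 256        # word embedding size for encoder and decoder
HIDDEN_SIZE = 256     # LSTM hidden units (per direction for the encoder)
DROPOUT = 0.2
BATCH_SIZE = 32
LEARNING_RATE = 3e-4  # MLE baseline learning rate (3.0 x 10^-4)
NUM_EPOCHS = 20

class Encoder(nn.Module):
    """Single-layer bidirectional LSTM encoder, 256 hidden units per direction."""
    def __init__(self, src_vocab_size: int):
        super().__init__()
        self.embed = nn.Embedding(src_vocab_size, EMB_SIZE)
        self.dropout = nn.Dropout(DROPOUT)
        self.lstm = nn.LSTM(EMB_SIZE, HIDDEN_SIZE, num_layers=1,
                            bidirectional=True, batch_first=True)

    def forward(self, src_tokens):
        x = self.dropout(self.embed(src_tokens))
        # outputs: (batch, src_len, 2 * HIDDEN_SIZE); also returns final (h, c)
        return self.lstm(x)

class Decoder(nn.Module):
    """Single-layer LSTM decoder with 256 hidden units (attention omitted for brevity)."""
    def __init__(self, tgt_vocab_size: int):
        super().__init__()
        self.embed = nn.Embedding(tgt_vocab_size, EMB_SIZE)
        self.dropout = nn.Dropout(DROPOUT)
        self.lstm = nn.LSTM(EMB_SIZE, HIDDEN_SIZE, num_layers=1, batch_first=True)
        self.out = nn.Linear(HIDDEN_SIZE, tgt_vocab_size)

    def forward(self, tgt_tokens, state=None):
        x = self.dropout(self.embed(tgt_tokens))
        outputs, state = self.lstm(x, state)
        return self.out(outputs), state   # logits over the target vocabulary

# Hypothetical vocabulary sizes; the quoted setup does not report them.
encoder = Encoder(src_vocab_size=32000)
decoder = Decoder(tgt_vocab_size=32000)
params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=LEARNING_RATE)
```

The decoder is deliberately left unconnected from the bidirectional encoder state here, since the quoted setup does not specify the attention or state-bridging details; only the layer sizes, dropout, batch size, optimizer, learning rate, and epoch count above come from the reported configuration.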