Contrastive Unsupervised Word Alignment with Non-Local Features
Authors: Yang Liu, Maosong Sun
AAAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our approach achieves significant improvements over state-of-the-art unsupervised word alignment methods. |
| Researcher Affiliation | Academia | State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China; Jiangsu Collaborative Innovation Center for Language Competence, Jiangsu 221009, China; {liuyang2011,sms}@tsinghua.edu.cn |
| Pseudocode | No | The paper describes the beam search algorithm in text but does not present it as a formal pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide a statement or link releasing source code for the described methodology. |
| Open Datasets | Yes | For French-English, we used the dataset from the HLT/NAACL 2003 alignment shared task (Mihalcea and Pedersen 2003). For Chinese-English, we used the dataset from Liu et al. (2005). |
| Dataset Splits | Yes | For French-English: the training set consists of 1.1M sentence pairs with 23.61M French words and 20.01M English words, the validation set consists of 37 sentence pairs, and the test set consists of 447 sentence pairs. For Chinese-English: the training set consists of 1.5M sentence pairs with 42.1M Chinese words and 48.3M English words, the validation set consists of 435 sentence pairs, and the test set consists of 500 sentence pairs. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions names of other systems like 'GIZA++' and 'fast align', but does not provide specific version numbers for any software dependencies used in their own implementation. |
| Experiment Setup | No | The paper mentions that SGD was used for optimization, that noise was generated by shuffling, replacing, deleting, and inserting words, and that 'n=1' and 'all 16 features' were used. However, it lacks specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations required for full reproducibility. |
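The Experiment Setup row notes that noise examples for contrastive training were generated by shuffling, replacing, deleting, and inserting words. A minimal sketch of such a corruption step is given below; the function name, uniform choice among the four operations, and single-edit behavior are assumptions for illustration, since the paper does not specify these details:

```python
import random

def make_noise(sentence, vocab, seed=None):
    """Corrupt a sentence with one randomly chosen edit operation
    (shuffle, replace, delete, or insert), as a hypothetical sketch
    of the noise-generation step used for contrastive training."""
    rng = random.Random(seed)
    words = list(sentence)
    op = rng.choice(["shuffle", "replace", "delete", "insert"])
    if op == "shuffle":
        rng.shuffle(words)
    elif op == "replace" and words:
        # Swap one word for a random vocabulary item.
        words[rng.randrange(len(words))] = rng.choice(vocab)
    elif op == "delete" and len(words) > 1:
        del words[rng.randrange(len(words))]
    else:
        # Insert a random vocabulary item at a random position.
        words.insert(rng.randrange(len(words) + 1), rng.choice(vocab))
    return words

noisy = make_noise(["the", "cat", "sat"], vocab=["dog", "ran", "a"], seed=42)
```

In a contrastive objective, each observed sentence pair would be contrasted against such corrupted versions, so the model learns to score real data above noise.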