Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Effectiveness of Automatic Translations for Cross-Lingual Ontology Mapping

Authors: Mamoun Abu Helou, Matteo Palmonari, Mustafa Jarrar

JAIR 2016 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper we present a large-scale study on the effectiveness of automatic translations to support two key cross-lingual ontology mapping tasks: the retrieval of candidate matches and the selection of the correct matches for inclusion in the final alignment. We conduct our experiments using four different large gold standards, each one consisting of a pair of mapped wordnets, to cover four different families of languages.
Researcher Affiliation | Academia | Mamoun Abu Helou EMAIL Department of Informatics, Systems and Communication, University of Milan-Bicocca; Matteo Palmonari EMAIL Department of Informatics, Systems and Communication, University of Milan-Bicocca; Mustafa Jarrar EMAIL Department of Computer Science, Birzeit University
Pseudocode | No | The paper gives mathematical definitions for its evaluation measures and describes the translation and mapping-selection steps conceptually, but it presents no structured pseudocode or algorithm blocks.
Open Source Code | No | The paper contains no statement that the authors release the implementation code for the described methodology, nor any link to a code repository. It mentions using existing tools such as Google Translate and BabelNet, but does not provide code for the study itself.
Open Datasets | Yes | As gold standards, we use cross-lingual mappings manually established (or validated) by lexicographers between four wordnets (Arabic, Italian, Slovene and Spanish) and the English WordNet. Footnote 11: The Arabic, Italian, and Slovene wordnets are obtained from OMWN (2015), and the Spanish wordnet is obtained from MCR (2012).
Dataset Splits | No | The paper uses pre-existing wordnets as gold standards for evaluation. It classifies the concepts in these wordnets (e.g., monosemous, polysemous, synonymless, synonymful) and evaluates translation effectiveness against these categories, but it does not describe training, validation, or test splits in the usual machine-learning sense.
Hardware Specification | No | The paper describes the experimental setup and methodology, but it gives no details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies | Yes | In our study, we use two multilingual lexical resources as sources of translations: Google Translate (2015) and BabelNet (Navigli & Ponzetto, 2012). Footnote 1: We used BabelNet version 2.5.
Experiment Setup | Yes | In Section 2, we introduce some preliminary definitions used in the rest of the paper. In Section 3, we overview related work... The evaluation measures and the multilingual lexical resources used in our study to obtain translations are presented respectively in Sections 4 and 5. In Section 6, we present the experiments. Section 4 defines the measures central to the experimental setup: Translation Correctness (Eq. 5), Word Sense Coverage (Eq. 6), Synset Coverage (Eq. 7), and Synonym Coverage (Eq. 8). Section 6.1 describes the setup itself, including importing the wordnets into a database and compiling bilingual dictionaries with the Google Translate API and BabelNet.