ARIA: Asymmetry Resistant Instance Alignment

Authors: Sanghoon Lee, Seung-won Hwang

AAAI 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental Evaluation Settings. Evaluations were conducted on an Intel quad-core i7 3.6GHz CPU with 32 GB RAM equipped with Java 7. Alignment accuracy was measured by precision and recall. To evaluate blocking quality, we used reduction ratio (RR) and pair completeness (PC). RR is the ratio of pruned instance pairs among all possible pairs, and PC is the ratio of true matches for all pairs. We encoded the identifiers (e.g., URIs) of instances, relations, and concepts to avoid cheating by using URI text as alignment clues. For datasets, we used DBpedia (Lehmann et al. 2014) and YAGO (Biega, Kuzey, and Suchanek 2013), which are real-world large-scale KBs that cover millions of instances.
Researcher Affiliation | Academia | Sanghoon Lee and Seung-won Hwang, Pohang University of Science and Technology (POSTECH), Korea, Republic of, {sanghoon, swhwang}@postech.edu
Pseudocode | Yes | Algorithm 1: Block(I_X, I_Y, T_X, T_Y, k, t)
Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | For datasets, we used DBpedia (Lehmann et al. 2014) and YAGO (Biega, Kuzey, and Suchanek 2013), which are real-world large-scale KBs that cover millions of instances.
Dataset Splits | No | The paper mentions using 'seed matches' as training data for learning concept correlations and refers to 'gold standards' and 'ground truth' for evaluation. However, it does not explicitly provide details about training/validation/test dataset splits (e.g., percentages, absolute counts, or predefined splits) for model training or hyperparameter tuning.
Hardware Specification | Yes | Evaluations were conducted on an Intel quad-core i7 3.6GHz CPU with 32 GB RAM equipped with Java 7.
Software Dependencies | Yes | equipped with Java 7
Experiment Setup | Yes | Candidate degree threshold t was set to 10 in this experiment. Our blocking method showed near perfect reduction ratio (RR) in all domains (Table 4), which shows that the method has high effectiveness in reducing the search space for matching. Pair completeness (PC) is the upper bound to recall of alignment. PC was sufficiently high for the person and location domains, and ARIA achieved recall close to the bound obtained from PC. Note this bound is notably low for organizations due to feature sparsity, which explains low recalls of both ARIA and PARIS for this specific domain. Lastly, we evaluate the robustness of instance similarities between the result candidates of blocking methods for each domain (Table 5). We set triple similarity threshold θ as 0.8.
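The evaluation measures named in the rows above (reduction ratio, pair completeness, precision, and recall) can be illustrated with a small sketch. This is not the authors' code: the function names, the representation of pairs as Python tuples, and the toy data are all assumptions made for illustration. Pair completeness is computed here using the common definition from the blocking literature (true matches that survive blocking, divided by all true matches), which is one reading of the summary's wording.

```python
from typing import Set, Tuple

# A candidate or gold-standard alignment: (instance in KB X, instance in KB Y).
Pair = Tuple[str, str]


def reduction_ratio(candidates: Set[Pair], size_x: int, size_y: int) -> float:
    """Fraction of all possible cross-KB pairs that blocking pruned away."""
    total = size_x * size_y
    if total == 0:
        raise ValueError("both instance sets must be non-empty")
    return 1.0 - len(candidates) / total


def pair_completeness(candidates: Set[Pair], gold: Set[Pair]) -> float:
    """Fraction of gold matches still present among the blocking candidates
    (assumed definition, see the lead-in)."""
    if not gold:
        raise ValueError("gold standard must be non-empty")
    return len(candidates & gold) / len(gold)


def precision_recall(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float]:
    """Precision and recall of the final alignment against the gold standard."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical toy data: 4 instances in X, 5 in Y, so 20 possible pairs.
    gold = {("x1", "y1"), ("x2", "y2"), ("x3", "y3"), ("x4", "y4")}
    candidates = {("x1", "y1"), ("x2", "y2"), ("x3", "y9"), ("x4", "y4")}
    predicted = {("x1", "y1"), ("x2", "y2"), ("x3", "y9")}

    print("RR:", reduction_ratio(candidates, size_x=4, size_y=5))
    print("PC:", pair_completeness(candidates, gold))
    print("precision, recall:", precision_recall(predicted, gold))
```

When the matcher only selects from the blocking candidates, as in the toy data, recall cannot exceed pair completeness. This is the sense in which the summary calls PC the upper bound to recall of alignment.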