Tree Edit Distance Learning via Adaptive Symbol Embeddings

Authors: Benjamin Paaßen, Claudio Gallicchio, Alessio Micheli, Barbara Hammer

ICML 2018

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | "In our experiments, we show that BEDL improves upon the state-of-the-art in metric learning for trees on six benchmark data sets, ranging from computer science over biomedical data to a natural-language processing data set containing over 300,000 nodes."
Researcher Affiliation | Academia | "¹Cognitive Interaction Technology, Bielefeld University, Germany; ²Department of Computer Science, University of Pisa, Italy."
Pseudocode | No | The paper states: "Computing this average over all cheapest edit scripts is possible efficiently via a novel forward-backward algorithm which we developed for this contribution (refer to the supplementary material; Paaßen (2018a))." An algorithm is thus mentioned, but its pseudocode is deferred to the supplementary material and does not appear in the main paper. (An illustrative forward-backward sketch is given after this table.)
Open Source Code | Yes | "As implementations, we used custom implementations of KNN, MGLVQ, the goodness classifier, GESL, and BEDL, which are available at https://doi.org/10.4119/unibi/2919994."
Open Datasets | Yes | "Cystic and Leukemia: Two data sets from the KEGG/Glycan database (Hashimoto et al., 2006), adapted from Gallicchio & Micheli (2013)... Sentiment: initialized the vectorial embedding with the 300-dimensional Common Crawl GloVe embedding (Pennington et al., 2014)."
Dataset Splits | Yes | "On each data set, we perform a cross-validation... We used 20 folds for Strings and Sentiment, 10 for Cystic and Leukemia, 8 for Sorting, and 6 for MiniPalindrome. For the programming data sets, the number of folds had to be reduced to ensure that each fold still contained a meaningful number of data points. For the Cystic and Leukemia data sets, our ten folds were consistent with the paper of Gallicchio & Micheli (2013). In all cases, folds were generated such that the label distribution of the overall data set was maintained." (A fold-generation sketch follows the table.)
Hardware Specification | Yes | "All experiments were performed on a consumer-grade laptop with an Intel Core i7-7700HQ CPU."
Software Dependencies | No | The paper names individual tools, e.g., "For SVM, we utilized the LIBSVM standard implementation (Chang & Lin, 2011)", but it does not provide a complete list of software dependencies with versions.
Experiment Setup | Yes | "We optimized all hyper-parameters in a nested 5-fold crossvalidation, namely the number of prototypes K for MGLVQ and LVQ metric learning in the range [1, 15], the number of neighbors for KNN in the range [1, 15], the kernel bandwidth for SVM in the range [0.1, 10], the sparsity parameter λ for the goodness classifier in the range [10^-5, 10], and the regularization strength β for GESL and BEDL in the range 2·K·m·[10^-6, 10^-2]." (A nested cross-validation sketch follows the table.)
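The forward-backward algorithm itself appears only in the supplementary material. As a rough illustration of the idea (not the authors' algorithm, which operates on trees), here is a minimal sketch on strings with unit insertion and deletion costs: a forward pass computes cheapest prefix-edit costs, a backward pass computes cheapest suffix-edit costs, and a DP cell lies on at least one cheapest edit script exactly when the two sum to the overall edit distance. The function names are hypothetical.

```python
import numpy as np

def edit_costs(x, y, cost=lambda a, b: float(a != b)):
    """Forward DP: D[i, j] is the cheapest cost of editing x[:i] into y[:j],
    assuming unit insertion and deletion costs (an illustrative choice)."""
    m, n = len(x), len(y)
    D = np.zeros((m + 1, n + 1))
    D[:, 0] = np.arange(m + 1)
    D[0, :] = np.arange(n + 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i, j] = min(D[i - 1, j - 1] + cost(x[i - 1], y[j - 1]),  # replacement
                          D[i - 1, j] + 1.0,                           # deletion
                          D[i, j - 1] + 1.0)                           # insertion
    return D

def co_optimal_cells(x, y):
    """Mark DP cells that lie on at least one cheapest edit script by
    combining prefix costs (forward) with suffix costs (backward)."""
    forward = edit_costs(x, y)
    # Edit distance between reversed strings, re-indexed to give suffix costs.
    backward = edit_costs(x[::-1], y[::-1])[::-1, ::-1]
    return forward + backward == forward[-1, -1]

print(co_optimal_cells("tree", "three"))
```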
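The quoted requirement that "the label distribution of the overall data set was maintained" corresponds to stratified cross-validation. The paper does not say which tooling generated the folds; the following is a minimal sketch using scikit-learn's StratifiedKFold, with fold counts copied from the quote (the function and the seed are illustrative assumptions).

```python
from sklearn.model_selection import StratifiedKFold

# Fold counts per data set, as quoted above.
N_FOLDS = {"Strings": 20, "Sentiment": 20, "Cystic": 10,
           "Leukemia": 10, "Sorting": 8, "MiniPalindrome": 6}

def stratified_folds(trees, labels, dataset, seed=0):
    """Yield (train_idx, test_idx) index pairs such that each fold
    preserves the label distribution of the overall data set."""
    skf = StratifiedKFold(n_splits=N_FOLDS[dataset], shuffle=True,
                          random_state=seed)
    yield from skf.split(trees, labels)
```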
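In the nested 5-fold cross-validation quoted above, hyper-parameters are tuned on inner folds only, so the outer test folds never influence the chosen settings. Below is a minimal sketch for one of the tuned parameters, the SVM kernel bandwidth, using scikit-learn on placeholder data; the RBF parametrization gamma = 1/(2·sigma²) and the grid size are assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

# Placeholder feature matrix and labels; the actual pipeline operates on
# learned tree edit distances, which are not reproduced here.
rng = np.random.default_rng(0)
X, y = rng.standard_normal((100, 10)), rng.integers(0, 2, 100)

# Inner 5-fold grid search over the kernel bandwidth sigma in [0.1, 10],
# mapped to the RBF parameter gamma = 1 / (2 * sigma**2).
sigmas = np.linspace(0.1, 10.0, 5)
inner = GridSearchCV(SVC(kernel="rbf"),
                     {"gamma": 1.0 / (2.0 * sigmas ** 2)}, cv=5)

# Outer 5-fold cross-validation estimates the accuracy of the tuned model.
print(cross_val_score(inner, X, y, cv=5).mean())
```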