Tree Edit Distance Learning via Adaptive Symbol Embeddings
Authors: Benjamin Paaßen, Claudio Gallicchio, Alessio Micheli, Barbara Hammer
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we show that BEDL improves upon the state-of-the-art in metric learning for trees on six benchmark data sets, ranging from computer science over biomedical data to a natural-language processing data set containing over 300,000 nodes. |
| Researcher Affiliation | Academia | 1Cognitive Interaction Technology, Bielefeld University, Germany 2Department of Computer Science, University of Pisa, Italy. |
| Pseudocode | No | The paper mentions 'Computing this average over all cheapest edit scripts is possible efficiently via a novel forward-backward algorithm which we developed for this contribution (refer to the supplementary material; Paaßen (2018a)).' While an algorithm is mentioned, its pseudocode is referred to supplementary material and not present in the main paper. |
| Open Source Code | Yes | As implementations, we used custom implementations of KNN, MGLVQ, the goodness classifier, GESL, and BEDL, which are available at https://doi.org/10.4119/unibi/2919994. |
| Open Datasets | Yes | Cystic and Leukemia: Two data sets from the KEGG/Glycan data base (Hashimoto et al., 2006) adapted from Gallicchio & Micheli (2013)... Sentiment: initialized the vectorial embedding with the 300-dimensional Common Crawl GloVe embedding (Pennington et al., 2014). |
| Dataset Splits | Yes | On each data set, we perform a crossvalidation... We used 20 folds for Strings and Sentiment, 10 for Cystic and Leukemia, 8 for Sorting and 6 for MiniPalindrome. For the programming data sets, the number of folds had to be reduced to ensure that each fold still contained a meaningful number of data points. For the Cystic and Leukemia data set, our ten folds were consistent with the paper of Gallicchio & Micheli (2013). In all cases, folds were generated such that the label distribution of the overall data set was maintained. |
| Hardware Specification | Yes | All experiments were performed on a consumer-grade laptop with an Intel Core i7-7700 HQ CPU. |
| Software Dependencies | No | For SVM, we utilized the LIBSVM standard implementation (Chang & Lin, 2011). While LIBSVM is cited, the paper does not provide a complete list of software dependencies or version numbers. |
| Experiment Setup | Yes | We optimized all hyper-parameters in a nested 5-fold crossvalidation, namely the number of prototypes K for MGLVQ and LVQ metric learning in the range [1, 15], the number of neighbors for KNN in the range [1, 15], the kernel bandwidth for SVM in the range [0.1, 10], the sparsity parameter λ for the goodness classifier in the range [10⁻⁵, 10], and the regularization strength β for GESL and BEDL in the range 2·K·m·[10⁻⁶, 10⁻²]. |
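The evaluation protocol described above (stratified outer folds with an inner 5-fold cross-validation for hyper-parameter selection) can be sketched as follows. This is a minimal illustration, not the authors' code: it uses scikit-learn and a plain KNN classifier as a stand-in, and the function name `run_nested_cv` is hypothetical.

```python
# Hedged sketch of the paper's protocol: stratified outer cross-validation
# (preserving the label distribution, as stated in the Dataset Splits row)
# with an inner 5-fold grid search over a hyper-parameter range such as
# the number of neighbors for KNN in [1, 15].
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier


def run_nested_cv(X, y, n_outer_folds=10):
    """Return the mean outer-fold test accuracy of a nested CV run."""
    outer = StratifiedKFold(n_splits=n_outer_folds, shuffle=True,
                            random_state=0)
    param_grid = {"n_neighbors": list(range(1, 16))}  # the paper's [1, 15]
    accuracies = []
    for train_idx, test_idx in outer.split(X, y):
        # Inner 5-fold CV selects the hyper-parameter on training data only.
        search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
        search.fit(X[train_idx], y[train_idx])
        # Score the selected model on the held-out outer fold.
        accuracies.append(search.score(X[test_idx], y[test_idx]))
    return float(np.mean(accuracies))
```

Because the hyper-parameter is chosen inside each outer training fold, the outer test accuracy is an unbiased estimate of generalization performance, matching the nested setup the paper describes.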