reproducibilityindex.ai

Efficient Approximation Algorithms for Strings Kernel Based Sequence Classification

Authors: Muhammad Farhan, Juvaria Tariq, Arif Zaman, Mudassir Shabbir, Imdad Ullah Khan

NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our algorithm achieves excellent approximation performance with theoretical guarantees. In the process we solve an open combinatorial problem, which was posed as a major hindrance to the scalability of existing solutions. We give analytical bounds on quality and runtime of our algorithm and report its empirical performance on real world biological and music sequences datasets.
Researcher Affiliation	Academia	Muhammad Farhan Department of Computer Science School of Science and Engineering Lahore University of Management Sciences Lahore, Pakistan 14030031@lums.edu.pk Juvaria Tariq Department of Mathematics School of Science and Engineering Lahore University of Management Sciences Lahore, Pakistan jtariq@emory.edu Arif Zaman Department of Computer Science School of Science and Engineering Lahore University of Management Sciences Lahore, Pakistan arifz@lums.edu.pk Mudassir Shabbir Department of Computer Science Information Technology University Lahore, Pakistan mudassir.shabbir@itu.edu.pk Imdad Ullah Khan Department of Computer Science School of Science and Engineering Lahore University of Management Sciences Lahore, Pakistan imdad.khan@lums.edu.pk
Pseudocode	Yes	Algorithm 1 : Approximate-Kernel(SX,SY ,k,m,ϵ,δ,B)
Open Source Code	Yes	Sources for datasets and source code are available at 1. https://github.com/mufarhan/sequence_class_NIPS_2017
Open Datasets	Yes	Table 1: Datasets description Name Task Classes Seq. Av.Len. Evaluation Ding-Dubchak [6] protein fold recognition 27 694 169 10-fold CV SCOP [4, 31] protein homology detection 54 7329 308 54 binary class. Music [21, 26] music genre recognition 10 1000 2368 5-fold CV Artist20 [8, 17] artist identiﬁcation 20 1413 9854 6-fold CV ISMIR [17] music genre recognition 6 729 10137 5-fold CV
Dataset Splits	Yes	Table 1: Datasets description Name Task Classes Seq. Av.Len. Evaluation Ding-Dubchak [6] protein fold recognition 27 694 169 10-fold CV SCOP [4, 31] protein homology detection 54 7329 308 54 binary class. Music [21, 26] music genre recognition 10 1000 2368 5-fold CV Artist20 [8, 17] artist identiﬁcation 20 1413 9854 6-fold CV ISMIR [17] music genre recognition 6 729 10137 5-fold CV
Hardware Specification	Yes	These experiments are performed on an Intel Xeon machine with (8 Cores, 2.1 GHz and 32 GB RAM) using the same experimental settings as in [13, 15, 17]. Since our algorithm is applicable for signiﬁcantly wider range of k and m, we also report classiﬁcation performance with large k and m. For our algorithm we used B {300, 500} and σ {0.25, 0.5} with no signiﬁcant difference in results as implied by the theoretical analysis. In all reported results B = 300 and σ = 0.5. In order to perform comparisons, for a few combinations of parameters we generated exact kernel matrices of each dataset on a much more powerful machine (a cluster of 20 nodes, each having 24 CPU s with 2.5 GHz speed and 128GB RAM).
Software Dependencies	No	The paper mentions using 'the publicly available SVM implementation LIBSVM [2]' but does not provide a specific version number for it or any other software dependencies.
Experiment Setup	Yes	For our algorithm we used B {300, 500} and σ {0.25, 0.5} with no signiﬁcant difference in results as implied by the theoretical analysis. In all reported results B = 300 and σ = 0.5. [...] The parameters used for reporting classiﬁcation performance are chosen in order to maintain comparability with previous studies. Similarly all measurements are made as in [13, 14], for instance for music genre classiﬁcation we report results of 10-fold cross-validation (see Table 1). For our algorithm we used B = 300 and σ = 0.5 and we take an average of performances over three independent runs.