Querying Partially Labelled Data to Improve a K-nn Classifier

Authors: Vu-Linh Nguyen, Sébastien Destercke, Marie-Hélène Masson

AAAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, some experiments in Section show the effectiveness of our proposals." and "This section presents the experimental setup and the results obtained with benchmark data sets which are used to illustrate the behaviour of the proposed schemes."
Researcher Affiliation | Academia | Vu-Linh Nguyen, Sébastien Destercke, Marie-Hélène Masson; UMR CNRS 7253 Heudiasyc, Sorbonne Université, Université de Technologie de Compiègne, CS 60319, 60203 Compiègne cedex, France; {linh.nguyen, sebastien.destercke, mylene.masson}@hds.utc.fr
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide access to source code for the described methodology.
Open Datasets | Yes | "Results have been obtained for 15 UCI data sets described in Table 6. Three different values for K (3, 6 and 9) have been used for all experiments."
Dataset Splits | Yes | "We use a three-fold cross-validation procedure: each data set is randomly split into 3 folds. Each fold is in turn considered as the test set, the other folds are used for the training set." (A sketch of this protocol follows the table.)
Hardware Specification | No | The paper does not describe the hardware used to run its experiments.
Software Dependencies | No | The paper does not list ancillary software dependencies or version numbers.
Experiment Setup | Yes | "Three different values for K (3, 6 and 9) have been used for all experiments. The weight $w^t_k$ for an instance $t$ is $w^t_k = 1 - d^t_k / \sum_{j=1}^{K} d^t_j$, with $d^t_j$ the Euclidean distance between $x^t_j$ and $t$. As usual when working with Euclidean distance based K-nn, data is normalized. ... The training set is contaminated according to one of the models with two combinations of (p, q) parameters: (p = 0.7, q = 0.5) and (p = 0.9, q = 0.9) ... For each data set, the number of queries I has been fixed to 10% of the number of training data." (A sketch of the weighting scheme also follows the table.)
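
A minimal sketch of the quoted evaluation protocol, using scikit-learn's KFold for the three-fold split. The random data set and labels below are stand-ins for illustration, not the UCI data used in the paper, and the classifier itself is left as a placeholder.

```python
import numpy as np
from sklearn.model_selection import KFold

# Stand-in data: any of the 15 UCI data sets would take this place.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
y = rng.integers(0, 3, size=150)

# Each data set is randomly split into 3 folds; each fold is in turn
# the test set, while the remaining folds form the training set.
kf = KFold(n_splits=3, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # ... train the K-nn classifier on (X_train, y_train) and evaluate here
    print(f"fold {fold}: {len(train_idx)} train / {len(test_idx)} test")
```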
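
And a minimal sketch of the distance-based weighting scheme as quoted, $w^t_k = 1 - d^t_k / \sum_{j=1}^{K} d^t_j$. The helper name `knn_weights` and the toy points are illustrative assumptions, not taken from the paper; the normalization step the paper mentions is omitted here.

```python
import numpy as np

def knn_weights(X_train, t, K=3):
    """Indices of the K nearest neighbours of t and their weights,
    w_k = 1 - d_k / sum_{j=1}^K d_j, using Euclidean distances."""
    dists = np.linalg.norm(X_train - t, axis=1)  # distance from t to each point
    nn = np.argsort(dists)[:K]                   # K nearest neighbours
    d = dists[nn]
    return nn, 1.0 - d / d.sum()                 # closer neighbours weigh more

# Toy usage on 2-D points (hypothetical data, for illustration only).
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
t = np.array([0.1, 0.1])
print(knn_weights(X_train, t, K=3))
```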