Non-parametric classification via expand-and-sparsify representation

Authors: Kaushik Sinha

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluations performed on real-world datasets corroborate our theoretical results.
Researcher Affiliation | Academia | Kaushik Sinha, School of Computing, Wichita State University, Wichita, KS 67260, kaushik.sinha@wichita.edu
Pseudocode | Yes | Algorithm 1. Input: training set Dn = {(xi, yi)}_{i=1}^n ⊂ S^{d-1} × {0, 1}, projection dimensionality m ∈ N, k ≪ m non-zeros in the EaS representation, random seed R; inference with test point x ∈ S^{d-1}.

    TrainEaSClassifier(Dn, m, k, R):
        Sample Θ with seed R
        Initialize w[i], ct[i] ← 0 for all i ∈ [m]
        for (x, y) ∈ Dn do
            eas ← h1(x)
            w[i] ← w[i] + y for all i ∈ [m] with eas[i] = 1
            ct[i] ← ct[i] + 1 for all i ∈ [m] with eas[i] = 1
        end
        w[i] ← w[i]/ct[i] for all i ∈ [m]
        return Θ, w

    InferEaSClassifier(x, Θ, k, w):
        eas ← h1(x)
        return I[(eas · w)/k ≥ 1/2]
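The pseudocode above can be sketched in a few lines of NumPy. This is a hypothetical illustration, not the author's released code: it assumes h1(x) is the top-k sparsification of a random Gaussian projection Θx (the exact form of h1 is not shown in this excerpt), and that the final decision thresholds the averaged score at 1/2.

```python
import numpy as np

def h1(x, Theta, k):
    """EaS representation: project x with Theta, then set the k largest
    coordinates to 1 and the rest to 0 (assumed form of h1)."""
    z = Theta @ x
    eas = np.zeros(len(z))
    eas[np.argsort(z)[-k:]] = 1.0
    return eas

def train_eas_classifier(X, y, m, k, seed):
    """Algorithm 1, training phase: accumulate per-coordinate label sums
    and activation counts over the training set, then average."""
    rng = np.random.default_rng(seed)                 # seed R
    Theta = rng.standard_normal((m, X.shape[1]))      # sample Theta
    w = np.zeros(m)                                   # label sums
    ct = np.zeros(m)                                  # activation counts
    for xi, yi in zip(X, y):
        eas = h1(xi, Theta, k)
        w += yi * eas
        ct += eas
    # w[i] <- w[i] / ct[i] (leave never-activated coordinates at 0)
    w = np.divide(w, ct, out=np.zeros(m), where=ct > 0)
    return Theta, w

def infer_eas_classifier(x, Theta, k, w):
    """Algorithm 1, inference phase: average w over the k active
    coordinates of the EaS representation and threshold at 1/2."""
    eas = h1(x, Theta, k)
    return int(eas @ w / k >= 0.5)
```

On linearly separated points on the sphere, the top-k active coordinates of the two classes are disjoint, so the averaged weights recover the labels exactly.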
Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | All the remaining datasets are taken from the OpenML repository (Vanschoren et al. [2013]). Both mnist and fmnist (fashion-mnist for short) are 10-class classification problems with 784 features. We convert them to binary classification problems by using labels 3 and 5 for the mnist dataset and labels 2 (Pullover) and 4 (Coat) for the fmnist dataset. For efficiency purposes, the feature dimensions for both these datasets are reduced to 20 using principal component analysis (PCA). For the remaining six datasets, the task is binary classification. For all eight datasets, the features are normalized using the StandardScaler option in scikit-learn and are made to be unit norm.
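The preprocessing quoted above (PCA to 20 dimensions for mnist/fmnist, standardization, then unit-norm rows) can be sketched with scikit-learn. The ordering of the steps is an assumption; the paper does not state it explicitly:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler, normalize

def preprocess(X, reduce_dim=False, n_components=20):
    """Assumed preprocessing pipeline: optional PCA to 20 dimensions
    (mnist/fmnist only), standardize each feature, then scale each
    sample to unit norm so points lie on the unit sphere S^{d-1}."""
    if reduce_dim:
        X = PCA(n_components=n_components).fit_transform(X)
    X = StandardScaler().fit_transform(X)
    return normalize(X)  # row-wise L2 normalization
```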
Dataset Splits | Yes | For each dataset, we generate train and test sets using scikit-learn's train_test_split method (80:20 split). For RF we use a grid search over the number of estimators (trees) from the set {250, 500, 750, 1000} and perform a 3-fold cross-validation to choose the final model.
Hardware Specification | Yes | We run our experiments on a laptop with an Intel Xeon W-10855M processor, 64GB memory, and an NVIDIA Quadro RTX 5000 Mobile GPU (with 16GB memory).
Software Dependencies | No | The paper mentions using scikit-learn but does not specify any version numbers for this or other software components, which is necessary for reproducibility.
Experiment Setup | Yes | For k-NN, we used two values of k: k = 1 and k = 10. For RF we use a grid search over the number of estimators (trees) from the set {250, 500, 750, 1000} and perform a 3-fold cross-validation to choose the final model. As per Theorem 3.12, we set k to be d log m. For all eight datasets, the features are normalized using the StandardScaler option in scikit-learn and are made to be unit norm.
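The RF baseline setup quoted above (80:20 split, 3-fold cross-validation over the number of trees) corresponds to a standard GridSearchCV call. This is a sketch on synthetic data; all RF hyperparameters other than n_estimators are left at scikit-learn defaults, which is an assumption:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for one of the eight binary-classification datasets.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# 80:20 train/test split, as in the quoted setup.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Grid search over the number of trees with 3-fold cross-validation.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [250, 500, 750, 1000]},
    cv=3,
)
grid.fit(X_tr, y_tr)
best_rf = grid.best_estimator_  # final model chosen by CV
```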