Non-parametric classification via expand-and-sparsify representation
Authors: Kaushik Sinha
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations performed on real-world datasets corroborate our theoretical results. |
| Researcher Affiliation | Academia | Kaushik Sinha, School of Computing, Wichita State University, Wichita, KS 67260, kaushik.sinha@wichita.edu |
| Pseudocode | Yes | Algorithm 1. Input: training set $D_n = \{(x_i, y_i)\}_{i=1}^n \subset S^{d-1} \times \{0, 1\}$, projection dimensionality $m \in \mathbb{N}$, $k \ll m$ non-zeros in the EaS representation, random seed $R$; inference with test point $x \in S^{d-1}$. **TrainEaSClassifier**$(D_n, m, k, R)$: sample $\Theta$ with seed $R$; initialize $w[i], ct[i] \leftarrow 0$ for all $i \in [m]$; for $(x, y) \in D_n$ do: $eas \leftarrow h_1(x)$; $w[i] \leftarrow w[i] + y$ and $ct[i] \leftarrow ct[i] + 1$ for all $i \in [m]$ with $eas[i] = 1$; end for; $w[i] \leftarrow w[i]/ct[i]$ for all $i \in [m]$; return $\Theta, w$. **InferEaSClassifier**$(x, \Theta, k, w)$: $eas \leftarrow h_1(x)$; return $\mathbb{I}[(eas \cdot w)/k \geq 1/2]$. (A runnable sketch of this algorithm follows the table.) |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | All the remaining datasets are taken from the OpenML repository (Vanschoren et al. [2013]). Both mnist and fmnist (fashion-mnist for short) are 10-class classification problems with 784 features. We convert them to binary classification problems by using labels 3 and 5 for the mnist dataset and labels 2 (Pullover) and 4 (Coat) for the fmnist dataset. For efficiency, the feature dimensions of both these datasets are reduced to 20 using principal component analysis (PCA). For the remaining six datasets, the task is binary classification. For all eight datasets, the features are normalized using the StandardScaler option in scikit-learn and are made to be unit norm. |
| Dataset Splits | Yes | For each dataset, we generate train and test sets using scikit-learn's train_test_split method (80:20 split). For RF we use a grid search over the number of estimators (trees) from the set {250, 500, 750, 1000} and perform 3-fold cross-validation to choose the final model. |
| Hardware Specification | Yes | We run our experiments on a laptop with an Intel Xeon W-10855M processor, 64GB memory, and an NVIDIA Quadro RTX 5000 Mobile GPU (with 16GB memory). |
| Software Dependencies | No | The paper mentions using 'scikit-learn' but does not specify version numbers for it or any other software components, which are needed for reproducibility. |
| Experiment Setup | Yes | For k-NN, we use two values of k: k = 1 and k = 10. For RF, we use a grid search over the number of estimators (trees) from the set {250, 500, 750, 1000} and perform 3-fold cross-validation to choose the final model. As per Theorem 3.12, we set k to be d log m. For all eight datasets, the features are normalized using the StandardScaler option in scikit-learn and are made to be unit norm. |
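For concreteness, here is a minimal Python sketch of the Algorithm 1 quoted above. The form of the expand-and-sparsify map $h_1$ is an assumption: a random Gaussian projection $\Theta$ followed by top-$k$ winner-take-all sparsification is the standard EaS construction, but the paper's exact map may differ.

```python
import numpy as np

def h1(x, Theta, k):
    """Expand-and-sparsify map (assumed form): random projection
    followed by top-k winner-take-all, yielding a binary mask."""
    z = Theta @ x
    idx = np.argpartition(z, -k)[-k:]     # indices of the k largest activations
    eas = np.zeros(Theta.shape[0], dtype=bool)
    eas[idx] = True
    return eas

def train_eas_classifier(X, y, m, k, seed):
    """TrainEaSClassifier: per-unit running average of labels.
    X: (n, d) unit-norm rows; y: (n,) labels in {0, 1}."""
    rng = np.random.default_rng(seed)           # "sample Theta with seed R"
    Theta = rng.standard_normal((m, X.shape[1]))
    w, ct = np.zeros(m), np.zeros(m)
    for x, label in zip(X, y):
        eas = h1(x, Theta, k)
        w[eas] += label                         # accumulate labels on active units
        ct[eas] += 1                            # count activations per unit
    w = np.divide(w, ct, out=np.zeros_like(w), where=ct > 0)
    return Theta, w

def infer_eas_classifier(x, Theta, k, w):
    """InferEaSClassifier: average the k active units' label means
    and threshold at 1/2."""
    eas = h1(x, Theta, k)
    return int(w[eas].sum() / k >= 0.5)
```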
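The quoted preprocessing and evaluation protocol maps directly onto scikit-learn. The sketch below follows the stated steps for the mnist case (digits 3 vs 5, PCA to 20 dimensions, 80:20 split, RF grid search with 3-fold CV); the exact ordering of standardization, PCA, and unit-norm scaling is an assumption, as the quotes do not pin it down.

```python
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler, normalize

# mnist from OpenML, restricted to digits 3 vs 5 as a binary problem.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
mask = np.isin(y, ["3", "5"])
X, y = X[mask], (y[mask] == "5").astype(int)

X = StandardScaler().fit_transform(X)        # standardize features
X = PCA(n_components=20).fit_transform(X)    # mnist/fmnist only: 784 -> 20 dims
X = normalize(X)                             # scale each row to unit norm

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# RF baseline: grid search over the number of trees with 3-fold CV.
rf = GridSearchCV(RandomForestClassifier(),
                  {"n_estimators": [250, 500, 750, 1000]}, cv=3)
rf.fit(X_train, y_train)
print("RF test accuracy:", rf.score(X_test, y_test))
```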
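Finally, the remaining baselines and the EaS hyperparameter choice from the quoted setup, reusing the sketches above. The value of m is not specified in the quotes, so the one here is purely illustrative, and the k = d log m rule is taken verbatim from the extracted text.

```python
from sklearn.neighbors import KNeighborsClassifier

# k-NN baselines with the two quoted values of k.
for n_neighbors in (1, 10):
    knn = KNeighborsClassifier(n_neighbors=n_neighbors).fit(X_train, y_train)
    print(f"{n_neighbors}-NN test accuracy:", knn.score(X_test, y_test))

# EaS classifier: k set to d*log(m) per the quoted rule (Theorem 3.12);
# m = 2000 is an assumed illustrative value, not taken from the paper.
d, m = X_train.shape[1], 2000
k = int(np.ceil(d * np.log(m)))
Theta, w = train_eas_classifier(X_train, y_train, m=m, k=k, seed=0)
preds = [infer_eas_classifier(x, Theta, k, w) for x in X_test]
print("EaS test accuracy:", np.mean(np.array(preds) == y_test))
```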