Text Classification with Born's Rule
Authors: Emanuele Guidotti, Alfio Ferrara
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through analysis of three benchmark datasets, we illustrate several aspects of the proposed method, such as classification performance, explainability, and computational efficiency. These ideas are also applicable to non-textual data. Section 6 presents our empirical results. |
| Researcher Affiliation | Academia | Emanuele Guidotti Institute of Financial Analysis University of Neuchâtel, Switzerland emanuele.guidotti@unine.ch Alfio Ferrara Department of Computer Science and Data Science Research Center University of Milan, Italy alfio.ferrara@unimi.it |
| Pseudocode | No | The paper describes the algorithms and models using mathematical equations and textual explanations, but it does not include any explicitly labeled pseudocode blocks or algorithm figures. |
| Open Source Code | Yes | All the code is available at https://github.com/eguidotti/bornrule. |
| Open Datasets | Yes | We illustrate several aspects of our classifier using three well-established text classification benchmarks: 20Newsgroup, and the R8 and R52 subsets of Reuters 21578. The final datasets are composed of (20Newsgroup) 20 classes, 204 817 words, 11 314 training documents, and 7 532 test documents; (R8) 8 classes, 33 593 words, 5 485 training documents, 2 189 test documents; (R52) 52 classes, 38 132 words, 6 532 training documents, and 2 568 test documents. |
| Dataset Splits | Yes | The final datasets are composed of (20Newsgroup) 20 classes, 204 817 words, 11 314 training documents, and 7 532 test documents; (R8) 8 classes, 33 593 words, 5 485 training documents, 2 189 test documents; (R52) 52 classes, 38 132 words, 6 532 training documents, and 2 568 test documents. In Table 1, we tune the baseline classifiers via grid-search using 5-fold cross-validation. |
| Hardware Specification | Yes | All the results are obtained using Python 3.9 on a Google Cloud Virtual Machine equipped with CentOS 7, 12 vCPU Intel Cascade Lake, 85 GB RAM, 1 GPU NVIDIA Tesla A100, and CUDA 11.5. |
| Software Dependencies | Yes | All the results are obtained using Python 3.9 on a Google Cloud Virtual Machine equipped with CentOS 7, 12 vCPU Intel Cascade Lake, 85 GB RAM, 1 GPU NVIDIA Tesla A100, and CUDA 11.5. To simplify extensions to this work, we implement our classification algorithm in scikit-learn [17] and we embed the classifier in a neural network architecture using pytorch [16]. |
| Experiment Setup | Yes | All the classifiers are executed on CPU with default parameters. In Table 1, we tune the baseline classifiers via grid-search using 5-fold cross-validation. Except for BC, which needs no tuning, all the other classifiers use between 20 and 50 combinations of hyper-parameters (reported in the replication code). |
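The tuning protocol reported above (grid-search over hyper-parameters with 5-fold cross-validation on text benchmarks) can be sketched with scikit-learn, the library the paper builds on. This is a minimal illustration only: the toy corpus, the TF-IDF + multinomial naive Bayes baseline, and the parameter grid below are stand-ins, not the paper's actual classifiers, datasets, or grids (those are reported in the replication code at the linked repository).

```python
# Hedged sketch of the reported protocol: grid-search with 5-fold CV,
# applied to a hypothetical toy corpus and a stand-in baseline pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Illustrative toy corpus (2 classes, 5 documents each, so that
# stratified 5-fold cross-validation is feasible).
docs = [
    "the team won the match", "a great goal in the game",
    "the coach praised the players", "fans cheered the final score",
    "the striker scored twice",
    "the new gpu runs cuda kernels", "python code for the classifier",
    "training the neural network", "the model overfits the data",
    "tuning hyper parameters with grid search",
]
labels = ["sport"] * 5 + ["tech"] * 5

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),   # bag-of-words features, TF-IDF weighted
    ("clf", MultinomialNB()),       # stand-in baseline classifier
])

# Hypothetical hyper-parameter grid; the paper reports its grids
# (20-50 combinations per baseline) in the replication code.
grid = {"clf__alpha": [0.1, 0.5, 1.0]}

search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy")
search.fit(docs, labels)
print(search.best_params_)
```

Note that the Born classifier (BC) itself would skip this step entirely, since the report states it needs no tuning.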