Text Classification with Born's Rule

Authors: Emanuele Guidotti, Alfio Ferrara

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through analysis of three benchmark datasets, we illustrate several aspects of the proposed method, such as classification performance, explainability, and computational efficiency. These ideas are also applicable to non-textual data. Section 6 presents our empirical results.
Researcher Affiliation | Academia | Emanuele Guidotti, Institute of Financial Analysis, University of Neuchâtel, Switzerland (emanuele.guidotti@unine.ch); Alfio Ferrara, Department of Computer Science and Data Science Research Center, University of Milan, Italy (alfio.ferrara@unimi.it)
Pseudocode | No | The paper describes the algorithms and models using mathematical equations and textual explanations, but it does not include any explicitly labeled pseudocode blocks or algorithm figures.
Open Source Code | Yes | All the code is available at https://github.com/eguidotti/bornrule.
Open Datasets | Yes | We illustrate several aspects of our classifier using three well-established text classification benchmarks: 20Newsgroup, and the R8 and R52 subsets of Reuters 21578. The final datasets are composed by (20Newsgroup) 20 classes, 204 817 words, 11 314 training documents, and 7 532 test documents; (R8) 8 classes, 33 593 words, 5 485 training documents, and 2 189 test documents; (R52) 52 classes, 38 132 words, 6 532 training documents, and 2 568 test documents.
Dataset Splits | Yes | The final datasets are composed by (20Newsgroup) 20 classes, 204 817 words, 11 314 training documents, and 7 532 test documents; (R8) 8 classes, 33 593 words, 5 485 training documents, and 2 189 test documents; (R52) 52 classes, 38 132 words, 6 532 training documents, and 2 568 test documents. In Table 1, we tune the baseline classifiers via grid-search using 5-fold cross-validation.
Hardware Specification | Yes | All the results are obtained using Python 3.9 on a Google Cloud Virtual Machine equipped with CentOS 7, 12 vCPU Intel Cascade Lake, 85 GB RAM, 1 GPU NVIDIA Tesla A100, and CUDA 11.5.
Software Dependencies | Yes | All the results are obtained using Python 3.9 on a Google Cloud Virtual Machine equipped with CentOS 7, 12 vCPU Intel Cascade Lake, 85 GB RAM, 1 GPU NVIDIA Tesla A100, and CUDA 11.5. To simplify extensions to this work, we implement our classification algorithm in scikit-learn [17] and we embed the classifier in a neural network architecture using pytorch [16].
Experiment Setup | Yes | All the classifiers are executed on CPU with default parameters. In Table 1, we tune the baseline classifiers via grid-search using 5-fold cross-validation. Except for BC, which needs no tuning, all the other classifiers use between 20 and 50 combinations of hyper-parameters (reported in the replication code).
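The tuning protocol reported in the Experiment Setup row can be sketched with scikit-learn, which the paper itself uses. The corpus, pipeline, and hyper-parameter grid below are illustrative stand-ins (the paper's actual baselines and grids are only listed in its replication code); what matches the reported setup is the grid-search with 5-fold cross-validation:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Tiny illustrative corpus standing in for 20Newsgroup / R8 / R52
# (loading the real benchmarks is out of scope for this sketch).
docs = [
    "the team won the game",
    "a great match and a late goal",
    "the player scored twice tonight",
    "the coach praised the defense",
    "fans cheered the winning goal",
    "stocks fell on weak earnings news",
    "the market rallied after the report",
    "shares dropped as profits slipped",
    "investors sold bonds and bought gold",
    "the bank raised its profit forecast",
]
labels = ["sport"] * 5 + ["finance"] * 5

# Generic baseline pipeline: TF-IDF features + logistic regression.
# This is a stand-in, not one of the paper's tuned baselines.
pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Small grid for illustration; the report states each tuned baseline
# used between 20 and 50 hyper-parameter combinations.
grid = {"clf__C": [0.1, 1.0, 10.0]}

# Grid-search with 5-fold cross-validation, as in the reported protocol.
search = GridSearchCV(pipe, grid, cv=5)
search.fit(docs, labels)
print(search.best_params_, round(search.best_score_, 2))
```

Each of the three `C` values is scored by mean accuracy across the 5 folds, and `best_params_` retains the winner, which is the same selection mechanism the report describes for the baselines in Table 1.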