Instance Selection: A Bayesian Decision Theory Perspective
Authors: Qingqiang Chen, Fuyuan Cao, Ying Xing, Jiye Liang (pp. 6287–6294)
AAAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The performance of our method is studied on extensive synthetic and benchmark data sets. To properly examine the performance of BDIS, we employ random sampling (RS), one of the most classic and commonly used instance selection methods, RIS, and EGDIS as baseline methods. The comparisons are carried out on multiple synthetic and 12 benchmark data sets, which are available at the UCI Repository (Dua and Graff 2017). Table 2: Comparisons of classification accuracy (A) and reduction rate (R). |
| Researcher Affiliation | Academia | Qingqiang Chen, Fuyuan Cao, Ying Xing, Jiye Liang School of Computer and Information Technology, Shanxi University, Taiyuan 030006, P.R. China chenqq18@126.com, cfy@sxu.edu.cn, sxxying@126.com, ljy@sxu.edu.cn |
| Pseudocode | Yes | Algorithm 1: BDIS 1: Input: Training set Dtr = {(x1, y1), . . . , (xn, yn)}. 2: Parameters: Truncation thresholds τ1 and τ2. 3: Output: Reduced set R. 4: Employ the accelerated k-means algorithm to cluster Dtr into two sub-clusters; 5: for each sub-cluster do 6: if the labels of the data in the sub-cluster are the same then 7: Consider the sub-cluster an LHC and record the data within it and its cluster center; 8: else 9: Iteratively divide the sub-cluster until it is composed of one or more LHCs. 10: end if 11: end for 12: In each class of data, the LHCs with a number of instances between τ1 and τ2 are selected, and the instances closest to the centers of these LHCs are added to R. |
| Open Source Code | Yes | All code and data results are available at https://github.com/CQQXY161120/Instance-Selection. |
| Open Datasets | Yes | The comparisons are carried out on multiple synthetic and 12 benchmark data sets which are available at the UCI Repository (Dua and Graff 2017). |
| Dataset Splits | Yes | All experimental results are obtained through 10-fold cross-validation. |
| Hardware Specification | Yes | The experiments are conducted on an Intel i7-7700 CPU @ 3.60 GHz with 48 GB of RAM. |
| Software Dependencies | No | No specific software dependencies with version numbers are provided. The paper mentions using "FAISS (Johnson, Douze, and Jégou 2017) to accelerate k-means clustering" but does not specify a version number for FAISS. |
| Experiment Setup | Yes | Therefore, in order to balance the number of instances and the generalization performance of the classifier, we empirically set k1 = 0 and k2 = 7 for subsequent experiments. |
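The selection loop quoted in Algorithm 1 can be sketched as follows. This is a hypothetical reconstruction, not the authors' code: it replaces the paper's FAISS-accelerated k-means with a plain Lloyd's 2-means, the function names `two_means`, `find_lhcs`, and `bdis_select` are invented, the half-open interval `(tau1, tau2]` is an assumption about the quoted "between τ1 and τ2", and the full BDIS method additionally reasons about the remaining clusters via Bayesian decision theory, which is not reproduced here.

```python
# Hypothetical sketch of Algorithm 1 (BDIS): recursively bisect the training
# set into label-homogeneous clusters (LHCs), then keep one representative per
# LHC whose size falls in the truncation interval.
import numpy as np

def two_means(X, n_iter=20, seed=0):
    """Plain Lloyd's 2-means; the paper uses FAISS-accelerated k-means."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=2, replace=False)].astype(float)
    assign = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for c in (0, 1):
            if (assign == c).any():
                centers[c] = X[assign == c].mean(axis=0)
    if len(np.unique(assign)) < 2:       # degenerate split (e.g. duplicates):
        assign = np.arange(len(X)) % 2   # fall back to an arbitrary bisection
    return assign

def find_lhcs(X, y, lhcs):
    """Recursively bisect (X, y) until every sub-cluster is label-homogeneous."""
    if len(np.unique(y)) == 1:
        lhcs.append((X, y[0]))           # record members and their shared label
        return
    assign = two_means(X)
    for c in (0, 1):
        find_lhcs(X[assign == c], y[assign == c], lhcs)

def bdis_select(X, y, tau1=0, tau2=7):
    """Per LHC with tau1 < size <= tau2, keep the member closest to its centroid."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    lhcs = []
    find_lhcs(X, y, lhcs)
    kept = [(m[np.linalg.norm(m - m.mean(axis=0), axis=1).argmin()], lab)
            for m, lab in lhcs if tau1 < len(m) <= tau2]
    if not kept:
        return np.empty((0, X.shape[1])), np.empty(0, dtype=y.dtype)
    keep_X, keep_y = zip(*kept)
    return np.array(keep_X), np.array(keep_y)
```

With the paper's reported setting (k1 = 0, k2 = 7, assumed here to play the role of τ1 and τ2), only small homogeneous clusters contribute a representative, which is what drives the reduction rate reported in Table 2.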