Oversampling for Imbalanced Data via Optimal Transport
Authors: Yuguang Yan, Mingkui Tan, Yanwu Xu, Jiezhang Cao, Michael Ng, Huaqing Min, Qingyao Wu
AAAI 2019, pp. 5605-5612 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on toy and real-world data sets demonstrate the efficacy of our proposed method in terms of multiple metrics. |
| Researcher Affiliation | Collaboration | Yuguang Yan (1,2), Mingkui Tan (1), Yanwu Xu (3), Jiezhang Cao (1), Michael Ng (4), Huaqing Min (1), Qingyao Wu (1). 1: School of Software Engineering, South China University of Technology, China; 2: CVTE Research, Guangzhou Shiyuan Electronics Co., Ltd., China; 3: Artificial Intelligence Innovation Business, Baidu Inc., China; 4: Department of Mathematics, Hong Kong Baptist University, Hong Kong, China |
| Pseudocode | Yes (illustrative sketch below the table) | Algorithm 1 Optimal Transport for Over Sampling (OTOS) |
| Open Source Code | No | The paper does not provide a direct link to source code or explicitly state that the code is publicly released. |
| Open Datasets | Yes | We adopt six benchmark data sets from LIBSVM (australian, breast-cancer, diabetes, german, svmguide2, and svmguide4) in the experiments. For the data sets that are split into training and testing subsets, we only adopt training subsets for simplicity. For the multi-class data sets, we take one class as positive and the others as negative to construct imbalanced binary classification tasks. [...] We also use four fundus image data sets, among which i See-AMD, i See-DR, i See-glaucoma are used to detect Age-Related Macular Degeneration (AMD), Diabetic Retinopathy (DR) and glaucoma, respectively, and ORIGA is used to detect glaucoma. |
| Dataset Splits | Yes | results of each time are obtained by the mean of 10-fold cross-validation. |
| Hardware Specification | No | The paper does not specify the hardware used for running the experiments. It does mention extracting features using a ResNet-152, but not the hardware that ran the training/inference. |
| Software Dependencies | No | The paper states using 'linear SVM' and drawing random samples from 'uniform distribution U(0, 1)' but does not provide specific version numbers for any software, libraries, or frameworks used. |
| Experiment Setup | Yes (protocol sketch below the table) | For all the compared methods, we synthesize minority class samples until that the numbers of minority and majority class samples are the same, and use linear SVM with the default parameter C = 1 as the classifier. For our method, we draw random samples from a prior uniform distribution U(0, 1). The parameters λ and ϵ are selected in the range 10{−1,0,1,2,3,4,5}, and the best results are adopted. We repeat all the experiments 10 times and report the mean and standard derivation values, and results of each time are obtained by the mean of 10-fold cross-validation. |
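
The Pseudocode row lists Algorithm 1, Optimal Transport for Over Sampling (OTOS), but the paper itself is the only source for its exact steps. As a rough illustration of the underlying idea, drawing seeds from a uniform prior U(0, 1) and transporting them toward the minority-class distribution with entropic optimal transport, the NumPy sketch below is a minimal reconstruction under stated assumptions: the Sinkhorn solver, the squared-Euclidean cost, the iteration count, and the barycentric-mapping step are illustrative choices, not the authors' exact procedure.

```python
import numpy as np

def sinkhorn_plan(a, b, M, eps, n_iters=200):
    """Entropic OT plan between weights a (n,) and b (m,) for cost matrix M (n, m)."""
    K = np.exp(-M / eps)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):          # alternating Sinkhorn scaling updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def otos_like_oversample(X_min, n_new, eps=0.1, seed=0):
    """Synthesize n_new samples by mapping U(0, 1) seeds onto the minority set X_min."""
    rng = np.random.default_rng(seed)
    Z = rng.uniform(0.0, 1.0, size=(n_new, X_min.shape[1]))      # random seed points
    M = ((Z[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)  # squared Euclidean cost
    a = np.full(n_new, 1.0 / n_new)                # uniform weights on the seeds
    b = np.full(len(X_min), 1.0 / len(X_min))      # uniform weights on minority samples
    G = sinkhorn_plan(a, b, M, eps)
    # Barycentric projection: each seed becomes a transport-weighted average of minority samples.
    return (G @ X_min) / G.sum(axis=1, keepdims=True)

# Toy usage (shapes assumed for illustration): 20 minority samples in 5 dimensions.
X_min = np.random.rand(20, 5)
X_syn = otos_like_oversample(X_min, n_new=80)   # 80 synthetic minority samples
```

In the paper's protocol, the synthesized samples would be appended to the training set until the minority class matches the majority class in size.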
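
The Open Datasets, Dataset Splits, and Experiment Setup rows together describe the evaluation protocol: LIBSVM benchmark sets binarized by taking one class as positive, the minority class oversampled until the two classes are balanced, a linear SVM with the default C = 1, and 10 repetitions whose results are each the mean of 10-fold cross-validation. The sketch below reproduces that loop with scikit-learn, an assumed toolchain since the paper names no libraries; the file name, the positive-class choice, the F1 metric, and the simple resample-with-replacement stand-in for OTOS are all placeholders.

```python
import numpy as np
from sklearn.datasets import load_svmlight_file
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import LinearSVC

def oversample_to_balance(X, y, rng):
    """Resample the minority class with replacement until both classes have equal size.
    A simple stand-in for the oversampler under study (OTOS, SMOTE, etc.)."""
    pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    idx = np.concatenate([np.arange(len(y)), extra])
    return X[idx], y[idx]

# Hypothetical local path to a LIBSVM-format file; 'australian' is one of the benchmark sets.
X, y = load_svmlight_file("australian")
X = X.toarray()
y = (y == 1).astype(int)          # take one class as positive, the rest as negative

rng = np.random.default_rng(0)
repeat_scores = []
for rep in range(10):                                          # 10 repetitions
    skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=rep)
    fold_scores = []
    for train, test in skf.split(X, y):                        # 10-fold cross-validation
        X_tr, y_tr = oversample_to_balance(X[train], y[train], rng)
        clf = LinearSVC(C=1.0).fit(X_tr, y_tr)                 # linear SVM, default C = 1
        fold_scores.append(f1_score(y[test], clf.predict(X[test])))
    repeat_scores.append(np.mean(fold_scores))
print(f"F1: {np.mean(repeat_scores):.3f} +/- {np.std(repeat_scores):.3f}")
```

The grid search over λ and ϵ in 10^{−1,...,5} described in the setup row would wrap this loop for the OT-based method; it is omitted here for brevity.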