Bounding Uncertainty for Active Batch Selection
Authors: Hanmo Wang, Runwu Zhou, Yi-Dong Shen
AAAI 2019, pp. 5240-5247
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on fifteen datasets indicate that our method has significantly higher classification accuracy on testing data than the latest state-of-the-art BMAL methods, and also scales better even when the size of the unlabeled pool reaches 10^6. |
| Researcher Affiliation | Academia | State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, Beijing 100049, China. {wanghm,zhourw,ydshen}@ios.ac.cn |
| Pseudocode | Yes | Algorithm 1: Rand Greedy(U, b); Algorithm 2: BMAL based on LBC |
| Open Source Code | No | No explicit statement or link is provided for the open-source code of the methodology described in the paper. |
| Open Datasets | Yes | We use fifteen benchmark datasets, seven of which are from UCI machine learning repository (Dheeru and Karra Taniskidou 2017), namely segmentation, waveform, twonorm, HIGGS, covtype, SUSY and letter. The other eight datasets are Reuters, RCV1, TDT2, 20News, WEBACE, ORL, COIL20 and USPS, which are publicly available2. (footnote 2: http://www.cad.zju.edu.cn/home/dengcai/) |
| Dataset Splits | No | No validation split information (percentages, counts, or predefined splits) is provided. The paper only mentions splitting each dataset into unlabeled (60%) and testing (40%) data. |
| Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments are mentioned in the paper. |
| Software Dependencies | No | The paper mentions 'Logistic Regression is used as the classifier' and 'Gaussian kernel' and other methods, but no specific version numbers for any software components or libraries are provided. |
| Experiment Setup | Yes | The batch size b is fixed to 100 on the large datasets covtype, SUSY and HIGGS, 50 on letter and 20News, and 10 on the other small datasets. Logistic Regression is used as the classifier. For each dataset, the experiment is conducted 10 times and the averaged result is reported. A Gaussian kernel is used on all datasets: for data instances x and y, K(x, y) = exp(-\|\|x - y\|\|^2 / p), where the parameter p is the median of all pairwise squared Euclidean distances over the unlabeled data. All unlabeled samples are sorted in increasing order of their certainty in Eq. (10), and the hyper-parameter ε is set to the β-th percentile (0 < β < 100). Two hyper-parameters γ and τ empirically describe β as β = γ · (n_u/n)^τ, where γ and τ are fixed to 20 and 10 respectively. For the hyper-parameter λ, λ = b^2 is used. |
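The kernel and threshold settings in the experiment-setup row can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors' code: the function names (`gaussian_kernel`, `certainty_threshold`) are assumptions, and the per-sample certainty scores from the paper's Eq. (10) are taken as a given input array.

```python
import numpy as np

def gaussian_kernel(X):
    """Gaussian kernel K(x, y) = exp(-||x - y||^2 / p), with p set by the
    median heuristic: the median of all pairwise squared Euclidean
    distances over the (unlabeled) data, as described in the paper."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative round-off
    # Median over the strictly upper triangle (all distinct pairs).
    p = np.median(d2[np.triu_indices_from(d2, k=1)])
    return np.exp(-d2 / p)

def certainty_threshold(certainty, n_unlabeled, n_total, gamma=20.0, tau=10.0):
    """Set epsilon to the beta-th percentile of the certainty scores,
    with beta = gamma * (n_u / n)^tau (gamma=20, tau=10 in the paper)."""
    beta = gamma * (n_unlabeled / n_total) ** tau  # 0 < beta < 100
    return np.percentile(certainty, beta)
```

A typical call would pass the unlabeled pool to `gaussian_kernel` and the Eq. (10) certainty scores of those same samples to `certainty_threshold`; note that with τ = 10 the percentile β shrinks rapidly as the labeled fraction grows.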