Convex Batch Mode Active Sampling via α-Relative Pearson Divergence

Authors: Hanmo Wang, Liang Du, Peng Zhou, Lei Shi, Yi-Dong Shen

AAAI 2015

Reproducibility Variable Result LLM Response
Research Type Experimental Our empirical studies on UCI datasets demonstrate the effectiveness of the proposed approach compared with the state-of-the-art batch mode active learning methods.
Researcher Affiliation Academia ¹State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; ²University of Chinese Academy of Sciences, Beijing 100049, China
Pseudocode Yes Algorithm 1: RPEactive
Input: parameters α, λ; kernel matrix K; constants n_u, n_l, n_s
Output: indicator variable β
1: compute θ^(0) according to (26)
2: θ̂ ← θ^(0)
3: k ← 0
4: while not converged do
5:   compute β̂ according to (21)
6:   compute g(θ^(k)) according to (24)
7:   update θ^(k+1) according to (25)
8:   k ← k + 1
9:   if G(θ^(k)) < G(θ̂) then
10:     θ̂ ← θ^(k)
11:   end if
12: end while
13: compute φ according to (20) with θ = θ̂
14: compute β according to (21)
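The control flow of Algorithm 1 can be sketched as a gradient-style iteration that tracks the best iterate θ̂ by objective value and finally recovers β from θ̂. This is only a structural sketch: the paper's equations (20)–(26) are not reproduced here, so `initial_theta`, `beta_from_theta`, `gradient`, and `objective` below are hypothetical toy placeholders standing in for them.

```python
import numpy as np

# Hypothetical stand-ins for the paper's equations (20)-(26). The real
# objective G, gradient g, and updates depend on the kernel matrix K and
# the α-relative Pearson divergence; here they are toy quadratics so the
# loop structure of Algorithm 1 is runnable.
def initial_theta(K):            # eq. (26), placeholder initialization
    return np.ones(K.shape[0]) / K.shape[0]

def beta_from_theta(theta):      # eq. (21), placeholder recovery of β
    return theta / theta.sum()

def gradient(theta):             # eq. (24), placeholder gradient g(θ)
    return theta - 0.5

def objective(theta):            # G(θ), placeholder objective
    return 0.5 * np.sum((theta - 0.5) ** 2)

def rpe_active(K, step=0.1, max_iter=100, tol=1e-8):
    """Skeleton of Algorithm 1: iterate θ, keep the best iterate θ̂
    seen so far (lines 9-11), then compute β from θ̂ (lines 13-14)."""
    theta = initial_theta(K)          # line 1
    theta_hat = theta.copy()          # line 2
    for _ in range(max_iter):         # lines 4-12
        g = gradient(theta)           # line 6
        new_theta = theta - step * g  # line 7, placeholder update (25)
        converged = np.linalg.norm(new_theta - theta) < tol
        theta = new_theta
        if objective(theta) < objective(theta_hat):
            theta_hat = theta.copy()  # line 10
        if converged:
            break
    return beta_from_theta(theta_hat)
```

Keeping the best-so-far θ̂ rather than the last iterate makes the output robust even if a step overshoots and temporarily increases G.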
Open Source Code No The paper does not contain an explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets Yes In our experiment, we evaluate the performance of our proposed RPEactive algorithm on 6 datasets from the UCI repository, namely iris, australian, sonar, heart, wine and arcene.
Dataset Splits No The paper mentions 'We randomly divide each dataset into unlabeled set (60%) and testing set (40%)' but does not specify a separate validation dataset split.
Hardware Specification No The paper does not provide specific details about the hardware used for experiments (e.g., GPU/CPU models, memory specifications).
Software Dependencies No The paper mentions 'Support Vector Machines is used as classification model' and 'We use Gaussian kernel' but does not provide specific version numbers for any software dependencies.
Experiment Setup Yes For a fixed batch size n_s, each method selects data samples for labeling at each iteration. The batch size n_s is set to 5 on the datasets iris and arcene due to their small sizes, and to 10 on the other datasets. The experiment is repeated 20 times and the average result is reported. A Support Vector Machine is used as the classification model to evaluate the performance of the labeled instances. Parameters α and λ are chosen from {0, 0.05, ..., 0.95} and {10^-5, 10^-4, ..., 1} respectively. A Gaussian kernel is used for all datasets, with the kernel width searched over a relatively large range. All parameters are selected by exhaustively searching every combination and choosing the one with the best average accuracy on the test data (40%).
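The parameter-selection loop described above can be sketched as an exhaustive search over the (α, λ) grid, averaging accuracy over 20 repetitions. The function `evaluate_accuracy` is a hypothetical stand-in for the full pipeline (RPEactive selection on the 60% unlabeled split, SVM training, evaluation on the 40% test split); here it returns a synthetic score so the search loop itself is runnable.

```python
import itertools
import numpy as np

def evaluate_accuracy(alpha, lam, seed):
    """Hypothetical stand-in: would run RPEactive with (α, λ), train an
    SVM on the selected labels, and return test-set accuracy. Here a
    synthetic surface peaked at α=0.5, λ=1e-3, plus small split noise."""
    rng = np.random.default_rng(seed)
    return (0.8
            - (alpha - 0.5) ** 2
            - 0.01 * (np.log10(lam) + 3) ** 2
            + rng.normal(0.0, 0.001))

# Parameter grids from the paper's setup.
alphas = np.arange(0.0, 1.0, 0.05)            # {0, 0.05, ..., 0.95}
lambdas = [10.0 ** e for e in range(-5, 1)]   # {10^-5, 10^-4, ..., 1}

best = None
for alpha, lam in itertools.product(alphas, lambdas):
    # Average accuracy over 20 random unlabeled (60%) / test (40%) splits.
    acc = float(np.mean([evaluate_accuracy(alpha, lam, seed)
                         for seed in range(20)]))
    if best is None or acc > best[0]:
        best = (acc, float(alpha), lam)

print(f"best avg accuracy {best[0]:.3f} at alpha={best[1]:.2f}, lambda={best[2]:g}")
```

Averaging over the 20 repetitions before comparing combinations keeps the selection from being driven by a single lucky split.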