Near-Optimal Active Learning of Halfspaces via Query Synthesis in the Noisy Setting

Authors: Lin Chen, Hamed Hassani, Amin Karbasi

AAAI 2017

Reproducibility assessment (each entry gives the variable, the result, and the LLM response):
Research Type: Experimental. Evidence: "Our empirical experiments demonstrate that DC runs orders of magnitude faster than the existing methods. In this section, we extensively evaluate the performance of DC against the following baselines: RANDOM-SAMPLING: queries are generated by sampling uniformly at random from the unit sphere S^{d-1}. Our metrics to compare different algorithms are: (a) estimation error, (b) query complexity, and (c) execution time."
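The RANDOM-SAMPLING baseline quoted above is easy to state precisely. Below is a minimal sketch of uniform sampling from S^{d-1} using the standard Gaussian-normalization trick; the function name is illustrative, not from the paper.

```python
import numpy as np

def sample_unit_sphere(d, rng=None):
    """Draw a point uniformly at random from the unit sphere S^{d-1}.

    Uses the standard trick: a standard Gaussian vector, once
    normalized to unit length, is uniformly distributed on the sphere.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal(d)
    return x / np.linalg.norm(x)

# Example: one random query direction in R^10.
q = sample_unit_sphere(10)
```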
Researcher Affiliation: Academia. Evidence: "Lin Chen (1,2), Hamed Hassani (3), Amin Karbasi (1,2); (1) Department of Electrical Engineering, (2) Yale Institute for Network Science, Yale University; (3) Computer Science Department, ETH Zürich; {lin.chen, amin.karbasi}@yale.edu, hamed@inf.ethz.ch"
Pseudocode: Yes. Evidence: "Algorithm 1 DC2. Input: orthonormal vectors e1, e2; estimation error at most ε; success probability at least 1 − δ. Output: a unit vector ê which is an estimate for the normalized orthogonal projection of h onto span{e1, e2}. ... Algorithm 2 Dimension Coupling (DC). Input: an orthonormal basis E = {e1, e2, ..., ed} of R^d. Output: a unit vector ĥ which is an estimate for h."
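The structure the pseudocode describes, reducing the d-dimensional problem to d − 1 two-dimensional subproblems, can be illustrated with a short sketch. Everything below is a reconstruction from the input/output descriptions quoted above, not the authors' implementation: the dc2 stand-in does a noiseless bisection over the angle in the 2-D span, whereas the paper's DC2 also tolerates label noise; the function names and the demo are hypothetical.

```python
import numpy as np

def dc2(oracle, e1, e2, tol=1e-6):
    """Illustrative stand-in for Algorithm 1 (DC2): estimate the
    normalized projection of the hidden normal h onto span{e1, e2}.
    Within this 2-D span, sign(<h, x>) flips exactly once along each
    half-circle, so bisection over the angle locates the decision
    boundary; the projection direction lies 90 degrees before it.
    """
    point = lambda t: np.cos(t) * e1 + np.sin(t) * e2
    # Pick a half-circle whose start is labeled + and whose end is -.
    lo, hi = (0.0, np.pi) if oracle(point(0.0)) > 0 else (np.pi, 2 * np.pi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if oracle(point(mid)) > 0:
            lo = mid
        else:
            hi = mid
    boundary = 0.5 * (lo + hi)           # angle of the decision boundary
    return point(boundary - np.pi / 2)   # unit vector along the projection

def dimension_coupling(oracle, E, tol=1e-6):
    """Outer loop of Algorithm 2 (DC): fold in one new basis direction
    per round by coupling it with the current estimate. Assumes h has a
    nonzero projection on each intermediate span."""
    v = E[0]
    for i in range(1, len(E)):
        # v lies in span{E[0..i-1]} and E[i] is orthogonal to that span,
        # so (v, E[i]) is an orthonormal pair, as dc2 expects.
        v = dc2(oracle, v, E[i], tol)
    return v

# Noiseless demo: recover a hidden halfspace normal in R^5.
rng = np.random.default_rng(0)
h = rng.standard_normal(5); h /= np.linalg.norm(h)
oracle = lambda x: np.sign(h @ x)
h_hat = dimension_coupling(oracle, list(np.eye(5)))
print(np.linalg.norm(h_hat - h))  # small estimation error
```

The point the sketch makes concrete is the coupling step: each round calls the two-dimensional subroutine on the span of the running estimate and the next basis vector, so learning in R^d costs d − 1 calls to DC2.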
Open Source Code: No. The paper provides no link to, or statement about, the availability of its own source code. It mentions "the fastest available implementations in MATLAB" for the baselines, but nothing for the proposed DC method.
Open Datasets: No. Evidence: "By nature, in active learning via query synthesis, all data points and queries are generated synthetically. For all the baselines, we used the fastest available implementations in MATLAB."
Dataset Splits: No. The paper evaluates performance in terms of number of queries and estimation error on synthetically generated data, but does not specify traditional training, validation, or test splits as percentages or counts.
Hardware Specification: No. The paper gives no details about the hardware used to run the experiments, such as CPU or GPU models or memory specifications.
Software Dependencies: No. The paper mentions that MATLAB was used for the baselines but does not give version numbers for MATLAB or for any other software dependencies, libraries, or solvers relevant to the method.
Experiment Setup: No. The paper discusses algorithmic parameters such as T_{ε,δ} and the noise level ρ, but does not report common machine-learning setup details such as learning rates, batch sizes, numbers of epochs, or optimizer settings.
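For context on the noise level ρ mentioned above: it is the probability that a label query is answered incorrectly. A minimal sketch of such a noisy label oracle follows; the function name and demo values are assumptions for illustration, not the authors' code.

```python
import numpy as np

def noisy_oracle(h, x, rho, rng):
    """Answer a label query sign(<h, x>), flipping the answer
    independently with probability rho (the noise level)."""
    label = np.sign(h @ x)
    return -label if rng.random() < rho else label

# Example: query one random point against a hidden normal, with 10% noise.
rng = np.random.default_rng(1)
h = rng.standard_normal(4); h /= np.linalg.norm(h)
x = rng.standard_normal(4); x /= np.linalg.norm(x)
print(noisy_oracle(h, x, rho=0.1, rng=rng))
```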