Adversarial Label Learning

Authors: Chidubem Arachie, Bert Huang (pp. 3183–3190)

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test adversarial label learning on a variety of datasets, comparing it with other approaches for weak supervision. In this section, we describe how we simulate domain expertise to generate weak supervision signals. We then describe the datasets we evaluated with and the compared weak supervision approaches, and we analyze the results of the experiments. Table 1 shows the mean accuracies obtained by running ALL on the different datasets.
Researcher Affiliation | Academia | Chidubem Arachie, Department of Computer Science, Virginia Tech (achid17@vt.edu); Bert Huang, Department of Computer Science, Virginia Tech (bhuang@vt.edu)
Pseudocode | Yes | Algorithm 1: Adversarial Label Learning
Require: dataset X = [x_1, …, x_n], learning rate schedule α, weak signals and bounds [(q_1, b_1), …, (q_m, b_m)], augmented Lagrangian parameter ρ.
1: Initialize θ (e.g., random, zeros, etc.)
2: Initialize ŷ ∈ [0, 1]^n (e.g., average of q_1, …, q_m)
3: Initialize γ ∈ R^m_{≥0} (e.g., zeros)
4: while not converged do
5:   Update θ with Equation (6)
6:   Update p with model and θ
7:   Update ŷ with Equation (7)
8:   Update γ with Equation (8)
9: end while
10: return model parameters θ
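As a rough illustration only, the loop in Algorithm 1 can be sketched in NumPy. The function name `all_train` and all hyperparameter defaults are invented here; the ŷ ascent step is a simplification that omits the augmented quadratic penalty; and the gradient expressions are inferred from the update descriptions quoted elsewhere in this report, not copied from the paper's Equations (6)–(8).

```python
import numpy as np

def all_train(X, Q, b, rho=1.0, alpha=0.1, iters=200):
    """Sketch of Algorithm 1 (Adversarial Label Learning).

    X : (n, d) feature matrix.
    Q : (m, n) soft weak signals q_1..q_m with entries in [0, 1].
    b : (m,) expected-error bounds b_1..b_m.
    """
    n, d = X.shape
    m = Q.shape[0]
    theta = np.zeros(d)        # step 1: initialize model parameters
    y = Q.mean(axis=0)         # step 2: initialize y_hat as the average weak signal
    gamma = np.zeros(m)        # step 3: initialize KKT multipliers

    for _ in range(iters):                              # step 4: until converged
        p = 1.0 / (1.0 + np.exp(-X @ theta))            # step 6: sigmoid probabilities
        # step 5: learner descends on expected error y^T (1-p) + (1-y)^T p
        grad_theta = X.T @ (p * (1.0 - p) * (1.0 - 2.0 * y)) / n
        theta -= alpha * grad_theta
        # step 7: adversary ascends on y_hat (simplified; no quadratic penalty term)
        grad_y = (1.0 - 2.0 * p) / n - gamma @ (1.0 - 2.0 * Q) / n
        y = np.clip(y + alpha * grad_y, 0.0, 1.0)
        # step 8: multiplier update, clipped to be non-negative
        errs = (Q @ (1.0 - y) + (1.0 - Q) @ y) / n
        gamma = np.maximum(0.0, gamma - rho * (errs - b))
    return theta, y
```

The returned ŷ can be thresholded to obtain adversarial label estimates, and θ parameterizes the trained classifier.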
Open Source Code | No | The paper does not provide any concrete access information (e.g., a URL to a repository) for open-source code related to the described methodology.
Open Datasets | Yes | Fashion-MNIST (Xiao, Rasul, and Vollgraf 2017), Breast Cancer (Blake and Merz 1998; Street, Wolberg, and Mangasarian 1993), OBS Network (Rajab et al. 2016), Cardiotocography (Ayres-de-Campos et al. 2000), Clave Direction (Vurkac 2011), Credit Card (Blake and Merz 1998), Statlog Satellite (Blake and Merz 1998), Phishing Websites (Mohammad, Thabtah, and McCluskey 2012), Wine Quality (Cortez et al. 2009).
Dataset Splits | Yes | We randomly split each dataset such that 30% is used as weak supervision data, 40% as training data, and 30% as test data. For our experiments, we use 10 such random splits and report the mean of the results.
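The 30/40/30 protocol described in the Dataset Splits row can be sketched as follows; the helper name `split_30_40_30` and the use of NumPy's `default_rng` are assumptions for illustration, not details from the paper.

```python
import numpy as np

def split_30_40_30(n, seed):
    """Randomly partition n example indices into 30% weak-supervision,
    40% training, and 30% test subsets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_weak = int(0.3 * n)
    n_train = int(0.4 * n)
    weak = idx[:n_weak]
    train = idx[n_weak:n_weak + n_train]
    test = idx[n_weak + n_train:]
    return weak, train, test

# 10 random splits; per-split results would then be averaged
splits = [split_30_40_30(1000, seed) for seed in range(10)]
```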
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | We use the sigmoid function as our parameterized function f_θ for estimating class probabilities of ALL and GE, i.e., p_θ = [f_θ(x_j)]_{j=1}^n = 1/(1 + exp(−θ^⊤ x_j)). We regularize this objective with an L2 penalty. The update step for the parameters is θ ← θ − α_t ∇_θ p_θ (1 − 2ŷ), where ∇_θ p_θ is the Jacobian matrix for the classifier f over the full dataset and α_t is a gradient step size that can decrease over time. The update for each KKT multiplier is γ_i ← γ_i − ρ ((q_i^⊤(1 − ŷ) + (1 − q_i)^⊤ ŷ)/n − b_i), which is clipped to be non-negative and uses a fixed step size ρ, as dictated by the augmented Lagrangian method (Hestenes 1969).
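A toy numeric walk-through of the KKT multiplier update for a single weak signal may help make the quantities concrete. All values below are invented, and the sign of the update follows the rule as reconstructed from this row, so treat it as illustrative only.

```python
import numpy as np

# One weak signal's soft labels and the current adversarial labels (toy values)
q_i = np.array([0.9, 0.2, 0.8, 0.1])
y_hat = np.array([1.0, 0.0, 1.0, 1.0])
b_i = 0.4       # assumed error bound for this signal
rho = 1.0       # fixed augmented-Lagrangian step size
n = len(y_hat)

# Expected error of the weak signal q_i against labels y_hat:
# (q_i^T (1 - y_hat) + (1 - q_i)^T y_hat) / n
err = (q_i @ (1.0 - y_hat) + (1.0 - q_i) @ y_hat) / n

# Multiplier step, clipped to be non-negative
gamma_i = 0.5
gamma_i = max(0.0, gamma_i - rho * (err - b_i))
```

Here the weak signal's expected error (0.35) is below the bound (0.4), so the multiplier moves up rather than being driven toward zero.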