ActiveThief: Model Extraction Using Active Learning and Unannotated Public Data

Authors: Soham Pal, Yash Gupta, Aditya Shukla, Aditya Kanade, Shirish Shevade, Vinod Ganapathy

AAAI 2020, pages 865-872

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental setup, Datasets: Secret datasets. For image classification, we use the following datasets: MNIST (LeCun et al. 1998), CIFAR-10 (Krizhevsky and Hinton 2009) and GTSRB (Stallkamp et al. 2012). For text classification, we use MR (Pang and Lee 2005), IMDB (Maas et al. 2011), and AG News. Further details are presented in the supplement. ... Table 1: The agreement (%) on the secret test set for image and text classification tasks. Each row corresponds to a subset selection strategy, while each column corresponds to a query budget (% of total dataset indicated in parentheses).
Researcher Affiliation | Collaboration | Soham Pal (1), Yash Gupta (1), Aditya Shukla (1), Aditya Kanade (1,2), Shirish Shevade (1), Vinod Ganapathy (1); (1) Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India; (2) Google Brain, USA
Pseudocode | No | The paper describes the ActiveThief framework using numbered steps and a flowchart in Figure 2, but it does not present structured pseudocode or algorithm blocks.
Open Source Code | No | The source code for ActiveThief will be made available at http://iisc-seal.net/ under an open source license.
Open Datasets | Yes | For image classification, we use the following datasets: MNIST (LeCun et al. 1998), CIFAR-10 (Krizhevsky and Hinton 2009) and GTSRB (Stallkamp et al. 2012). ... we use a downsampled and unannotated subset of the training fold of the ILSVRC2012-14 dataset (Chrabaszcz, Loshchilov, and Hutter 2017) as a proxy for public image data. ... For text, we use WikiText-2 (Merity et al. 2017).
Dataset Splits | Yes | Our training and validation splits are of size 100K and 20K respectively. ... We set aside 20% of the query budget for validation, and use 10% as the initial seed samples. (See the budget-split sketch after the table.)
Hardware Specification | No | The paper only vaguely mentions 'We thank NVIDIA for computational resources' without providing specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments.
Software Dependencies | No | The paper mentions using the 'Adam optimizer (Kingma and Ba 2015)' but does not provide specific version numbers for any key software components or libraries (e.g., programming language, deep learning frameworks, or other packages).
Experiment Setup | Yes | We use the Adam optimizer (Kingma and Ba 2015) with default hyperparameters. In our experiments, for all but the random strategy, training is done iteratively. In each iteration, the model is trained for at most 1,000 epochs with a batch size of 150 (images) or 50 (text). Early stopping is used with a patience of 100 epochs (images) or 20 epochs (text). An L2 regularizer is applied at a rate of 0.001, and dropout is applied at a rate of 0.1 for all datasets other than CIFAR-10, where a dropout of 0.2 is used. (See the training-configuration sketch after the table.)
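The rows above describe the extraction procedure only in prose: an unannotated public pool, a fixed query budget with 10% spent on initial seed samples and 20% reserved for validation, and the remaining queries chosen iteratively by a subset selection strategy. The Python sketch below illustrates that kind of loop under stated assumptions; it is not the authors' code. The query_victim, train_substitute, and least_confidence functions are hypothetical stand-ins, and the pool size, budget, and number of rounds are placeholder values.

```python
# Minimal sketch (not the authors' implementation) of an active-learning model
# extraction loop over an unannotated public pool with a fixed query budget:
# 20% of the budget is reserved for validation, 10% seeds the first round, and
# the rest is spent iteratively on points chosen by an uncertainty strategy.
import numpy as np

rng = np.random.default_rng(0)


def query_victim(x):
    """Stand-in for the black-box victim API: returns softmax-like outputs."""
    logits = rng.normal(size=(len(x), 10))
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def train_substitute(x, y):
    """Stand-in for substitute-model training; returns a predict function."""
    def predict(q):
        logits = rng.normal(size=(len(q), 10))
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)
    return predict


def least_confidence(probs):
    """Uncertainty score: a lower top-class probability is more informative."""
    return 1.0 - probs.max(axis=1)


pool = rng.normal(size=(100_000, 64))  # unannotated public data (placeholder features)
budget = 10_000                        # total number of victim queries allowed

n_valid = int(0.2 * budget)            # 20% of the budget set aside for validation
n_seed = int(0.1 * budget)             # 10% used as the initial seed samples
n_rounds = 10                          # iterative training rounds
per_round = (budget - n_valid - n_seed) // n_rounds

idx = rng.permutation(len(pool))
valid_idx, seed_idx = idx[:n_valid], idx[n_valid:n_valid + n_seed]
unlabeled_idx = idx[n_valid + n_seed:]

# Validation queries (used for early stopping in the real setup).
x_val, y_val = pool[valid_idx], query_victim(pool[valid_idx])
x_train, y_train = pool[seed_idx], query_victim(pool[seed_idx])

for _ in range(n_rounds):
    substitute = train_substitute(x_train, y_train)
    # Score the remaining pool under the current substitute and keep the most
    # uncertain points for the next batch of victim queries.
    scores = least_confidence(substitute(pool[unlabeled_idx]))
    chosen = unlabeled_idx[np.argsort(scores)[-per_round:]]
    unlabeled_idx = np.setdiff1d(unlabeled_idx, chosen)
    # Spend budget: query the victim on the chosen points and grow the training set.
    x_train = np.concatenate([x_train, pool[chosen]])
    y_train = np.concatenate([y_train, query_victim(pool[chosen])])
```

With the random strategy reported in Table 1, the loop above would collapse to a single round in which the whole non-validation budget is drawn at random, which matches the paper's note that only the non-random strategies train iteratively.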
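The Experiment Setup row reports concrete training hyperparameters, but the paper (as quoted here) names neither its deep learning framework nor the exact substitute architecture. The sketch below shows one way to wire those settings together, assuming tf.keras; the small dense network, input dimension, and random data are placeholders, and only the optimizer, batch size, early-stopping patience, L2 rate, and dropout rate follow the reported values.

```python
# Minimal sketch of the reported per-iteration training configuration, assuming
# tf.keras (the framework is not named in the quoted text). The architecture and
# data below are placeholders.
import numpy as np
import tensorflow as tf


def build_substitute(input_dim: int, num_classes: int, dropout_rate: float = 0.1):
    reg = tf.keras.regularizers.l2(0.001)            # L2 regularization at 0.001
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(256, activation="relu", kernel_regularizer=reg),
        tf.keras.layers.Dropout(dropout_rate),        # 0.1 (0.2 for CIFAR-10)
        tf.keras.layers.Dense(num_classes, activation="softmax",
                              kernel_regularizer=reg),
    ])


model = build_substitute(input_dim=3072, num_classes=10)
model.compile(optimizer=tf.keras.optimizers.Adam(),   # Adam with default hyperparameters
              loss="categorical_crossentropy")        # soft labels from the victim

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=100,                                     # 100 epochs for images (20 for text)
    restore_best_weights=True)

# Placeholder data standing in for victim-labeled queries and the validation split.
x_train = np.random.rand(1000, 3072).astype("float32")
y_victim = np.random.dirichlet(np.ones(10), size=1000).astype("float32")
x_val, y_val = x_train[:200], y_victim[:200]

model.fit(x_train, y_victim,
          epochs=1000,                                # at most 1,000 epochs per iteration
          batch_size=150,                             # 150 for images (50 for text)
          validation_data=(x_val, y_val),
          callbacks=[early_stop],
          verbose=0)
```

Switching the batch size to 50 and the patience to 20 would correspond to the text-classification setting, and raising the dropout rate to 0.2 would correspond to CIFAR-10, per the quoted setup.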