Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Towards Evaluating the Robustness of Neural Networks Learned by Transduction

Authors: Jiefeng Chen, Xi Wu, Yang Guo, Yingyu Liang, Somesh Jha

ICLR 2022

Reproducibility Variable Result LLM Response
Research Type Experimental We perform a systematic empirical study on various defenses. For RMC (Wu et al., 2020b), DENT (Wang et al., 2021), and URejectron (Goldwasser et al., 2020), we show that even weak instantiations of GMSA can break respective defenses.
Researcher Affiliation Collaboration 1 University of Wisconsin-Madison, 2 Google, 3 XaiPient
Pseudocode Yes Algorithm 1 FIXED POINT ATTACK (FPA) Algorithm 2 GREEDY MODEL SPACE ATTACK (GMSA)
Open Source Code Yes Our code is available at: https://github.com/jfc43/eval-transductive-robustness.
Open Datasets Yes We use three datasets, MNIST, CIFAR-10 and GTSRB, in our experiments. MNIST (LeCun, 1998) is a large dataset of handwritten digits. CIFAR-10 (Krizhevsky et al., 2009) is a dataset of 32x32 color images. The German Traffic Sign Recognition Benchmark (GTSRB) (Stallkamp et al., 2012) is a dataset of color images.
Dataset Splits Yes MNIST: Each digit has 5,500 training images and 1,000 test images. CIFAR-10: ...each consisting of 5,000 training images and 1,000 test images. GTSRB: There are about 34,799 training images, 4,410 validation images and 12,630 test images. We randomly split the data into a training set Dtrain containing 63,000 images, a validation set Dval containing 7,000 images and a test set Dtest containing 30,000 images.
Hardware Specification Yes We run all experiments with PyTorch and NVIDIA GeForce RTX 2080Ti GPUs.
Software Dependencies No The paper mentions using PyTorch but does not provide specific version numbers for PyTorch or any other software dependencies, such as Python or CUDA versions.
Experiment Setup Yes For both standard training and adversarial training, we train the model for 100 epochs using the Adam optimizer with a batch size of 128 and a learning rate of 10^-3. We use the L∞-norm PGD attack as the adversary for adversarial training, with a perturbation budget ε of 0.3, a step size of 0.01, and 40 steps.
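For readers unfamiliar with the adversary quoted above, the L∞-norm PGD attack with these hyperparameters (ε = 0.3, step size 0.01, 40 steps) can be sketched as follows. This is a minimal illustration on a toy linear loss with an analytic gradient, not the paper's actual implementation; the function names and the toy model are our own assumptions.

```python
import numpy as np

def pgd_linf(x, grad_fn, eps=0.3, step=0.01, n_steps=40):
    """L-infinity PGD: repeat gradient-sign ascent steps, projecting back
    onto the eps-ball around the clean input and the valid pixel range."""
    x_adv = x.copy()
    for _ in range(n_steps):
        g = grad_fn(x_adv)
        x_adv = x_adv + step * np.sign(g)         # gradient-sign step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project onto L-inf ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay in [0, 1] pixel range
    return x_adv

# Toy example: loss(x) = w . x, so the gradient is the constant vector w.
w = np.array([1.0, -1.0, 0.5])
x = np.array([0.5, 0.5, 0.5])
x_adv = pgd_linf(x, lambda z: w)
# 40 steps of 0.01 would move 0.4, but projection caps the perturbation
# at eps = 0.3, giving x_adv = [0.8, 0.2, 0.8].
```

Note that with this configuration the total possible movement (40 × 0.01 = 0.4) exceeds ε, so the projection step is what ultimately enforces the perturbation budget.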