Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Towards Evaluating the Robustness of Neural Networks Learned by Transduction
Authors: Jiefeng Chen, Xi Wu, Yang Guo, Yingyu Liang, Somesh Jha
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform a systematic empirical study on various defenses. For RMC (Wu et al., 2020b), DENT (Wang et al., 2021), and URejectron (Goldwasser et al., 2020), we show that even weak instantiations of GMSA can break respective defenses. |
| Researcher Affiliation | Collaboration | 1 University of Wisconsin-Madison, 2 Google, 3 XaiPient |
| Pseudocode | Yes | Algorithm 1: FIXED POINT ATTACK (FPA); Algorithm 2: GREEDY MODEL SPACE ATTACK (GMSA) |
| Open Source Code | Yes | Our code is available at: https://github.com/jfc43/eval-transductive-robustness. |
| Open Datasets | Yes | We use three datasets MNIST, CIFAR-10 and GTSRB in our experiments. The MNIST (LeCun, 1998) is a large dataset of handwritten digits. The CIFAR-10 (Krizhevsky et al., 2009) is a dataset of 32x32 color images. The German Traffic Sign Recognition Benchmark (GTSRB) (Stallkamp et al., 2012) is a dataset of color images. |
| Dataset Splits | Yes | MNIST: Each digit has 5,500 training images and 1,000 test images. CIFAR-10: ...each consisting of 5,000 training images and 1,000 test images. GTSRB: There are about 34,799 training images, 4,410 validation images and 12,630 test images. We randomly split the data into a training set Dtrain containing 63,000 images, a validation set Dval containing 7,000 images and a test set Dtest containing 30,000 images. |
| Hardware Specification | Yes | We run all experiments with PyTorch and NVIDIA GeForce RTX 2080Ti GPUs. |
| Software Dependencies | No | The paper mentions using PyTorch but does not provide specific version numbers for PyTorch or any other software dependencies, such as Python or CUDA versions. |
| Experiment Setup | Yes | For both standard training and adversarial training, we train the model for 100 epochs using the Adam optimizer with a batch size of 128 and a learning rate of 10^-3. We use the L∞ norm PGD attack as the adversary for adversarial training with a perturbation budget ϵ of 0.3, a step size of 0.01, and number of steps of 40. |
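The PGD adversary quoted above (L∞ ball, ϵ = 0.3, step size 0.01, 40 steps) can be sketched as follows. This is a minimal dependency-free illustration of the standard L∞ PGD update, not the authors' implementation (which uses PyTorch); `grad_fn`, a callable returning the loss gradient with respect to the input, is a hypothetical placeholder for the model's backward pass.

```python
def sign(v):
    """Elementwise sign as an int in {-1, 0, 1}."""
    return (v > 0) - (v < 0)

def pgd_linf(x, y, grad_fn, eps=0.3, alpha=0.01, steps=40):
    """L-infinity PGD attack with the paper's reported hyperparameters.

    x       : input as a flat list of pixel values in [0, 1]
    y       : true label, passed through to grad_fn
    grad_fn : grad_fn(x_adv, y) -> gradient of the loss w.r.t. x_adv
    """
    x_adv = list(x)
    for _ in range(steps):
        g = grad_fn(x_adv, y)
        # ascend the loss along the gradient sign direction
        x_adv = [xi + alpha * sign(gi) for xi, gi in zip(x_adv, g)]
        # project back into the eps-ball around the clean input
        x_adv = [min(max(xa, xo - eps), xo + eps) for xa, xo in zip(x_adv, x)]
        # keep pixels in the valid [0, 1] range
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    return x_adv
```

Note that 40 steps of size 0.01 can move each pixel by up to 0.4, so the ϵ = 0.3 projection is what actually binds the perturbation, as is typical for PGD configurations.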