Designing Robust Transformers using Robust Kernel Density Estimation
Authors: Xing Han, Tongzheng Ren, Tan Nguyen, Khai Nguyen, Joydeep Ghosh, Nhat Ho
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we provide empirical validation of the benefits of integrating our proposed robust KDE attention mechanisms (Transformer-RKDE/SPKDE/MoM) into Transformer base models. We compare these with the standard softmax Transformer across multiple datasets representing different modalities. These include language modeling on the WikiText-103 dataset (Merity et al., 2016) (Section 4.1) and image classification on ImageNet (Russakovsky et al., 2015; Deng et al., 2009). |
| Researcher Affiliation | Academia | Xing Han, Department of ECE, University of Texas at Austin (aaronhan223@utexas.edu); Tongzheng Ren, Department of Computer Science, University of Texas at Austin (tongzheng@utexas.edu); Tan Minh Nguyen, Department of Mathematics, University of California, Los Angeles (tanmnguyen89@ucla.edu); Khai Nguyen, Department of Statistics and Data Sciences, University of Texas at Austin (khainb@utexas.edu); Joydeep Ghosh, Department of ECE, University of Texas at Austin (jghosh@utexas.edu); Nhat Ho, Department of Statistics and Data Sciences, University of Texas at Austin (minhnhat@utexas.edu) |
| Pseudocode | Yes | Algorithm 1: Procedure of Computing Attention Vector of Transformer-RKDE/SPKDE/MoM (a hedged sketch of this attention computation appears after the table) |
| Open Source Code | No | No explicit statement or link providing the authors' own source code for the methodology described in this paper was found. The only code link present ('Implementation available at github.com/QData/TextAttack') refers to a third-party tool used for an attack method. |
| Open Datasets | Yes | These include language modeling on the WikiText-103 dataset (Merity et al., 2016) (Section 4.1) and image classification on ImageNet (Russakovsky et al., 2015; Deng et al., 2009). Furthermore, we assess performance across multiple robustness benchmarks, namely ImageNet-C (Hendrycks & Dietterich, 2019), ImageNet-A (Hendrycks et al., 2021b), ImageNet-O (Hendrycks et al., 2021b), ImageNet-R (Hendrycks et al., 2021a), and ImageNet-Sketch (Wang et al., 2019) (Section 4.2), as well as UEA time-series classification (Section 4.3). |
| Dataset Splits | Yes | Table 1 presents the validation and test perplexity (PPL) for several methods. The validation and test sets each consist of 60 articles, with 218K and 246K tokens respectively. |
| Hardware Specification | Yes | All experiments were conducted on machines with 4 NVIDIA A100 GPUs. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) were explicitly mentioned in the paper. |
| Experiment Setup | Yes | We configured the dimensions of key, value, and query to 128, and set the training and evaluation context length to 256. For self-attention, we allocated 8 heads for our methods and Performer, and 4 for Transformer-MGK. The dimension of the feedforward layer was set to 2048, with the number of layers established at 16. ... Each attack distorts the input image with a perturbation budget ϵ = 1/255 under l∞ norm, while the PGD attack uses 20 steps with a step size of α = 0.15. |
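
The Pseudocode row above refers to the KDE view of self-attention that underlies the paper's robust variants. As a rough illustration only, and not the authors' Algorithm 1, the sketch below writes softmax attention as a Nadaraya-Watson (kernel density) estimator and adds a hypothetical median-of-means (MoM) variant; the function names, the block count, and the exact kernel scaling are assumptions.

```python
import torch

def softmax_kde_attention(q, k, v):
    """Standard softmax attention viewed as a Nadaraya-Watson (KDE) estimator
    with an exponential kernel. q, k, v have shape (batch, seq_len, d_head)."""
    d = q.shape[-1]
    # Unnormalized kernel weights k(q_i, k_j) = exp(q_i . k_j / sqrt(d)).
    weights = torch.exp(q @ k.transpose(-2, -1) / d ** 0.5)  # (batch, n, n)
    # Dividing by the kernel density estimate recovers the softmax normalization.
    return weights @ v / weights.sum(dim=-1, keepdim=True)

def mom_kde_attention(q, k, v, num_blocks=4):
    """Hypothetical median-of-means (MoM) variant: split key/value pairs into
    blocks, form one KDE-style attention estimate per block, and take the
    element-wise median across blocks to damp the influence of outlier keys.
    Any remainder tokens beyond a whole number of blocks are dropped here."""
    n = k.shape[1]
    block_size = n // num_blocks
    estimates = []
    for b in range(num_blocks):
        sl = slice(b * block_size, (b + 1) * block_size)
        estimates.append(softmax_kde_attention(q, k[:, sl], v[:, sl]))
    return torch.median(torch.stack(estimates, dim=0), dim=0).values
```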
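The Experiment Setup row quotes a PGD attack with budget ϵ = 1/255 under the l∞ norm, 20 steps, and step size α = 0.15. A minimal l∞ PGD sketch under those settings follows; the interface and the interpretation of α as a fraction of ϵ are assumptions, not the paper's exact attack implementation.

```python
import torch
import torch.nn.functional as F

def pgd_linf_attack(model, images, labels, eps=1 / 255, alpha=0.15, steps=20):
    """Generic l-infinity PGD sketch using the quoted budget (eps = 1/255,
    20 steps, step size alpha = 0.15). `model` maps images to logits; alpha is
    applied relative to eps, which is an assumption about the paper's setup."""
    x_adv = images.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), labels)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss along the gradient sign, then project back into the
        # eps-ball around the clean images and the valid pixel range [0, 1].
        x_adv = x_adv.detach() + alpha * eps * grad.sign()
        x_adv = torch.min(torch.max(x_adv, images - eps), images + eps)
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv
```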