Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Interpretable Adversarial Perturbation in Input Embedding Space for Text
Authors: Motoki Sato, Jun Suzuki, Hiroyuki Shindo, Yuji Matsumoto
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted our experiments on a sentiment classification (SEC) task, a category classification (CAC) task, and a grammatical error detection (GED) task to evaluate the effectiveness of our methods, iAdvT-Text and iVAT-Text. |
| Researcher Affiliation | Collaboration | ¹Preferred Networks, Inc., ²NTT Communication Science Laboratories, ³Nara Institute of Science and Technology, ⁴RIKEN Center for Advanced Intelligence Project |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code for reproducing our experiments is available at https://github.com/aonotas/interpretable-adv |
| Open Datasets | Yes | For SEC, we used the following well-studied benchmark datasets, IMDB [Maas et al., 2011], Elec [Johnson and Zhang, 2015], and Rotten Tomatoes [Pang and Lee, 2005]. For CAC, we utilized DBpedia [Lehmann et al., 2015] and RCV1 [Lewis et al., 2004]. For GED, we utilized the First Certificate in English dataset (FCE-public) [Yannakoudakis et al., 2011]. |
| Dataset Splits | Yes | Table 1: Summary of datasets. Following [Miyato et al., 2017], we split the original training data into training and development sentences. We utilized an early stopping criterion [Caruana et al., 2000] based on the performance measured on development sets. |
| Hardware Specification | No | The paper only mentions 'with GPU support' but does not specify any particular GPU model, CPU, or detailed hardware specifications used for experiments. |
| Software Dependencies | No | The paper states 'using Chainer [Tokui et al., 2015]', but does not provide a version number for Chainer or any other software dependencies. |
| Experiment Setup | Yes | The hyper-parameters are summarized in Table 2, with dropout [Srivastava et al., 2014] and Adam [Kingma and Ba, 2014]. In addition, we set ϵ = 5.0 for both AdvT-Text and VAT-Text and ϵ = 15.0 for our method. We also set λ = 1 for all the methods. We utilized an early stopping criterion [Caruana et al., 2000] based on the performance measured on development sets. |
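
To make the reported ϵ and λ values concrete, here is a minimal sketch of how such hyper-parameters are commonly used in adversarial training for text. It is not taken from the authors' repository. It assumes that "our method" covers both iAdvT-Text and iVAT-Text, that ϵ sets the L2 norm of a perturbation obtained by rescaling a gradient, and that λ weights the adversarial loss added to the clean loss. The names `EPSILON`, `LAMBDA`, `scale_to_norm`, and `total_loss` are hypothetical.

```python
import math

# Values as reported in the Experiment Setup row.
# The dictionary layout and key names are hypothetical; mapping "our method"
# to both iAdvT-Text and iVAT-Text is an assumption.
EPSILON = {
    "AdvT-Text": 5.0,
    "VAT-Text": 5.0,
    "iAdvT-Text": 15.0,
    "iVAT-Text": 15.0,
}
LAMBDA = 1.0  # reported as lambda = 1 for all methods


def scale_to_norm(gradient, epsilon):
    """Rescale a gradient vector so that its L2 norm equals epsilon.

    Assumption: this is the usual norm-bounded perturbation
    r = epsilon * g / ||g||_2, not necessarily the exact form in the paper.
    """
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm == 0.0:
        # A zero gradient gives no direction, so return a zero perturbation.
        return [0.0 for _ in gradient]
    return [epsilon * g / norm for g in gradient]


def total_loss(clean_loss, adversarial_loss, lam=LAMBDA):
    """Combine the clean loss with the adversarial loss weighted by lam (assumed role of lambda)."""
    return clean_loss + lam * adversarial_loss


if __name__ == "__main__":
    perturbation = scale_to_norm([3.0, 4.0], EPSILON["AdvT-Text"])
    print(perturbation)
    print(total_loss(1.0, 0.5))
```

With λ = 1 the clean and adversarial terms contribute equally under this assumed objective. The Chainer version and the hardware are not reported, so a sketch like this cannot pin down the exact numerical behaviour of the original runs.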