RNNRepair: Automatic RNN Repair via Model-based Analysis
Authors: Xiaofei Xie, Wenbo Guo, Lei Ma, Wei Le, Jian Wang, Lingjun Zhou, Yang Liu, Xinyu Xing
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluation shows that the proposed influence model is able to extract accurate and understandable features. Based on the influence model, our proposed technique could effectively infer the influential instances from not only an entire testing sequence but also a segment within that sequence. Moreover, with the sample-level and segment-level influence relations, RNNRepair could further remediate two types of incorrect predictions at the sample level and segment level. In our experiments, we evaluated the correctness of the temporal features (Sec. 4.1), the effectiveness of our influence analysis (Sec. 4.2), and the effectiveness of the repair (Sec. 4.3). More evaluation can be found in the supplementary material. Datasets and Models. We selected two widely-used public datasets (i.e., MNIST and Toxic) to evaluate the influence analysis. |
| Researcher Affiliation | Academia | 1Nanyang Technological University, Singapore 2Kyushu University, Japan 3College of Information Sciences and Technology, The Pennsylvania State University, State College, PA, USA 4University of Alberta, Canada 5Alberta Machine Intelligence Institute, Canada 6Iowa State University, USA 7Tianjin University, China. |
| Pseudocode | No | The paper describes its approach procedurally in text (e.g., in Section 3.3.1 and 3.3.2) but does not include any formal pseudocode blocks or algorithm listings. |
| Open Source Code | Yes | https://bitbucket.org/xiaofeixie/rnnrepair |
| Open Datasets | Yes | We selected two widely-used public datasets (i.e., MNIST and Toxic) to evaluate the influence analysis. MNIST (Le Cun & Cortes, 1998) is selected for evaluating the sample-level influence analysis by comparing it with the existing baselines. Toxic Comment Dataset (abbrev. Toxic) is selected for evaluating the segment-level influence analysis. In addition, we introduce another dataset, Standard Sentiment Treebank (SST) (Socher et al., 2013), for the segment-level repair, and an LSTM network with hidden size 300 is trained. |
| Dataset Splits | No | The paper mentions using training and test sets but does not explicitly detail a separate validation set split or its size and composition in the experimental setup. |
| Hardware Specification | No | The paper does not specify the hardware used for experiments, such as GPU or CPU models. It only mentions training networks without details on the computing environment. |
| Software Dependencies | No | The paper mentions techniques and models like LSTM, GRU, GMM, and linear classifier but does not provide specific version numbers for any software dependencies or libraries used in their implementation. |
| Experiment Setup | Yes | We train an LSTM network with hidden size 100 for this task. ...We train a GRU network with hidden size 300. In addition, we introduce another dataset, Standard Sentiment Treebank (SST) ...an LSTM network with hidden size 300 is trained. Finally, we used the augmented training data for training with 40 epochs (the same as the original model). For each test case, we set the parameter γ (refer to Section 3.3.2) as 5. We select 5, 15, 25, 35, 45 training samples (i.e., m in Section 3.3.2) for the augmentation, respectively. |
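The reported setup can be collected into a single configuration summary, which is useful when attempting a reproduction. The sketch below is a hypothetical summary assembled from the quoted setup details only; the key names and the `augmented_runs` helper are illustrative and not part of the paper's released code.

```python
# Hypothetical configuration summary of the reported experiment setup.
# Model/dataset pairings and hyperparameters are taken from the quotes above;
# all identifiers are illustrative, not the authors' actual code.
EXPERIMENTS = {
    "MNIST": {"model": "LSTM", "hidden_size": 100},  # sample-level influence
    "Toxic": {"model": "GRU", "hidden_size": 300},   # segment-level influence
    "SST": {"model": "LSTM", "hidden_size": 300},    # segment-level repair
}

REPAIR_SETTINGS = {
    "epochs": 40,                      # retraining epochs, same as the original model
    "gamma": 5,                        # parameter γ from Section 3.3.2
    "m_values": [5, 15, 25, 35, 45],   # training samples selected for augmentation
}

def augmented_runs():
    """Enumerate the (dataset, m) repair runs implied by the reported settings."""
    return [(d, m) for d in EXPERIMENTS for m in REPAIR_SETTINGS["m_values"]]
```

With three datasets and five values of m, this enumeration yields fifteen augmentation runs per repeated experiment.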