RNNRepair: Automatic RNN Repair via Model-based Analysis
Authors: Xiaofei Xie, Wenbo Guo, Lei Ma, Wei Le, Jian Wang, Lingjun Zhou, Yang Liu, Xinyu Xing
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluation shows that the proposed influence model is able to extract accurate and understandable features. Based on the influence model, our proposed technique could effectively infer the influential instances from not only an entire testing sequence but also a segment within that sequence. Moreover, with the sample-level and segment-level influence relations, RNNRepair could further remediate two types of incorrect predictions at the sample level and segment level. In our experiments, we evaluated the correctness of the temporal features (Sec. 4.1), the effectiveness of our influence analysis (Sec. 4.2), and the effectiveness of the repair (Sec. 4.3). More evaluation can be found in the supplementary material. Datasets and Models. We selected two widely-used public datasets (i.e., MNIST and Toxic) to evaluate the influence analysis. |
| Researcher Affiliation | Academia | 1Nanyang Technological University, Singapore 2Kyushu University, Japan 3College of Information Sciences and Technology, The Pennsylvania State University, State College, PA, USA 4University of Alberta, Canada 5Alberta Machine Intelligence Institute, Canada 6Iowa State University, USA 7Tianjin University, China. |
| Pseudocode | No | The paper describes its approach procedurally in text (e.g., in Section 3.3.1 and 3.3.2) but does not include any formal pseudocode blocks or algorithm listings. |
| Open Source Code | Yes | https://bitbucket.org/xiaofeixie/rnnrepair |
| Open Datasets | Yes | We selected two widely-used public datasets (i.e., MNIST and Toxic) to evaluate the influence analysis. MNIST (Le Cun & Cortes, 1998) is selected for evaluating the sample-level influence analysis by comparing it with the existing baselines. Toxic Comment Dataset (abbrev. Toxic) is selected for evaluating the segment-level influence analysis. In addition, we introduce another dataset, Standard Sentiment Treebank (SST) (Socher et al., 2013), for the segment-level repair, and an LSTM network with hidden size 300 is trained. |
| Dataset Splits | No | The paper mentions using training and test sets but does not explicitly detail a separate validation set split or its size and composition in the experimental setup. |
| Hardware Specification | No | The paper does not specify the hardware used for experiments, such as GPU or CPU models. It only mentions training networks without details on the computing environment. |
| Software Dependencies | No | The paper mentions techniques and models like LSTM, GRU, GMM, and linear classifier but does not provide specific version numbers for any software dependencies or libraries used in their implementation. |
| Experiment Setup | Yes | We train an LSTM network with hidden size 100 for this task. ...We train a GRU network with hidden size 300. In addition, we introduce another dataset, Standard Sentiment Treebank (SST) ...an LSTM network with hidden size 300 is trained. Finally, we used the augmented training data for training with 40 epochs (the same as the original model). For each test case, we set the parameter γ (refer to Section 3.3.2) as 5. We select 5, 15, 25, 35, 45 training samples (i.e., m in Section 3.3.2) for the augmentation, respectively. |
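The reported setup can be collected into a single configuration summary, which is useful when attempting a reproduction. The sketch below is a hypothetical summary assembled from the quoted setup details only; the key names and the `augmented_runs` helper are illustrative and not part of the paper's released code.

```python
# Hypothetical configuration summary of the reported experiment setup.
# Model/dataset pairings and hyperparameters are taken from the quotes above;
# all identifiers are illustrative, not the authors' actual code.
EXPERIMENTS = {
    "MNIST": {"model": "LSTM", "hidden_size": 100},  # sample-level influence
    "Toxic": {"model": "GRU", "hidden_size": 300},   # segment-level influence
    "SST": {"model": "LSTM", "hidden_size": 300},    # segment-level repair
}

REPAIR_SETTINGS = {
    "epochs": 40,                      # retraining epochs, same as the original model
    "gamma": 5,                        # parameter γ from Section 3.3.2
    "m_values": [5, 15, 25, 35, 45],   # training samples selected for augmentation
}

def augmented_runs():
    """Enumerate the (dataset, m) repair runs implied by the reported settings."""
    return [(d, m) for d in EXPERIMENTS for m in REPAIR_SETTINGS["m_values"]]
```

With three datasets and five values of m, this enumeration yields fifteen augmentation runs per repeated experiment.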