Decision-Guided Weighted Automata Extraction from Recurrent Neural Networks
Authors: Xiyue Zhang, Xiaoning Du, Xiaofei Xie, Lei Ma, Yang Liu, Meng Sun
AAAI 2021, pp. 11699-11707
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section is devoted to evaluating the effectiveness, scalability and usefulness of our approach. Four Research Questions (RQs) are to be answered: What is the approximation accuracy of the WFAs extracted through our approach? How effective is the context-aware state abstraction on improving the approximation accuracy? How effective is the synonym transition method, especially when applied to large-scale tasks? What is the performance of the WFA extracted from black-box RNNs? |
| Researcher Affiliation | Collaboration | 1 Peking University, China 2 Monash University, Australia 3 Nanyang Technological University, Singapore 4 Kyushu University, Japan 5 Hangzhou Xinzhou Network Technology Co., Ltd., China |
| Pseudocode | Yes | Algorithm 1: Extraction of WFA from an RNN |
| Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the work described in this paper, nor does it provide a direct link to a source-code repository for their methodology. |
| Open Datasets | Yes | For comparisons, we perform comprehensive evaluation with a total of 13 benchmarks, including 10 datasets from the SPiCe competition (Balle et al. 2017) and 3 artificial unbounded history languages (UHL) (2019). Besides, another two real-world datasets from NLP domain are further selected for evaluation of the scalability and usefulness, including the CogComp QC Dataset (abbrev. QC) (Li and Roth 2002) and the Jigsaw Toxic Comment Dataset (abbrev. Toxic) (Jigsaw 2018). |
| Dataset Splits | Yes | All SPiCe and UHL datasets are split into training/validation/test sets with the percentage of 90%/5%/5% to train and test the RNN models. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., Python, PyTorch, TensorFlow versions or specific library versions). |
| Experiment Setup | Yes | For each dataset, a 2-layer LSTM network with 50 hidden dimensions is trained, with an exception for the SPiCe 4/6/9 datasets to be with 100 hidden dimensions and SPiCe 10/14 datasets with 20/30 hidden dimensions, respectively. For the QC dataset, we use 20K samples for training and 8K samples for testing, and train a single-layer LSTM with 32 hidden units... For the Toxic dataset, ...train a single-layer LSTM model with 128 hidden units... The parameter k is set to 5 and 2, respectively, for calculating NDCG scores... The equipartition level t is set to 1 for the SPiCe datasets, and to 15/10/10 for the UHL datasets. |
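For context on the artifact being extracted: a weighted finite automaton (WFA), as produced by Algorithm 1, scores an input sequence via a product of per-symbol transition matrices between an initial and a final weight vector. The class and example automaton below are a minimal illustrative sketch, not the authors' implementation:

```python
class WFA:
    """Minimal weighted finite automaton (WFA) sketch.

    A WFA assigns each word a real weight: an initial weight vector alpha,
    one square transition matrix per input symbol, and a final weight
    vector eta.  The weight of word w1..wn is alpha @ A_{w1} @ ... @ A_{wn} @ eta.
    """

    def __init__(self, alpha, transitions, eta):
        self.alpha = alpha              # initial weights, length = #states
        self.transitions = transitions  # symbol -> square matrix (list of rows)
        self.eta = eta                  # final weights, length = #states

    def weight(self, word):
        v = list(self.alpha)
        for sym in word:
            m = self.transitions[sym]
            # v := v @ m  (row vector times transition matrix)
            v = [sum(v[i] * m[i][j] for i in range(len(v)))
                 for j in range(len(m[0]))]
        return sum(vi * ei for vi, ei in zip(v, self.eta))


# Example (hypothetical): a 2-state WFA whose weight equals the
# number of 'a' symbols in the input word.
count_a = WFA(alpha=[1, 0],
              transitions={'a': [[1, 1], [0, 1]],
                           'b': [[1, 0], [0, 1]]},
              eta=[0, 1])
```

For instance, `count_a.weight("aba")` evaluates to 2.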
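The 90%/5%/5% train/validation/test split quoted above can be sketched as follows; the function name, shuffling strategy, and seed are assumptions for illustration, not the authors' code:

```python
import random


def split_dataset(samples, seed=0):
    """Split samples 90/5/5 into train/validation/test sets,
    mirroring the split reported for the SPiCe and UHL datasets.
    Shuffling before splitting is an assumed (not stated) detail."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * 0.90)
    n_val = int(n * 0.05)
    return (shuffled[:n_train],                    # 90% training
            shuffled[n_train:n_train + n_val],     # 5% validation
            shuffled[n_train + n_val:])            # 5% test
```

On 100 samples this yields splits of sizes 90, 5, and 5.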