Decision-Guided Weighted Automata Extraction from Recurrent Neural Networks

Authors: Xiyue Zhang, Xiaoning Du, Xiaofei Xie, Lei Ma, Yang Liu, Meng Sun

AAAI 2021, pp. 11699-11707

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This section is devoted to evaluating the effectiveness, scalability and usefulness of our approach. Four Research Questions (RQs) are to be answered: What is the approximation accuracy of the WFAs extracted through our approach? How effective is the context-aware state abstraction in improving the approximation accuracy? How effective is the synonym transition method, especially when applied to large-scale tasks? What is the performance of the WFA extracted from black-box RNNs?
Researcher Affiliation | Collaboration | 1 Peking University, China; 2 Monash University, Australia; 3 Nanyang Technological University, Singapore; 4 Kyushu University, Japan; 5 Hangzhou Xinzhou Network Technology Co., Ltd., China
Pseudocode | Yes | Algorithm 1: Extraction of WFA from an RNN (an illustrative extraction sketch follows this table).
Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the work described in this paper, nor does it provide a direct link to a source-code repository for their methodology.
Open Datasets | Yes | For comparisons, we perform comprehensive evaluation with a total of 13 benchmarks, including 10 datasets from the SPiCe competition (Balle et al. 2017) and 3 artificial unbounded history languages (UHL) (2019). In addition, two real-world datasets from the NLP domain are selected to evaluate scalability and usefulness: the CogComp QC Dataset (abbrev. QC) (Li and Roth 2002) and the Jigsaw Toxic Comment Dataset (abbrev. Toxic) (Jigsaw 2018).
Dataset Splits | Yes | All SPiCe and UHL datasets are split into training/validation/test sets with a 90%/5%/5% ratio to train and test the RNN models (a minimal split sketch follows this table).
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., Python, PyTorch, TensorFlow versions or specific library versions).
Experiment Setup | Yes | For each dataset, a 2-layer LSTM network with 50 hidden dimensions is trained, except for the SPiCe 4/6/9 datasets, which use 100 hidden dimensions, and the SPiCe 10/14 datasets, which use 20/30 hidden dimensions, respectively. For the QC dataset, we use 20K samples for training and 8K samples for testing, and train a single-layer LSTM with 32 hidden units... For the Toxic dataset, ...train a single-layer LSTM model with 128 hidden units... The parameter k is set to 5 and 2, respectively, for calculating NDCG scores... The equipartition level t is set to 1 for the SPiCe datasets, and to 15/10/10 for the UHL datasets (an illustrative model-configuration sketch follows this table).
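
The paper's Algorithm 1 itself is not reproduced here; the following is a minimal, hedged sketch of the general recipe for extracting a weighted automaton from an RNN by abstracting concrete hidden states into a finite set of abstract states and estimating transition weights from symbol-labelled transition frequencies. The abstraction function, the `rnn_step` wrapper, and the weight normalisation are illustrative assumptions, not the authors' decision-guided procedure.

```python
# Illustrative sketch: build weighted transitions over abstract states from an RNN.
# Assumptions (not from the paper): hidden vectors are plain sequences of floats,
# rnn_step(symbol, hidden) -> next_hidden wraps one RNN step, and abstraction is a
# simple per-dimension interval partition rather than the paper's decision-guided one.
from collections import defaultdict

def abstract_state(hidden_vector, level=2):
    """Map a concrete hidden vector to a coarse abstract state (interval partition)."""
    return tuple(int(round(x * level)) for x in hidden_vector)

def extract_wfa(rnn_step, initial_hidden, sequences, level=2):
    """Return {(abstract_src, symbol): {abstract_dst: weight}} with weights
    normalised to empirical transition probabilities over the given sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        hidden = initial_hidden
        src = abstract_state(hidden, level)
        for symbol in seq:
            hidden = rnn_step(symbol, hidden)
            dst = abstract_state(hidden, level)
            counts[(src, symbol)][dst] += 1
            src = dst
    weights = {}
    for key, destinations in counts.items():
        total = sum(destinations.values())
        weights[key] = {dst: c / total for dst, c in destinations.items()}
    return weights
```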
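
For the dataset-splits row, the paper reports only the 90%/5%/5% ratio for the SPiCe and UHL datasets; the shuffling and seeding below are assumptions added so the sketch is runnable.

```python
# Minimal sketch of a 90%/5%/5% train/validation/test split.
# The ratio is reported in the paper; shuffling and the seed are assumptions.
import random

def split_90_5_5(samples, seed=0):
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n = len(samples)
    n_train, n_val = int(0.90 * n), int(0.05 * n)
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]
    return train, val, test
```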
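
For the experiment-setup row, the paper specifies only the number of LSTM layers and hidden units per dataset; the sketch below assumes a PyTorch implementation, an embedding layer, and a linear classification head, none of which are confirmed by the paper.

```python
# Sketch of the reported model configurations. Assumptions: PyTorch, an embedding
# layer of dimension 64, and a linear head over the last hidden state; the paper
# specifies only the number of LSTM layers and hidden units.
import torch.nn as nn

class SequenceClassifier(nn.Module):
    def __init__(self, vocab_size, num_classes, embed_dim=64,
                 hidden_dim=50, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            num_layers=num_layers, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        states, _ = self.lstm(self.embed(token_ids))
        return self.head(states[:, -1, :])  # classify from the last time step

# Reported configurations (V and C are placeholder vocabulary/class sizes):
# SPiCe/UHL default: SequenceClassifier(V, C, hidden_dim=50,  num_layers=2)
# QC:                SequenceClassifier(V, C, hidden_dim=32,  num_layers=1)
# Toxic:             SequenceClassifier(V, C, hidden_dim=128, num_layers=1)
```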