Decision-Guided Weighted Automata Extraction from Recurrent Neural Networks
Authors: Xiyue Zhang, Xiaoning Du, Xiaofei Xie, Lei Ma, Yang Liu, Meng Sun
AAAI 2021, pp. 11699-11707
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section is devoted to evaluating the effectiveness, scalability and usefulness of our approach. Four Research Questions (RQs) are to be answered: What is the approximation accuracy of the WFAs extracted through our approach? How effective is the context-aware state abstraction on improving the approximation accuracy? How effective is the synonym transition method, especially when applied to large-scale tasks? What is the performance of the WFA extracted from black-box RNNs? |
| Researcher Affiliation | Collaboration | 1 Peking University, China 2 Monash University, Australia 3 Nanyang Technological University, Singapore 4 Kyushu University, Japan 5 Hangzhou Xinzhou Network Technology Co., Ltd., China |
| Pseudocode | Yes | Algorithm 1: Extraction of WFA from an RNN |
| Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the work described in this paper, nor does it provide a direct link to a source-code repository for their methodology. |
| Open Datasets | Yes | For comparisons, we perform comprehensive evaluation with a total of 13 benchmarks, including 10 datasets from the SPiCe competition (Balle et al. 2017) and 3 artificial unbounded history languages (UHL) (2019). Besides, another two real-world datasets from NLP domain are further selected for evaluation of the scalability and usefulness, including the CogComp QC Dataset (abbrev. QC) (Li and Roth 2002) and the Jigsaw Toxic Comment Dataset (abbrev. Toxic) (Jigsaw 2018). |
| Dataset Splits | Yes | All SPiCe and UHL datasets are split into training/validation/test sets with the percentage of 90%/5%/5% to train and test the RNN models. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., Python, PyTorch, TensorFlow versions or specific library versions). |
| Experiment Setup | Yes | For each dataset, a 2-layer LSTM network with 50 hidden dimensions is trained, with an exception for the SPiCe 4/6/9 datasets to be with 100 hidden dimensions and SPiCe 10/14 datasets with 20/30 hidden dimensions, respectively. For the QC dataset, we use 20K samples for training and 8K samples for testing, and train a single-layer LSTM with 32 hidden units... For the Toxic dataset, ...train a single-layer LSTM model with 128 hidden units... The parameter k is set to 5 and 2, respectively, for calculating NDCG scores... The equipartition level t is set to 1 for the SPiCe datasets, and to 15/10/10 for the UHL datasets. |
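For context on the artifact being extracted: a weighted finite automaton (WFA), as produced by Algorithm 1, scores an input sequence via a product of per-symbol transition matrices between an initial and a final weight vector. The class and example automaton below are a minimal illustrative sketch, not the authors' implementation:

```python
class WFA:
    """Minimal weighted finite automaton (WFA) sketch.

    A WFA assigns each word a real weight: an initial weight vector alpha,
    one square transition matrix per input symbol, and a final weight
    vector eta.  The weight of word w1..wn is alpha @ A_{w1} @ ... @ A_{wn} @ eta.
    """

    def __init__(self, alpha, transitions, eta):
        self.alpha = alpha              # initial weights, length = #states
        self.transitions = transitions  # symbol -> square matrix (list of rows)
        self.eta = eta                  # final weights, length = #states

    def weight(self, word):
        v = list(self.alpha)
        for sym in word:
            m = self.transitions[sym]
            # v := v @ m  (row vector times transition matrix)
            v = [sum(v[i] * m[i][j] for i in range(len(v)))
                 for j in range(len(m[0]))]
        return sum(vi * ei for vi, ei in zip(v, self.eta))


# Example (hypothetical): a 2-state WFA whose weight equals the
# number of 'a' symbols in the input word.
count_a = WFA(alpha=[1, 0],
              transitions={'a': [[1, 1], [0, 1]],
                           'b': [[1, 0], [0, 1]]},
              eta=[0, 1])
```

For instance, `count_a.weight("aba")` evaluates to 2.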
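The 90%/5%/5% train/validation/test split quoted above can be sketched as follows; the function name, shuffling strategy, and seed are assumptions for illustration, not the authors' code:

```python
import random


def split_dataset(samples, seed=0):
    """Split samples 90/5/5 into train/validation/test sets,
    mirroring the split reported for the SPiCe and UHL datasets.
    Shuffling before splitting is an assumed (not stated) detail."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * 0.90)
    n_val = int(n * 0.05)
    return (shuffled[:n_train],                    # 90% training
            shuffled[n_train:n_train + n_val],     # 5% validation
            shuffled[n_train + n_val:])            # 5% test
```

On 100 samples this yields splits of sizes 90, 5, and 5.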