Coupling Distributed and Symbolic Execution for Natural Language Queries
Authors: Lili Mou, Zhengdong Lu, Hang Li, Zhi Jin
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our approach significantly outperforms both distributed and symbolic executors, exhibiting high accuracy, high learning efficiency, high execution efficiency, and high interpretability. |
| Researcher Affiliation | Collaboration | 1Key Laboratory of High Confidence Software Technologies (Peking University), MoE; Software Institute, Peking University, China; 2DeeplyCurious.ai; 3Noah's Ark Lab, Huawei Technologies. |
| Pseudocode | No | The paper describes the primitive operators and the symbolic executor's process in narrative text and tables but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: 'The data are available at our project website; the code for data generation can also be downloaded to facilitate further development of the dataset.' This refers to data generation code, not the source code for the proposed methodology. |
| Open Datasets | Yes | The data are available at our project website; the code for data generation can also be downloaded to facilitate further development of the dataset. https://sites.google.com/site/coupleneuralsymbolic/ |
| Dataset Splits | Yes | The dataset comprises 25k different tables and queries for training; the validation and test sets each contain 10k samples and do not overlap with the training data. |
| Hardware Specification | Yes | All neural networks are implemented in Theano with a TITAN Black GPU and a Xeon E7-4820v2 (8-core) CPU; symbolic execution is assessed with a C++ implementation. |
| Software Dependencies | No | The paper mentions 'Theano' and 'C++ implementation' as software used, but it does not specify any version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | The dimensions of all layers were in the range of 20–50; the learning algorithm was AdaDelta with default hyperparameters. For the pretraining of the symbolic executor, we applied maximum likelihood estimation for 40 epochs to column selection with labels predicted by the distributed executor. We then used the REINFORCE algorithm to improve the policy, where we generated 10 action samples for each data point with the exploration probability ϵ being 0.1. When feeding back to the distributed model, we chose λ from {0.1, 0.5, 1} by validation to balance denotation error and field attention error. (A toy sketch of the REINFORCE stage appears below the table.) |
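
The experiment-setup row describes a two-stage schedule: maximum-likelihood pretraining of the symbolic executor's column selection, followed by REINFORCE fine-tuning with exploration probability ϵ = 0.1 and 10 action samples per data point. The sketch below is a minimal, self-contained illustration of that REINFORCE stage on a toy softmax policy over table columns; the policy parameterization, reward function, learning rate, and column count are hypothetical stand-ins for illustration only, not the paper's Theano implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

n_cols = 5          # hypothetical number of table columns
n_steps = 2000
theta = np.zeros(n_cols)   # toy softmax-policy parameters
eps = 0.1           # exploration probability, as in the paper
n_samples = 10      # action samples per data point, as in the paper
lr = 0.05           # hypothetical learning rate

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def reward(a):
    # Hypothetical reward: 1 if the sampled column is the "correct"
    # one (column 3 plays that role here), else 0. In the paper the
    # signal is, roughly, whether execution yields the right answer.
    return 1.0 if a == 3 else 0.0

for step in range(n_steps):
    probs = softmax(theta)
    grads, rewards = [], []
    for _ in range(n_samples):
        # Epsilon-exploration: with probability eps sample a column
        # uniformly, otherwise sample from the current policy.
        if rng.random() < eps:
            a = int(rng.integers(n_cols))
        else:
            a = int(rng.choice(n_cols, p=probs))
        # REINFORCE gradient of log pi(a) for a softmax policy:
        # grad_theta log pi(a) = one_hot(a) - probs
        g = -probs
        g[a] += 1.0
        grads.append(g)
        rewards.append(reward(a))
    # Simple per-batch mean baseline for variance reduction
    # (a common choice; the paper does not specify one).
    baseline = float(np.mean(rewards))
    for g, r in zip(grads, rewards):
        theta += lr * (r - baseline) * g

print("learned column distribution:", np.round(softmax(theta), 3))
```

With these toy defaults the learned distribution concentrates on the rewarded column within a few hundred steps. The λ mentioned in the setup row enters elsewhere in the pipeline, weighting field attention error against denotation error when feeding the symbolic executor's signal back to the distributed model.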