Merging Weak and Active Supervision for Semantic Parsing
Authors: Ansong Ni, Pengcheng Yin, Graham Neubig
AAAI 2020, pp. 8536-8543
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the effectiveness of our method on two different datasets. Experiments on WikiSQL show that by annotating only 1.8% of examples, we improve over a state-of-the-art weakly-supervised baseline by 6.4%, achieving an accuracy of 79.0%, which is only 1.3% away from the model trained with full supervision. Experiments on WikiTableQuestions with human annotators show that our method can improve performance with only 100 active queries, especially for weakly-supervised parsers learned from a cold start. |
| Researcher Affiliation | Academia | Ansong Ni, Pengcheng Yin, Graham Neubig; Carnegie Mellon University; {ansongn, pcyin, gneubig}@cs.cmu.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code available at https://github.com/niansong1996/wassp |
| Open Datasets | Yes | Dataset: We evaluate the performance of WASSP on two different datasets: WikiSQL (Zhong, Xiong, and Socher 2017) and WikiTableQuestions (Pasupat and Liang 2015). |
| Dataset Splits | Yes | a single model (i.e. without ensemble) can reach an execution accuracy of 72.4% and 72.6% on the dev and test set, respectively. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used (e.g., GPU/CPU models, memory, or cloud instance types) for running its experiments. |
| Software Dependencies | No | The paper mentions using 'neural symbolic machines (NSM)' and 'memory-augmented policy optimization (MAPO)' but does not provide specific version numbers for these or any other software libraries or frameworks. |
| Experiment Setup | Yes | Training Procedure: First we follow the procedure of (Liang et al. 2018) to train NSM with MAPO on both the WikiSQL and WikiTableQuestions datasets with the same set of hyperparameters as used in the original paper. Then, for WikiSQL, we run WASSP for 3 iterations when the query budget is 1,000 or more, and for only one iteration with a smaller budget. In each iteration, the model queries for extra supervision and is then trained for another 5K steps. The query budget is evenly distributed across these 3 iterations and limited by the total amount. For WikiTableQuestions, we run only one such iteration (due to the limited number of annotations obtained) but train for 50K steps with the human-annotated MRs. |
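
For readers mapping the quoted training procedure onto code, the following is a minimal sketch of the overall query-then-train loop, not the authors' released implementation. The callables `train_weak`, `select_queries`, `annotate`, and `train_mixed` are hypothetical placeholders for (1) MAPO training of NSM, (2) the active-learning selection heuristic, (3) obtaining gold meaning representations, and (4) continued training on mixed weak and full supervision; only the iteration counts, budgets, and step counts follow the setup quoted above.

```python
def wassp_loop(parser, train_set, total_budget, num_iterations, steps_per_iteration,
               train_weak, select_queries, annotate, train_mixed):
    """Sketch of a WASSP-style active learning loop on a weakly-supervised parser.

    All four trailing arguments are caller-supplied callables standing in for the
    paper's components; their names and signatures are assumptions for illustration.
    """
    # Stage 1: weakly-supervised pre-training (NSM trained with MAPO in the paper).
    parser = train_weak(parser, train_set)

    # Stage 2: the query budget is split evenly across iterations
    # (e.g. 3 iterations on WikiSQL when the budget is at least 1,000).
    budget_per_iteration = total_budget // num_iterations
    annotated = []
    for _ in range(num_iterations):
        # Choose which examples to request extra supervision for.
        queries = select_queries(parser, train_set, budget_per_iteration)
        # Gold MRs come from the released SQL on WikiSQL and from human
        # annotators on WikiTableQuestions.
        annotated.extend(annotate(queries))
        # Continue training with both weak and full supervision
        # (5K extra steps per iteration on WikiSQL, 50K on WikiTableQuestions).
        parser = train_mixed(parser, train_set, annotated, steps=steps_per_iteration)
    return parser
```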