Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Constraint Reasoning Embedded Structured Prediction
Authors: Nan Jiang, Maosen Zhang, Willem-Jan van Hoeve, Yexiang Xue
JMLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Core-Sp on three applications: vehicle dispatching service planning, if-then program synthesis, and text2SQL generation. The proposed Core-Sp module demonstrates superior performance over state-of-the-art approaches in all three applications. The structures generated with Core-Sp satisfy 100% of the constraints when using exact decision diagrams. In addition, Core-Sp boosts learning performance by reducing the modeling space via constraint satisfaction. |
| Researcher Affiliation | Collaboration | Nan Jiang, Department of Computer Science, Purdue University, West Lafayette, Indiana, USA; Maosen Zhang, ByteDance, Beijing, China; Willem-Jan van Hoeve, Tepper School of Business, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA; Yexiang Xue, Department of Computer Science, Purdue University, West Lafayette, Indiana, USA. |
| Pseudocode | Yes | Algorithm 1: Iterative algorithm for searching optimal performance of Core-Sp. |
| Open Source Code | Yes | The code for all the experiments is available on GitHub (footnote 3: Code summary: https://jiangnanhugo.github.io/CORE-SP/). |
| Open Datasets | Yes | Our experiments are on a data set consisting of 29 cities in Bavaria (instance bays29.tsp from TSPLIB: http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/tsp/). The data sets for this experiment are crawled from the IFTTT and Zapier websites (IFTTT data set collected from https://ifttt.com/; Zapier data set collected from https://zapier.com/). We conduct experiments on the large-scale WikiSQL data set (Zhong et al., 2017), which contains 80,654 examples of questions and SQL queries distributed across 24,241 tables from Wikipedia. |
| Dataset Splits | Yes | IFTTT: 66,761 train / 4,148 validation / 2,640 test; quadruple sizes (111, 443, 88, 161); vocabulary 4,000. Zapier: 24,454 train / 4,809 validation / 2,576 test; quadruple sizes (1,353, 1,755, 1,333, 1,466); vocabulary 3,782. |
| Hardware Specification | No | No specific hardware details (GPU models, CPU types, memory amounts, or cloud platform specifications) are mentioned in the paper. |
| Software Dependencies | No | The implementation is based on SQLova. We use the BERT-base model (Devlin et al., 2019) as the word embedding. The entire model takes up to 3 days to train for 50 epochs. No specific software versions (e.g., Python 3.x, PyTorch 1.x, CUDA x.x) are provided for the implementations described. |
| Experiment Setup | Yes | The generator G uses an encoder to learn a representation vector for the input and uses a sequential decoder to generate the schedule: h_j = LSTM(x, h_{j-1}), ... The discriminator D is trained ... It uses the following LSTM structure: s_j = LSTM(q_j, s_{j-1}), ... The loss function L is: min_G max_D E_{x,y}[log D(y, x)] + E_{z,x,y}[log(1 - D(G(x, z), y))]. The Latent Attention model is a bidirectional LSTM with residual connection, followed by the self-attention mechanism. ... During training, we use cross-entropy loss as the loss function L that minimizes the difference between the ground-truth prediction and the probabilities p_{ts}, p_{tf}, p_{as}, p_{af} produced from Core-Sp ... SQLova has a sequence-to-sequence architecture. It first encodes a natural language sentence and the table headers into a high-dimensional vector. Then the decoder of SQLova decodes the hidden representation into the predictions of various entities in the SQL query. ... The entire model takes up to 3 days to train for 50 epochs. We choose the model that achieves the best execution accuracy on the validation data set for both the baseline and Core-Sp and calculate the corresponding statistics reflected in Table 3. |
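The minimax objective quoted in the Experiment Setup row can be illustrated in miniature. The sketch below is a Monte-Carlo estimate of E[log D(y, x)] + E[log(1 - D(G(x, z), y))] over a toy batch; the paper's actual G and D are LSTM-based networks, so `generator`, `discriminator`, and all numeric stand-ins here are hypothetical simplifications, not the authors' implementation.

```python
import math

def discriminator(y, x, w=0.5):
    # Toy stand-in for D: a sigmoid score in (0, 1) of how "real"
    # the structure y looks given the conditioning input x.
    return 1.0 / (1.0 + math.exp(-w * (y * x)))

def generator(x, z):
    # Toy stand-in for G: produces a candidate structure from
    # the input x and a noise sample z.
    return x + z

def gan_objective(pairs, noises):
    """Monte-Carlo estimate of the minimax value
    E_{x,y}[log D(y, x)] + E_{z,x,y}[log(1 - D(G(x, z), y))],
    which D maximizes and G minimizes."""
    real_term = sum(math.log(discriminator(y, x)) for x, y in pairs) / len(pairs)
    fake_term = sum(
        math.log(1.0 - discriminator(generator(x, z), y))
        for (x, y), z in zip(pairs, noises)
    ) / len(pairs)
    return real_term + fake_term

# Toy batch of (input, ground-truth structure) pairs plus noise samples.
pairs = [(1.0, 0.5), (0.5, 1.0)]
noises = [0.1, -0.2]
loss = gan_objective(pairs, noises)
```

Because D outputs values strictly inside (0, 1), both log terms are finite and negative, so the estimate is always below zero; training alternates between ascent steps on D and descent steps on G against this quantity.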