Logic Constrained Pointer Networks for Interpretable Textual Similarity
Authors: Subhadeep Maji, Rohan Kumar, Manish Bansal, Kalyani Roy, Pawan Goyal
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The model achieves F1 scores of 97.73 and 96.32 on the benchmark SemEval datasets for the chunk alignment task, showing large improvements over existing solutions. Experiments over two different SemEval datasets indicate that our proposed approach achieves state-of-the-art results on both datasets. Through ablation studies, we also find that the proposed logical constraints help boost the performance on both datasets. |
| Researcher Affiliation | Collaboration | ¹Flipkart, ²Indian Institute of Technology, Kharagpur; msubhade@amazon.com, {rohankumar, manish.bansal}@flipkart.com, kroy@iitkgp.ac.in, pawang@cse.iitkgp.ac.in |
| Pseudocode | No | The paper describes the model architecture and mathematical equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code is available at https://github.com/manishb89/interpretable_sentence_similarity |
| Open Datasets | Yes | We use the SemEval-2016 Task 2 dataset for interpretable semantic textual similarity [Agirre et al., 2016]. It consists of examples from two domains: News Headlines and Flickr Images. |
| Dataset Splits | No | The paper states a "2:1 split between train and test sets" but does not specify a validation split. It mentions hyperparameter tuning on training set F1 and "early stopping using training set F1 as a metric", which implies model selection was performed on the training set rather than on a dedicated validation split. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions several software components such as BERT, GloVe, ConceptNet, PPDB, and spaCy, but does not specify version numbers for these or any other ancillary software. |
| Experiment Setup | Yes | We trained each of the model configurations on the train part of the SemEval dataset and did hyperparameter tuning on training set F1. We fixed the entropy regularization strength λ to 0.6 across all experiments; changing it had little effect on results. The embedding dimension for GloVe-based representations was 300 and for BERT was 768. To avoid over-fitting, we employed early stopping using training set F1 as a metric and stopped training if training set F1 did not improve over 5 successive epochs. The best hyperparameter configurations corresponding to M4 are: ρ = 2 and PN dimension of 100 for the headlines dataset, and ρ = 2 and PN dimension of 150 for the images dataset. (A configuration sketch follows below the table.) |
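
The Experiment Setup row pins down concrete hyperparameters: λ = 0.6, embedding dimensions of 300 (GloVe) and 768 (BERT), early stopping with a patience of 5 epochs on training-set F1, and ρ = 2 with pointer-network dimensions of 100 (headlines) and 150 (images). Below is a minimal sketch of how that configuration and stopping rule could be wired up; `train_one_epoch` and `compute_train_f1` are hypothetical callables supplied by the caller, and nothing here is taken from the paper's released code.

```python
# Sketch of the reported configuration and early-stopping rule.
# `train_one_epoch` and `compute_train_f1` are hypothetical placeholders,
# not functions from the paper or its repository.

ENTROPY_LAMBDA = 0.6                         # fixed across all experiments
EMBEDDING_DIM = {"glove": 300, "bert": 768}  # dims stated in the paper
PATIENCE = 5                                 # epochs without training-F1 improvement

# Best M4 hyperparameters per dataset, as reported in the paper.
BEST_M4 = {
    "headlines": {"rho": 2, "pn_dim": 100},
    "images": {"rho": 2, "pn_dim": 150},
}


def train(model, train_data, train_one_epoch, compute_train_f1, max_epochs=100):
    """Train until training-set F1 stops improving for PATIENCE epochs."""
    best_f1, stale_epochs = 0.0, 0
    for _ in range(max_epochs):
        train_one_epoch(model, train_data, entropy_lambda=ENTROPY_LAMBDA)
        f1 = compute_train_f1(model, train_data)
        if f1 > best_f1:
            best_f1, stale_epochs = f1, 0
        else:
            stale_epochs += 1
            if stale_epochs >= PATIENCE:
                break  # training F1 stalled for 5 successive epochs
    return model, best_f1
```

Note that this loop selects models on training-set F1 rather than a held-out validation split, which is precisely why the Dataset Splits row above is marked No.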