Logic Constrained Pointer Networks for Interpretable Textual Similarity

Authors: Subhadeep Maji, Rohan Kumar, Manish Bansal, Kalyani Roy, Pawan Goyal

IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The model achieves an F1 score of 97.73 and 96.32 on the benchmark SemEval datasets for the chunk alignment task, showing large improvements over the existing solutions. Experiments over two different SemEval datasets indicate that our proposed approach achieves state of the art results on both the datasets. Through ablation studies, we also find that the proposed logical constraints help boost the performance on both the datasets.
Researcher Affiliation | Collaboration | 1Flipkart; 2Indian Institute of Technology, Kharagpur. Emails: msubhade@amazon.com, {rohankumar, manish.bansal}@flipkart.com, kroy@iitkgp.ac.in, pawang@cse.iitkgp.ac.in
Pseudocode | No | The paper describes the model architecture and mathematical equations but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Source code is available at https://github.com/manishb89/interpretable_sentence_similarity
Open Datasets | Yes | We use the SemEval 2016 Task 2 dataset for interpretable semantic textual similarity [Agirre et al., 2016]. It consists of examples from two domains: News Headlines and Flickr Images.
Dataset Splits | No | The paper states a "2:1 split between train and test sets" but does not specify a validation split. It mentions that the authors "did hyperparameter tuning on training set F1" and "employed early stopping using training set F1 as a metric", which implies that the training set itself served for model selection rather than a dedicated validation split.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions several software components such as BERT, GloVe, ConceptNet, PPDB, and spaCy, but does not specify version numbers for these or any other ancillary software.
Experiment Setup | Yes | We trained each of the model configurations on the train part of the SemEval dataset and did hyperparameter tuning on training set F1. We fixed the entropy regularization strength λ to 0.6 across all experiments, and changing it had little effect on results. The embedding dimension for GloVe-based representations was 300 and for BERT was 768. To avoid over-fitting, we employed early stopping using training set F1 as a metric and stop the training if training set F1 does not improve over 5 successive epochs. The best hyperparameter configurations corresponding to M4 are: ρ = 2 and PN dimension of 100 for experiments on the headlines dataset, and ρ = 2 and PN dimension of 150 for experiments on the images dataset. (A configuration sketch based on this description follows the table.)
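
The Experiment Setup row above gives the reported configuration in prose. Below is a minimal sketch, assuming a PyTorch-style model and optimizer, of how that description could translate into a training loop. The model/dataloader interfaces and the compute_f1 helper are hypothetical, not taken from the authors' released code; the numeric values (λ = 0.6, embedding dimensions of 300/768, ρ = 2, pointer-network dimensions of 100/150, and a 5-epoch early-stopping patience on training-set F1) come directly from the quoted setup.

```python
# Minimal sketch of the reported training setup (not the authors' released code).
# Assumes a PyTorch-style model whose forward pass returns an alignment loss and a
# pointer-distribution entropy term; compute_f1 is a hypothetical helper that
# scores chunk-alignment F1 over a data loader.

EMBED_DIM = {"glove": 300, "bert": 768}      # embedding sizes quoted in the paper
ENTROPY_LAMBDA = 0.6                         # entropy regularization strength, fixed in all runs
BEST_CONFIG = {                              # best M4 hyperparameters per dataset
    "headlines": {"rho": 2, "pn_dim": 100},
    "images":    {"rho": 2, "pn_dim": 150},
}
PATIENCE = 5                                 # stop if training-set F1 stalls this many epochs


def train(model, train_loader, optimizer, compute_f1, max_epochs=100):
    """Train with early stopping on training-set F1, as described in the paper."""
    best_f1, stale_epochs = 0.0, 0
    for _ in range(max_epochs):
        model.train()
        for batch in train_loader:
            optimizer.zero_grad()
            alignment_loss, pointer_entropy = model(batch)           # hypothetical API
            loss = alignment_loss + ENTROPY_LAMBDA * pointer_entropy
            loss.backward()
            optimizer.step()

        f1 = compute_f1(model, train_loader)  # chunk-alignment F1 on the training set
        if f1 > best_f1:
            best_f1, stale_epochs = f1, 0
        else:
            stale_epochs += 1
            if stale_epochs >= PATIENCE:
                break                         # no improvement for 5 successive epochs
    return best_f1
```

This reflects the paper's stated practice of using training-set F1 both for hyperparameter tuning and as the early-stopping criterion, which is why the sketch contains no separate validation loader.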