Logic Constrained Pointer Networks for Interpretable Textual Similarity

Authors: Subhadeep Maji, Rohan Kumar, Manish Bansal, Kalyani Roy, Pawan Goyal

IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The model achieves an F1 score of 97.73 and 96.32 on the benchmark SemEval datasets for the chunk alignment task, showing large improvements over the existing solutions. Experiments over two different SemEval datasets indicate that our proposed approach achieves state of the art results on both the datasets. Through ablation studies, we also find that the proposed logical constraints help boost the performance on both the datasets.
Researcher Affiliation | Collaboration | 1Flipkart; 2Indian Institute of Technology, Kharagpur. Emails: msubhade@amazon.com, {rohankumar, manish.bansal}@flipkart.com, kroy@iitkgp.ac.in, pawang@cse.iitkgp.ac.in
Pseudocode | No | The paper describes the model architecture and mathematical equations but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Source code is available at https://github.com/manishb89/interpretable_sentence_similarity
Open Datasets | Yes | We use the SemEval 2016 Task 2 dataset for interpretable semantic textual similarity [Agirre et al., 2016]. It consists of examples from two domains: News Headlines and Flickr Images.
Dataset Splits | No | The paper states a "2:1 split between train and test sets" but does not specify a validation split. It mentions that the authors "did hyperparameter tuning on training set F1" and "employed early stopping using training set F1 as a metric", which implies that the training set itself served for model selection rather than a dedicated validation split.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions several software components such as BERT, GloVe, ConceptNet, PPDB, and spaCy, but does not specify version numbers for these or any other ancillary software.
Experiment Setup | Yes | We trained each of the model configurations on the train part of the SemEval dataset and did hyperparameter tuning on training set F1. We fixed the entropy regularization strength λ to 0.6 across all experiments, and changing it had little effect on results. The embedding dimension for GloVe-based representations was 300 and for BERT was 768. To avoid over-fitting, we employed early stopping using training set F1 as a metric and stop the training if training set F1 does not improve over 5 successive epochs. The best hyperparameter configurations corresponding to M4 are: ρ = 2 and PN dimension of 100 for experiments on the headlines dataset, and ρ = 2 and PN dimension of 150 for experiments on the images dataset. (A configuration sketch based on this description follows the table.)
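
The Experiment Setup row above gives the reported configuration in prose. Below is a minimal sketch, assuming a PyTorch-style model and optimizer, of how that description could translate into a training loop. The model/dataloader interfaces and the compute_f1 helper are hypothetical, not taken from the authors' released code; the numeric values (λ = 0.6, embedding dimensions of 300/768, ρ = 2, pointer-network dimensions of 100/150, and a 5-epoch early-stopping patience on training-set F1) come directly from the quoted setup.

```python
# Minimal sketch of the reported training setup (not the authors' released code).
# Assumes a PyTorch-style model whose forward pass returns an alignment loss and a
# pointer-distribution entropy term; compute_f1 is a hypothetical helper that
# scores chunk-alignment F1 over a data loader.

EMBED_DIM = {"glove": 300, "bert": 768}      # embedding sizes quoted in the paper
ENTROPY_LAMBDA = 0.6                         # entropy regularization strength, fixed in all runs
BEST_CONFIG = {                              # best M4 hyperparameters per dataset
    "headlines": {"rho": 2, "pn_dim": 100},
    "images":    {"rho": 2, "pn_dim": 150},
}
PATIENCE = 5                                 # stop if training-set F1 stalls this many epochs


def train(model, train_loader, optimizer, compute_f1, max_epochs=100):
    """Train with early stopping on training-set F1, as described in the paper."""
    best_f1, stale_epochs = 0.0, 0
    for _ in range(max_epochs):
        model.train()
        for batch in train_loader:
            optimizer.zero_grad()
            alignment_loss, pointer_entropy = model(batch)           # hypothetical API
            loss = alignment_loss + ENTROPY_LAMBDA * pointer_entropy
            loss.backward()
            optimizer.step()

        f1 = compute_f1(model, train_loader)  # chunk-alignment F1 on the training set
        if f1 > best_f1:
            best_f1, stale_epochs = f1, 0
        else:
            stale_epochs += 1
            if stale_epochs >= PATIENCE:
                break                         # no improvement for 5 successive epochs
    return best_f1
```

This reflects the paper's stated practice of using training-set F1 both for hyperparameter tuning and as the early-stopping criterion, which is why the sketch contains no separate validation loader.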