ACV-tree: A New Method for Sentence Similarity Modeling

Authors: Yuquan Le, Zhi-Jie Wang, Zhe Quan, Jiawei He, Bin Yao

IJCAI 2018

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "The experimental results, based on 19 widely-used datasets, demonstrate that our model is effective and competitive, compared against state-of-the-art models." |
| Researcher Affiliation | Academia | College of Computer Science and Electronic Engineering, Hunan University, Changsha, China; Guangdong Key Lab. of Big Data Anal. and Proc., Sun Yat-Sen University, Guangzhou, China; Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China |
| Pseudocode | Yes | Algorithm 1 COMPSIM(T1, T2) |
| Open Source Code | Yes | "Our codes are available at the open-source code repository (https://github.com/yuquanle/Sentence-similarity-modeling.git)." |
| Open Datasets | Yes | "Following prior works, we conduct experiments on 19 textual similarity datasets (http://ixa.si.ehu.eus/) that contain all the datasets from Semantic Textual Similarity (STS) tasks (2012-2015)..." |
| Dataset Splits | No | "Each dataset contains many pairs of sentences (e.g., the MSRvid dataset contains 750 pairs of sentences)." No train/validation/test split is specified. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed machine specifications) used for running its experiments. |
| Software Dependencies | No | "In our experiments, we implement our ACV-tree by using the Stanford Parser [Manning et al., 2017] to generate the constituency tree of the sentence..." No version numbers for the dependencies are given. |
| Experiment Setup | Yes | "In our experiments, the hyperparameters µ = [0.1, 0.2, ..., 0.9, 1.0] and λ = [0.1, 0.2, ..., 0.9, 1.0], where the numbers in bold denote the default settings, unless otherwise stated. Following prior works [Arora et al., 2017; Wang et al., 2017], we use the term frequency-inverse document frequency (TF-IDF) scheme to generate the attention weights. The lexical vectors we used are provided by PARAGRAM-SL999, the 300-dimensional Paragram embeddings learned from PPDB and tuned on the SimLex999 dataset [Wieting et al., 2015]." |
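The setup above says the attention weights over words are generated with a TF-IDF scheme. As a minimal illustration of what that could look like (the function name, tokenization, and normalization are assumptions; the paper does not specify its exact formula), one can compute per-word weights for a sentence against a reference corpus:

```python
import math
from collections import Counter

def tfidf_attention_weights(sentence, corpus):
    """Hypothetical sketch: per-word TF-IDF weights for a tokenized
    sentence, normalized to sum to 1 so they can act as attention
    weights over the word embeddings (normalization is an assumption)."""
    n_docs = len(corpus)
    # Document frequency: in how many corpus sentences each word appears.
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    # Term frequency within this sentence, times smoothed IDF.
    tf = Counter(sentence)
    raw = {
        w: (tf[w] / len(sentence)) * math.log((1 + n_docs) / (1 + df[w]))
        for w in tf
    }
    total = sum(raw.values()) or 1.0
    return {w: v / total for w, v in raw.items()}

corpus = [["a", "cat", "sits"], ["a", "dog", "runs"]]
weights = tfidf_attention_weights(["a", "cat"], corpus)
```

Common, frequent words (like "a" above) receive low weight, while rarer content words receive high weight; a sentence embedding would then be the weighted sum of the 300-dimensional word vectors using these weights.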