Distributed Representation of Words in Cause and Effect Spaces

Authors: Zhipeng Xie, Feiteng Mu (pp. 7330-7337)

AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results have shown that Max-Matching and Attentive-Matching models significantly outperform several state-of-the-art competitors by a large margin on both English and Chinese corpora.
Researcher Affiliation | Academia | Zhipeng Xie, Feiteng Mu, School of Computer Science, Shanghai Key Laboratory of Data Science, Fudan University, Shanghai, China
Pseudocode | No | The paper describes the models (Pairwise-Matching, Max-Matching, Attentive-Matching) using mathematical equations and descriptive text, but no explicit pseudocode or algorithm blocks are provided.
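
Because no pseudocode is provided, the sketch below is a purely illustrative guess at how word-level cause-to-effect scores could be aggregated into phrase-level Max-Matching and Attentive-Matching scores. The dot-product scoring, the averaging over effect words, and the softmax attention are our assumptions and may differ from the paper's actual equations.

```python
import numpy as np

# Illustrative sketch only: each word w is assumed to have a cause vector C[w]
# and an effect vector E[w]; phrases are lists of word indices. The aggregation
# functions below are assumptions, not the paper's equations.

def word_pair_score(C, E, c_word, e_word):
    # causal strength of a (cause word, effect word) pair as a dot product
    # between the cause space and the effect space
    return float(np.dot(C[c_word], E[e_word]))

def max_matching(C, E, cause_phrase, effect_phrase):
    # for each effect word, keep its best-matching cause word, then average
    best = [max(word_pair_score(C, E, c, e) for c in cause_phrase)
            for e in effect_phrase]
    return float(np.mean(best))

def attentive_matching(C, E, cause_phrase, effect_phrase):
    # replace the hard max with a softmax-weighted average over cause words
    total = 0.0
    for e in effect_phrase:
        s = np.array([word_pair_score(C, E, c, e) for c in cause_phrase])
        attn = np.exp(s - s.max())
        attn /= attn.sum()
        total += float(np.dot(attn, s))
    return total / len(effect_phrase)
```
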
Open Source Code | No | The paper does not contain any statement about releasing the source code for the proposed models or a link to a code repository.
Open Datasets | Yes | To make an evaluation on English, we build our models on a corpus of 815,233 cause-effect phrase pairs which was extracted with a set of 13 rules from Gigaword and Simple English Wikipedia. Both the rules and the corpus are taken from (Sharp et al. 2016): http://clulab.cs.arizona.edu/data/emnlp2016-causal/ ... We apply the above causal patterns on two raw Chinese corpora, the Baike corpus and the Sogou CS corpus, where Baike is 10GB of data crawled from a Chinese encyclopedia website and Sogou CS (Wang et al. 2008) is the news data on the web
Dataset Splits | No | The paper mentions "five-fold cross validation" for the Causal QA Task, but does not provide specific train/validation/test dataset splits (e.g., percentages or counts) for the main causal embedding model training.
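
For reference, a generic five-fold split of the Causal QA examples could be produced as sketched below; the example IDs are placeholders and the shuffle seed is our choice, since the paper does not report its actual split procedure or fold assignments.

```python
import numpy as np
from sklearn.model_selection import KFold

# Placeholder illustration of five-fold cross validation; the paper does not
# report a seed or the exact fold assignments.
qa_example_ids = np.arange(1000)  # hypothetical Causal QA example IDs
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kfold.split(qa_example_ids)):
    print(f"fold {fold}: {len(train_idx)} train / {len(test_idx)} test")
```
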
Hardware Specification | No | The paper does not specify any hardware details (e.g., CPU, GPU models, or memory) used for running the experiments.
Software Dependencies | No | The paper mentions "pyltp" as a Chinese dependency parser, but does not specify its version. It also refers to "word2vec" and an "SVM ranker" without version numbers.
Experiment Setup | Yes | We use a simple gradient descent algorithm to train our models, with a learning rate of 0.005. Other related hyperparameters are listed as follows. The number of training epochs is set to 30, and the batch size is 256. Words whose frequencies are less than 8 are pruned. The cause embeddings and the effect embeddings have the same dimensionality of 200. The negative sampling rate is 10, which means that we sample 10 negative phrase pairs for each positive phrase pair. ... where α and γ are two hyperparameters, which are set to 0.8 and 2.0 by default.
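
For re-implementation, the quoted hyperparameters can be collected into a single configuration. The sketch below is our own; the names and the dictionary form are not from the paper, only the numeric values are, and the roles of α and γ are defined by equations in the paper that this report does not reproduce.

```python
# Hypothetical training configuration mirroring the hyperparameters quoted above.
# Key names are ours; only the numeric values come from the paper.
CONFIG = {
    "optimizer": "sgd",        # "simple gradient descent algorithm"
    "learning_rate": 0.005,
    "epochs": 30,
    "batch_size": 256,
    "min_word_freq": 8,        # words with frequency < 8 are pruned
    "embedding_dim": 200,      # shared by cause and effect embeddings
    "negative_samples": 10,    # negatives sampled per positive phrase pair
    "alpha": 0.8,              # paper hyperparameter α (role given in the paper's equations)
    "gamma": 2.0,              # paper hyperparameter γ (role given in the paper's equations)
}
```
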