Neural Simile Recognition with Cyclic Multitask Learning and Local Attention

Authors: Jiali Zeng, Linfeng Song, Jinsong Su, Jun Xie, Wei Song, Jiebo Luo

AAAI 2020, pp. 9515-9522 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that our framework significantly outperforms the current state-of-the-art model and our carefully designed baselines, and the gains are still remarkable using BERT.
Researcher Affiliation | Collaboration | Jiali Zeng (Xiamen University, Xiamen, China), Linfeng Song (Tencent AI Lab, Bellevue, USA), Jinsong Su (Xiamen University, Xiamen, China), Jun Xie (Mobile Internet Group, Tencent Technology Co., Ltd, Beijing, China), Wei Song (Capital Normal University, Beijing, China), Jiebo Luo (University of Rochester, Rochester, NY, USA)
Pseudocode | No | The paper describes the architecture and processes using natural language and mathematical formulas, but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | The source code of this paper is available at https://github.com/DeepLearnXMU/Cyclic.
Open Datasets | Yes | We evaluate our model on a standard Chinese simile recognition benchmark (Liu et al. 2018), where each instance contains one or zero similes.
Dataset Splits | Yes | We follow Liu et al. (2018) and conduct 5-fold cross-validation: the dataset is first divided equally into 5 folds. Each time, 4 folds are used as the training and validation sets (80% for training, 20% for validation), and the remaining fold is used for testing.
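To make the split protocol concrete, the following is a minimal Python sketch, assuming scikit-learn is available and the dataset can be treated as a plain list of instances; the helper five_fold_splits and the shuffling seed are illustrative assumptions, not the authors' code.

    from sklearn.model_selection import KFold, train_test_split

    def five_fold_splits(instances, seed=42):
        # Divide the data into 5 equal folds; each round holds one fold out
        # for testing and splits the remaining 4 folds 80/20 into training
        # and validation, mirroring the protocol quoted above.
        instances = list(instances)
        kf = KFold(n_splits=5, shuffle=True, random_state=seed)
        for rest_idx, test_idx in kf.split(instances):
            rest = [instances[i] for i in rest_idx]
            test = [instances[i] for i in test_idx]
            train, valid = train_test_split(rest, test_size=0.2, random_state=seed)
            yield train, valid, test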
Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions software components like Word2Vec, BERT, and Adadelta, but does not provide specific version numbers for the underlying software libraries or programming languages used (e.g., Python, TensorFlow, PyTorch versions).
Experiment Setup | Yes | The hidden sizes for the Bi-LSTM encoder and decoder are 128. The batch size is 80. The dropout rate is 0.5. We adopt Adadelta (Zeiler 2012) as the optimizer with a learning rate of 1.0 and early stopping (Prechelt 1998). The optimal hyper-parameters α=0.1, β=0.8 are chosen using the validation set.
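These reported settings can be collected into a short PyTorch sketch; the dictionary keys, the 300-dimensional input size, and the module names are illustrative assumptions rather than values taken from the released code.

    import torch

    # Hyper-parameters reported in the paper (names here are illustrative).
    config = {
        "hidden_size": 128,    # Bi-LSTM encoder/decoder hidden size
        "batch_size": 80,
        "dropout": 0.5,
        "learning_rate": 1.0,  # Adadelta learning rate
        "alpha": 0.1,          # hyper-parameter alpha reported in the paper
        "beta": 0.8,           # hyper-parameter beta reported in the paper
    }

    # Assumed 300-dimensional input embeddings; the paper excerpt does not state this.
    encoder = torch.nn.LSTM(input_size=300,
                            hidden_size=config["hidden_size"],
                            bidirectional=True,
                            batch_first=True)
    optimizer = torch.optim.Adadelta(encoder.parameters(),
                                     lr=config["learning_rate"])
    # Training would iterate in batches of 80, apply dropout of 0.5, and stop
    # early once validation performance stops improving.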