A Topic-Aware Reinforced Model for Weakly Supervised Stance Detection
Authors: Penghui Wei, Wenji Mao, Guandan Chen (pp. 7249-7256)
AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that our proposed model TARM outperforms the state-of-the-art approaches. |
| Researcher Affiliation | Academia | Penghui Wei, Wenji Mao, Guandan Chen; SKL-MCCS, Institute of Automation, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China; {weipenghui2016, wenji.mao, chenguandan2014}@ia.ac.cn |
| Pseudocode | Yes | Algorithm 1 Joint Training Procedure of TARM |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We evaluate our TARM on the SemEval-2016 Task 6.B dataset (Mohammad et al. 2016), the benchmark for the weakly supervised stance detection task. |
| Dataset Splits | Yes | For TDNet, hyper-parameters are tuned by 5-fold cross-validation. |
| Hardware Specification | No | Both of them are trained on a single GPU. |
| Software Dependencies | No | The optimizer is Adam with a mini-batch size of 64 and a learning rate of 5e-4. We add an ℓ2 penalty term with a coefficient of 1e-5 and use dropout with a ratio of 0.5 after the input layer and the representation layer to relieve overfitting. |
| Experiment Setup | Yes | For TDNet, hyper-parameters are tuned by 5-fold cross-validation. We first pre-train 200-dimensional word embeddings using Skip-gram (Mikolov et al. 2013) on the domain corpus. GRU hidden states are also 200-dimensional, and N is set to 2. The optimizer is Adam with a mini-batch size of 64 and a learning rate of 5e-4. We add an ℓ2 penalty term with a coefficient of 1e-5 and use dropout with a ratio of 0.5 after the input layer and the representation layer to relieve overfitting. For SRNet, we set the maximum number of tweets in one subset to T = 128. The number of PPO updates in one episode is K = 10 (see line 9 in Algorithm 1). The learning rate is 2e-5, and the discount factor is γ = 0.9. |
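
The TDNet hyper-parameters quoted above (200-dimensional Skip-gram embeddings, 200-dimensional GRU hidden states, Adam with a 5e-4 learning rate and mini-batch size 64, a 1e-5 ℓ2 penalty, and 0.5 dropout after the input and representation layers) can be collected into a short configuration sketch. The snippet below is a minimal illustration assuming PyTorch; the `TDNet` class, its vocabulary size, the three-way output layer, and the use of weight decay to realize the ℓ2 penalty are assumptions for illustration, not the authors' released implementation.

```python
# Minimal sketch of the TDNet training configuration described above.
# Assumes PyTorch; the TDNet class below is a placeholder encoder, not the authors' code.
import torch
import torch.nn as nn


class TDNet(nn.Module):
    """Placeholder topic-aware encoder: 200-d embeddings, 200-d GRU hidden states."""

    def __init__(self, vocab_size, emb_dim=200, hidden_dim=200, num_classes=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)  # pre-trained Skip-gram vectors would be loaded here
        self.input_dropout = nn.Dropout(0.5)                # dropout after the input layer
        self.gru = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.repr_dropout = nn.Dropout(0.5)                 # dropout after the representation layer
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        x = self.input_dropout(self.embedding(token_ids))
        _, h = self.gru(x)                                  # final hidden state as the tweet representation
        return self.classifier(self.repr_dropout(h.squeeze(0)))


model = TDNet(vocab_size=20000)  # vocabulary size is an assumption
# Adam with a 5e-4 learning rate; the 1e-5 L2 penalty is expressed as weight decay here (an assumption).
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4, weight_decay=1e-5)
BATCH_SIZE = 64  # mini-batch size reported in the paper
```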
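
For the SRNet settings (T = 128 tweets per subset, K = 10 PPO updates per episode, a 2e-5 learning rate, and discount factor γ = 0.9), the sketch below only illustrates how the reported discount factor would enter a per-step return computation. The reward values and the helper function are purely illustrative and are not taken from the paper.

```python
# Sketch of reward discounting for SRNet's PPO training (gamma = 0.9).
# The reward sequence below is illustrative, not from the paper.
GAMMA = 0.9          # discount factor reported for SRNet
K_PPO_UPDATES = 10   # PPO update passes per episode (Algorithm 1, line 9)
MAX_TWEETS = 128     # maximum number of tweets in one subset (T)


def discounted_returns(rewards, gamma=GAMMA):
    """Compute the discounted return G_t = r_t + gamma * G_{t+1} for each step."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))


print(discounted_returns([0.0, 0.0, 1.0]))  # -> [0.81, 0.9, 1.0]
```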