Dynamically Route Hierarchical Structure Representation to Attentive Capsule for Text Classification
Authors: Wanshan Zheng, Zibin Zheng, Hai Wan, Chuan Chen
IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive results on eleven benchmark datasets demonstrate that the proposed model obtains competitive performance against several state-of-the-art baselines. |
| Researcher Affiliation | Academia | Wanshan Zheng 1,2, Zibin Zheng 1,2, Hai Wan 1, Chuan Chen 1,2. 1 School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China; 2 Guangdong Key Laboratory for Big Data Analysis and Simulation of Public Opinion, The School of Communication and Design, Sun Yat-sen University, Guangzhou, China. zhengwsh3@mail2.sysu.edu.cn, {zhzibin, wanhai, chenchuan}@mail.sysu.edu.cn |
| Pseudocode | Yes | Algorithm 1 Dynamic Routing Algorithm (a sketch of the routing procedure appears below the table). |
| Open Source Code | Yes | Our code is available at https://github.com/zhengwsh/HAC. |
| Open Datasets | Yes | For small datasets, seven widely-studied datasets [Kim, 2014] include: movie reviews (MR), Stanford Sentiment Treebank (SST-1 and SST-2), subjectivity classification (SUBJ), question dataset (TREC), customer review (CR), opinion polarity (MPQA). For large datasets, four widely-studied datasets [Zhang et al., 2015] include: AG's news corpus (AG), DBPedia ontology (DBP), Yelp reviews (Yelp.P and Yelp.F). The detailed statistics are listed in Table 1. |
| Dataset Splits | Yes | Following the evaluation scheme in the existing literature, for MR, CR, Subj, MPQA, we use nested 10-fold cross-validation; for TREC, AG, DBP, Yelp.P, Yelp.F, 10-fold cross-validation; and for SST-1, SST-2, the standard validation split. (A cross-validation sketch appears below the table.) |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models used for running experiments. |
| Software Dependencies | No | The paper mentions GloVe and ELMo embeddings and the Adam optimizer, but does not provide specific version numbers for software dependencies or libraries used for implementation. |
| Experiment Setup | Yes | We set the hidden state dimension of Bi-GRU to be 100 for each direction. We adopt 5 dilated convolutional blocks, with filter window size 2 and filter number 100. We set the target capsule dimension to be 50, and use 3 iterations of routing for all datasets. The MLP classifier has a hidden layer of size 50 using ReLU activation. ... Dropout regularization is employed on the input embedding layer, with the dropout rate 0.5. ... We train our model's parameters using the gradient-based optimizer Adam, with an initial learning rate of 1e-3. We halve the learning rate if the dev accuracy doesn't increase in 3 training epochs, and set the minimum rate to be 1e-4. We conduct mini-batches with size 8 for small datasets, and size 128 for large datasets. The training process lasts at most 30 epochs on all the datasets. (Architecture and training-loop sketches appear below the table.) |
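
The paper provides Algorithm 1 for dynamic routing, and the authors' implementation is available at the GitHub link above. For readers who want the gist without opening the repository, here is a minimal sketch of capsule dynamic routing in the style of Sabour et al. (2017), using the 3 routing iterations reported in the setup; the tensor names and shapes are illustrative assumptions, not taken from the authors' code.

```python
# Sketch of capsule dynamic routing (Sabour et al., 2017 style).
# Assumed shape: u_hat is (batch, in_caps, out_caps, out_dim).
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    """Non-linearity that scales each capsule vector's length into [0, 1)."""
    norm_sq = (s * s).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + eps)

def dynamic_routing(u_hat, num_iterations=3):
    """Route prediction vectors u_hat to output capsules by agreement."""
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)   # routing logits
    for _ in range(num_iterations):
        c = F.softmax(b, dim=2)                    # coupling coefficients
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)   # weighted sum of predictions
        v = squash(s)                              # (batch, out_caps, out_dim)
        b = b + (u_hat * v.unsqueeze(1)).sum(-1)   # raise logits on agreement
    return v
```

The softmax over output capsules makes each input capsule distribute its vote, and agreement between a prediction vector and the resulting output capsule raises the corresponding logit on the next iteration.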
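The Dataset Splits row reports 10-fold (and nested 10-fold) cross-validation. Below is a minimal sketch of the plain 10-fold protocol with scikit-learn, assuming a hypothetical `train_eval` callable that trains one model and returns its test-fold accuracy; stratification and the seed are assumptions, since the quoted text does not specify them.

```python
# Sketch of 10-fold cross-validation; `train_eval` is a hypothetical
# stand-in for one full training run of the model.
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_validate(texts, labels, train_eval, n_splits=10, seed=1):
    """Return mean accuracy across n_splits folds."""
    labels = np.asarray(labels)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    fold_scores = []
    for train_idx, test_idx in skf.split(texts, labels):
        train = ([texts[i] for i in train_idx], labels[train_idx])
        test = ([texts[i] for i in test_idx], labels[test_idx])
        fold_scores.append(train_eval(train, test))
    return float(np.mean(fold_scores))
```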
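The Experiment Setup row fixes the main dimensions: a 100-per-direction Bi-GRU, 5 dilated convolutional blocks with window 2 and 100 filters, a 50-unit ReLU hidden layer in the MLP, and dropout 0.5 on the embedding layer. The following is a rough sketch of how such an encoder could be wired in PyTorch; the dilation schedule, residual wiring, embedding size, and how the blocks feed the attentive capsules are not stated in the quoted text, so this composition (with mean pooling standing in for the capsule layer sketched earlier) is an assumption, not the authors' HAC architecture.

```python
# Encoder sketch using the quoted dimensions; wiring choices (dilations
# 1,2,4,8,16, mean pooling, 300-dim embeddings) are assumptions.
import torch
import torch.nn as nn

class EncoderSketch(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.drop = nn.Dropout(0.5)                       # on the embedding layer
        self.bigru = nn.GRU(emb_dim, 100, bidirectional=True, batch_first=True)
        # Five dilated conv blocks, window size 2, 100 filters each
        self.convs = nn.ModuleList([
            nn.Conv1d(200 if i == 0 else 100, 100, kernel_size=2,
                      dilation=2 ** i, padding=2 ** i)
            for i in range(5)
        ])
        self.mlp = nn.Sequential(nn.Linear(100, 50), nn.ReLU(),
                                 nn.Linear(50, n_classes))

    def forward(self, tokens):
        h, _ = self.bigru(self.drop(self.embed(tokens)))  # (B, T, 200)
        x = h.transpose(1, 2)                             # (B, 200, T)
        for conv in self.convs:
            x = torch.relu(conv(x))[..., :h.size(1)]      # crop padding overhang
        return self.mlp(x.mean(dim=-1))                   # pool, then classify
```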
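The learning-rate schedule quoted above (Adam at 1e-3, halved when dev accuracy stalls for 3 epochs, floored at 1e-4, at most 30 epochs) maps naturally onto PyTorch's `ReduceLROnPlateau`. A minimal sketch, assuming hypothetical `train_epoch` and `dev_accuracy` helpers:

```python
# Training-loop sketch for the quoted optimizer schedule.
# `train_epoch` and `dev_accuracy` are hypothetical helpers.
import torch

def train(model, train_epoch, dev_accuracy, max_epochs=30, batch_size=8):
    """Adam at 1e-3; LR halved after 3 stagnant dev epochs, floored at 1e-4."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="max", factor=0.5, patience=3, min_lr=1e-4)
    for _ in range(max_epochs):
        train_epoch(model, optimizer, batch_size)  # batch_size=128 for large datasets
        scheduler.step(dev_accuracy(model))        # monitor dev-set accuracy
    return model
```

`mode="max"` is used because the schedule monitors dev accuracy rather than a loss; `patience=3` and `factor=0.5` correspond to the "halve after 3 epochs without improvement" rule in the quoted setup.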