Dynamically Route Hierarchical Structure Representation to Attentive Capsule for Text Classification

Authors: Wanshan Zheng, Zibin Zheng, Hai Wan, Chuan Chen

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive results on eleven benchmark datasets demonstrate that the proposed model obtains competitive performance against several state-of-the-art baselines."
Researcher Affiliation | Academia | Wanshan Zheng (1,2), Zibin Zheng (1,2), Hai Wan (1), Chuan Chen (1,2). (1) School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China; (2) Guangdong Key Laboratory for Big Data Analysis and Simulation of Public Opinion, The School of Communication and Design, Sun Yat-sen University, Guangzhou, China. zhengwsh3@mail2.sysu.edu.cn, {zhzibin, wanhai, chenchuan}@mail.sysu.edu.cn
Pseudocode | Yes | The paper provides Algorithm 1 (Dynamic Routing Algorithm).
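The row above cites Algorithm 1, which this report does not reproduce. As a point of reference, below is a minimal NumPy sketch of the standard dynamic routing procedure from Sabour et al. (2017), on which capsule routing algorithms of this kind are typically based; the variable names (`u_hat`, `b`, `c`, `s`, `v`) and the squash nonlinearity are assumptions, and the paper's attentive-capsule variant may differ in its details.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Squash nonlinearity: shrinks short vectors toward 0, long vectors toward unit norm.
    norm_sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)

def dynamic_routing(u_hat, n_iters=3):
    # u_hat: prediction vectors, shape (n_in, n_out, d_out).
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))  # routing logits
    for _ in range(n_iters):
        # Coupling coefficients: softmax over output capsules for each input capsule.
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)
        s = np.einsum('ij,ijd->jd', c, u_hat)       # weighted sum of predictions
        v = squash(s)                               # output capsules
        b = b + np.einsum('ijd,jd->ij', u_hat, v)   # raise logits where predictions agree
    return v

# Example: route 10 input capsules to 5 output capsules of dimension 50
# (the paper reports 50-dimensional target capsules and 3 routing iterations).
v = dynamic_routing(np.random.randn(10, 5, 50), n_iters=3)
```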
Open Source Code | Yes | "Our code is available at https://github.com/zhengwsh/HAC."
Open Datasets | Yes | "For small datasets, seven widely-studied datasets [Kim, 2014] include: movie reviews (MR), Stanford Sentiment Treebank (SST-1 and SST-2), subjectivity classification (SUBJ), question dataset (TREC), customer review (CR), opinion polarity (MPQA). For large datasets, four widely-studied datasets [Zhang et al., 2015] include: AG's news corpus (AG), DBPedia ontology (DBP), Yelp reviews (Yelp.P and Yelp.F). The detailed statistics are listed in Table 1."
Dataset Splits | Yes | "Following the evaluation scheme in the existing literature, for MR, CR, Subj, MPQA we use nested 10-fold cross-validation; for TREC, AG, DBP, Yelp.P, Yelp.F, 10-fold cross-validation; and for SST-1, SST-2, the standard validation split."
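Since nested cross-validation is easy to conflate with plain k-fold, here is a short scikit-learn sketch of the nested 10-fold protocol quoted above: the inner loop selects hyperparameters and the outer loop estimates test accuracy. The `LogisticRegression` estimator, the `C` grid, and the synthetic data are placeholders for illustration, not the paper's model or datasets.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

X, y = make_classification(n_samples=200, random_state=0)  # stand-in data

inner = KFold(n_splits=10, shuffle=True, random_state=0)   # hyperparameter selection
outer = KFold(n_splits=10, shuffle=True, random_state=1)   # test-accuracy estimation

model = GridSearchCV(LogisticRegression(max_iter=1000),
                     param_grid={'C': [0.1, 1.0, 10.0]},
                     cv=inner)
scores = cross_val_score(model, X, y, cv=outer, scoring='accuracy')
print(f'nested 10-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}')
```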
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU or CPU models) used to run the experiments.
Software Dependencies | No | The paper mentions GloVe and ELMo embeddings and the Adam optimizer, but gives no version numbers for the software libraries used in the implementation.
Experiment Setup | Yes | "We set the hidden state dimension of Bi-GRU to be 100 for each direction. We adopt 5 dilated convolutional blocks, with filter window size 2 and filter number 100. We set the target capsule dimension to be 50, and use 3 iterations of routing for all datasets. The MLP classifier has a hidden layer of size 50 using ReLU activation. ... Dropout regularization is employed on the input embedding layer, with the dropout rate 0.5. ... We train our model's parameters using the gradient-based optimizer Adam, with an initial learning rate of 1e-3. We halve the learning rate if the dev accuracy doesn't increase in 3 training epochs, and set the minimum rate to be 1e-4. We use mini-batches of size 8 for small datasets and size 128 for large datasets. The training process lasts at most 30 epochs on all the datasets."
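The quoted optimizer and learning-rate schedule map directly onto standard PyTorch components, sketched below under the assumption of a PyTorch implementation (the paper names neither framework nor versions). `train_one_epoch` and `evaluate_dev` are hypothetical caller-supplied helpers, not functions from the paper's code.

```python
import torch

def fit(model, train_one_epoch, evaluate_dev, max_epochs=30):
    # train_one_epoch(model, optimizer) runs one pass over the training set
    # (mini-batch size 8 for small datasets, 128 for large ones);
    # evaluate_dev(model) returns dev-set accuracy. Both are hypothetical helpers.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # initial LR 1e-3
    # Halve the LR when dev accuracy has not improved for 3 epochs, floor at 1e-4.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode='max', factor=0.5, patience=3, min_lr=1e-4)
    for _ in range(max_epochs):              # "at most 30 epochs on all the datasets"
        train_one_epoch(model, optimizer)
        scheduler.step(evaluate_dev(model))  # mode='max': tracks accuracy, not loss
```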