Dynamically Route Hierarchical Structure Representation to Attentive Capsule for Text Classification

Authors: Wanshan Zheng, Zibin Zheng, Hai Wan, Chuan Chen

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive results on eleven benchmark datasets demonstrate that the proposed model obtains competitive performance against several state-of-the-art baselines."
Researcher Affiliation | Academia | Wanshan Zheng (1,2), Zibin Zheng (1,2), Hai Wan (1), Chuan Chen (1,2). (1) School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China; (2) Guangdong Key Laboratory for Big Data Analysis and Simulation of Public Opinion, The School of Communication and Design, Sun Yat-sen University, Guangzhou, China. zhengwsh3@mail2.sysu.edu.cn, {zhzibin, wanhai, chenchuan}@mail.sysu.edu.cn
Pseudocode | Yes | The paper provides Algorithm 1 (Dynamic Routing Algorithm).
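The row above cites Algorithm 1, which this report does not reproduce. As a point of reference, below is a minimal NumPy sketch of the standard dynamic routing procedure from Sabour et al. (2017), on which capsule routing algorithms of this kind are typically based; the variable names (`u_hat`, `b`, `c`, `s`, `v`) and the squash nonlinearity are assumptions, and the paper's attentive-capsule variant may differ in its details.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Squash nonlinearity: shrinks short vectors toward 0, long vectors toward unit norm.
    norm_sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)

def dynamic_routing(u_hat, n_iters=3):
    # u_hat: prediction vectors, shape (n_in, n_out, d_out).
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))  # routing logits
    for _ in range(n_iters):
        # Coupling coefficients: softmax over output capsules for each input capsule.
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)
        s = np.einsum('ij,ijd->jd', c, u_hat)       # weighted sum of predictions
        v = squash(s)                               # output capsules
        b = b + np.einsum('ijd,jd->ij', u_hat, v)   # raise logits where predictions agree
    return v

# Example: route 10 input capsules to 5 output capsules of dimension 50
# (the paper reports 50-dimensional target capsules and 3 routing iterations).
v = dynamic_routing(np.random.randn(10, 5, 50), n_iters=3)
```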
Open Source Code | Yes | "Our code is available at https://github.com/zhengwsh/HAC."
Open Datasets | Yes | "For small datasets, seven widely-studied datasets [Kim, 2014] include: movie reviews (MR), Stanford Sentiment Treebank (SST-1 and SST-2), subjectivity classification (SUBJ), question dataset (TREC), customer review (CR), opinion polarity (MPQA). For large datasets, four widely-studied datasets [Zhang et al., 2015] include: AG's news corpus (AG), DBPedia ontology (DBP), Yelp reviews (Yelp.P and Yelp.F). The detailed statistics are listed in Table 1."
Dataset Splits | Yes | "Following the evaluation scheme in the existing literature, for MR, CR, Subj, MPQA we use nested 10-fold cross-validation; for TREC, AG, DBP, Yelp.P, Yelp.F, 10-fold cross-validation; and for SST-1, SST-2, the standard validation split."
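Since nested cross-validation is easy to conflate with plain k-fold, here is a short scikit-learn sketch of the nested 10-fold protocol quoted above: the inner loop selects hyperparameters and the outer loop estimates test accuracy. The `LogisticRegression` estimator, the `C` grid, and the synthetic data are placeholders for illustration, not the paper's model or datasets.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

X, y = make_classification(n_samples=200, random_state=0)  # stand-in data

inner = KFold(n_splits=10, shuffle=True, random_state=0)   # hyperparameter selection
outer = KFold(n_splits=10, shuffle=True, random_state=1)   # test-accuracy estimation

model = GridSearchCV(LogisticRegression(max_iter=1000),
                     param_grid={'C': [0.1, 1.0, 10.0]},
                     cv=inner)
scores = cross_val_score(model, X, y, cv=outer, scoring='accuracy')
print(f'nested 10-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}')
```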
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU or CPU models) used to run the experiments.
Software Dependencies | No | The paper mentions GloVe and ELMo embeddings and the Adam optimizer, but gives no version numbers for the software libraries used in the implementation.
Experiment Setup | Yes | "We set the hidden state dimension of Bi-GRU to be 100 for each direction. We adopt 5 dilated convolutional blocks, with filter window size 2 and filter number 100. We set the target capsule dimension to be 50, and use 3 iterations of routing for all datasets. The MLP classifier has a hidden layer of size 50 using ReLU activation. ... Dropout regularization is employed on the input embedding layer, with the dropout rate 0.5. ... We train our model's parameters using the gradient-based optimizer Adam, with an initial learning rate of 1e-3. We halve the learning rate if the dev accuracy doesn't increase in 3 training epochs, and set the minimum rate to be 1e-4. We use mini-batches of size 8 for small datasets and size 128 for large datasets. The training process lasts at most 30 epochs on all the datasets."
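The quoted optimizer and learning-rate schedule map directly onto standard PyTorch components, sketched below under the assumption of a PyTorch implementation (the paper names neither framework nor versions). `train_one_epoch` and `evaluate_dev` are hypothetical caller-supplied helpers, not functions from the paper's code.

```python
import torch

def fit(model, train_one_epoch, evaluate_dev, max_epochs=30):
    # train_one_epoch(model, optimizer) runs one pass over the training set
    # (mini-batch size 8 for small datasets, 128 for large ones);
    # evaluate_dev(model) returns dev-set accuracy. Both are hypothetical helpers.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # initial LR 1e-3
    # Halve the LR when dev accuracy has not improved for 3 epochs, floor at 1e-4.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode='max', factor=0.5, patience=3, min_lr=1e-4)
    for _ in range(max_epochs):              # "at most 30 epochs on all the datasets"
        train_one_epoch(model, optimizer)
        scheduler.step(evaluate_dev(model))  # mode='max': tracks accuracy, not loss
```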