Densely Connected CNN with Multi-scale Feature Attention for Text Classification
Authors: Shiyao Wang, Minlie Huang, Zhidong Deng
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our model obtains competitive performance against state-of-the-art baselines on six benchmark datasets. |
| Researcher Affiliation | Academia | Shiyao Wang¹, Minlie Huang¹, Zhidong Deng¹; ¹State Key Laboratory of Intelligent Technology and Systems, Beijing National Research Center for Information Science and Technology, Department of Computer Science, Tsinghua University, Beijing 100084, China; sy-wang14@mails.tsinghua.edu.cn, aihuang@tsinghua.edu.cn, michael@tsinghua.edu.cn |
| Pseudocode | No | The paper describes the model architecture and operations through text and diagrams, but does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at: https://github.com/wangshy31/Densely-Connected-CNN-with-Multiscale-Feature-Attention.git |
| Open Datasets | Yes | We evaluated our model on six public datasets: MR from [Pang and Lee, 2005] and the others (AG, DBPedia, Yelp P./F., and Amazon F.) from [Zhang et al., 2015]. |
| Dataset Splits | Yes | The asterisk (*) means there was no standard training/test split and thus 10-fold cross validation was conducted. (From Table 2 footnote) |
| Hardware Specification | No | The paper details training settings and software used (Caffe), but does not specify the hardware (e.g., CPU/GPU models) on which experiments were conducted. |
| Software Dependencies | No | The model was implemented with Caffe [Jia et al., 2014]. This names the framework but gives no specific version number. |
| Experiment Setup | Yes | The input text was padded to a fixed length maxlen, where maxlen was chosen as 50 for MR, 100 for AG and DBPedia, and 300 for Yelp P., Yelp F., and Amazon F., respectively. ... We adopted 5 convolutional blocks for MR and AG, and 6 convolutional blocks for the remaining datasets. We chose window size w = 3 and feature dimension k = 128. ... We used stochastic gradient descent (SGD) with a mini-batch of 256. The learning rate is initially set to 0.01 and then gradually decreased to 1e-5. The training process lasts at most 30 epochs on all the datasets. We applied L1 regularization and the momentum was set to 0.9. |
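
The Experiment Setup row pins down enough hyperparameters to reconstruct the training configuration. The sketch below translates them into PyTorch for illustration; the paper itself used Caffe, so the framework choice, the exponential decay schedule, the L1 coefficient `l1_lambda`, and the `model`/`train_loader` objects are all assumptions rather than the authors' code.

```python
# Hedged sketch of the reported training setup (SGD, mini-batch 256,
# lr 0.01 decayed to 1e-5, momentum 0.9, L1 regularization, <= 30 epochs).
# The paper used Caffe; this PyTorch translation is illustrative only.
import torch

def train(model, train_loader, num_epochs=30, l1_lambda=1e-5):
    # SGD with momentum 0.9, as reported; train_loader is assumed to yield
    # mini-batches of 256 padded sequences (maxlen = 50/100/300 per dataset).
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    # Decay the learning rate from 0.01 toward 1e-5; the paper does not state
    # the schedule, so per-epoch exponential decay is an assumption.
    gamma = (1e-5 / 0.01) ** (1.0 / num_epochs)
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=gamma)
    criterion = torch.nn.CrossEntropyLoss()
    for _ in range(num_epochs):
        for texts, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(texts), labels)
            # L1 regularization as reported; the coefficient is not given
            # in the paper, so l1_lambda here is a placeholder value.
            loss = loss + l1_lambda * sum(p.abs().sum() for p in model.parameters())
            loss.backward()
            optimizer.step()
        scheduler.step()
```

With `num_epochs = 30`, the decay factor works out to roughly 0.79 per epoch, so the learning rate lands at 1e-5 exactly at the end of training.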