Densely Connected CNN with Multi-scale Feature Attention for Text Classification
Authors: Shiyao Wang, Minlie Huang, Zhidong Deng
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our model obtains competitive performance against state-of-the-art baselines on six benchmark datasets. |
| Researcher Affiliation | Academia | Shiyao Wang¹, Minlie Huang¹, Zhidong Deng¹; ¹State Key Laboratory of Intelligent Technology and Systems, Beijing National Research Center for Information Science and Technology, Department of Computer Science, Tsinghua University, Beijing 100084, China; sy-wang14@mails.tsinghua.edu.cn, aihuang@tsinghua.edu.cn, michael@tsinghua.edu.cn |
| Pseudocode | No | The paper describes the model architecture and operations through text and diagrams, but does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at: https://github.com/wangshy31/Densely-Connected-CNN-with-Multiscale-Feature-Attention.git |
| Open Datasets | Yes | We evaluated our model on six public datasets: MR from [Pang and Lee, 2005] and the others (AG, DBPedia, Yelp P./F., and Amazon F.) from [Zhang et al., 2015]. |
| Dataset Splits | Yes | The asterisk (*) means there was no standard training/test split and thus 10-fold cross validation was conducted. (From Table 2 footnote) |
| Hardware Specification | No | The paper details training settings and software used (Caffe), but does not specify the hardware (e.g., CPU/GPU models) on which experiments were conducted. |
| Software Dependencies | No | The model was implemented with Caffe [Jia et al., 2014]. This names the framework but gives no specific version number. |
| Experiment Setup | Yes | The input text was padded to a fixed length maxlen, where maxlen was chosen as 50 for MR, 100 for AG and DBPedia, and 300 for Yelp P., Yelp F., and Amazon F., respectively. ... We adopted 5 convolutional blocks for MR and AG, and 6 convolutional blocks for the remaining datasets. We chose window size w = 3 and feature dimension k = 128. ... We used stochastic gradient descent (SGD) with a mini-batch of 256. The learning rate is initially set to 0.01 and then gradually decreased to 1e-5. The training process lasts at most 30 epochs on all the datasets. We applied L1 regularization and the momentum was set to 0.9. |
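
The Experiment Setup row pins down enough hyperparameters to reconstruct the training configuration. The sketch below translates them into PyTorch for illustration; the paper itself used Caffe, so the framework choice, the exponential decay schedule, the L1 coefficient `l1_lambda`, and the `model`/`train_loader` objects are all assumptions rather than the authors' code.

```python
# Hedged sketch of the reported training setup (SGD, mini-batch 256,
# lr 0.01 decayed to 1e-5, momentum 0.9, L1 regularization, <= 30 epochs).
# The paper used Caffe; this PyTorch translation is illustrative only.
import torch

def train(model, train_loader, num_epochs=30, l1_lambda=1e-5):
    # SGD with momentum 0.9, as reported; train_loader is assumed to yield
    # mini-batches of 256 padded sequences (maxlen = 50/100/300 per dataset).
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    # Decay the learning rate from 0.01 toward 1e-5; the paper does not state
    # the schedule, so per-epoch exponential decay is an assumption.
    gamma = (1e-5 / 0.01) ** (1.0 / num_epochs)
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=gamma)
    criterion = torch.nn.CrossEntropyLoss()
    for _ in range(num_epochs):
        for texts, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(texts), labels)
            # L1 regularization as reported; the coefficient is not given
            # in the paper, so l1_lambda here is a placeholder value.
            loss = loss + l1_lambda * sum(p.abs().sum() for p in model.parameters())
            loss.backward()
            optimizer.step()
        scheduler.step()
```

With `num_epochs = 30`, the decay factor works out to roughly 0.79 per epoch, so the learning rate lands at 1e-5 exactly at the end of training.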