A New Method of Region Embedding for Text Classification

Authors: Chao Qiao, Bo Huang, Guocheng Niu, Daren Li, Daxiang Dong, Wei He, Dianhai Yu, Hua Wu

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results show that our proposed method outperforms existing methods in text classification on several benchmark datasets. The results also indicate that our method can indeed capture the salient phrasal expressions in the texts."
Researcher Affiliation | Industry | "Baidu Inc., Beijing, China; National Engineering Laboratory of Deep Learning Technology and Application, China; {qiaochao, huangbo02, niuguocheng, lidaren, daxiangdong, hewei06, yudianhai, wu_hua}@baidu.com"
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | "The code is publicly available on the Internet." The accompanying footnote links to https://github.com/text-representation/local-context-unit
Open Datasets | Yes | "We use publicly available datasets from Zhang et al. (2015) to evaluate our models."
Dataset Splits | Yes | "For our models, optimal hyperparameters are tuned with 10% of the training set on Yelp Review Full dataset, and identical hyperparameters are applied to all datasets."
Hardware Specification | Yes | "Algorithms are entirely implemented with TensorFlow and trained on NVIDIA Tesla P40 GPUs."
Software Dependencies | No | The paper states that "Algorithms are entirely implemented with TensorFlow" but does not specify a version for TensorFlow or any other software dependency.
Experiment Setup | Yes | "For our models, optimal hyperparameters are tuned with 10% of the training set on Yelp Review Full dataset, and identical hyperparameters are applied to all datasets: the dimension of word embedding is 128, the region size is 7 which means the shape of local context unit matrix of each word is 128 × 7, the initial learning rate is set to 1 × 10⁻⁴, and the batch size is 16. For optimization, the embeddings of words and the units are randomly initialized with Gaussian Distribution. Adam (Kingma & Ba, 2014) is used as the optimizer. We do not use any extra regularization methods, like L2 normalization or dropout."
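For concreteness, the quoted experiment setup can be sketched in TensorFlow. This is a minimal, hypothetical sketch under stated assumptions, not the authors' released implementation (which lives at the GitHub link above): the vocabulary size, the initializer standard deviation, and all variable and function names are assumptions, and the `region_embedding` function sketches only the word-context variant of the method.

```python
# Minimal sketch of the quoted training setup, assuming TensorFlow 2.
# Vocabulary size, initializer stddev, and all names are assumptions;
# the authors' actual code is at
# https://github.com/text-representation/local-context-unit.
import tensorflow as tf

VOCAB_SIZE = 100_000   # assumption: not stated in the quoted setup
EMBED_DIM = 128        # "the dimension of word embedding is 128"
REGION_SIZE = 7        # local context unit matrix of each word: 128 x 7
LEARNING_RATE = 1e-4   # "the initial learning rate is set to 1 x 10^-4"
BATCH_SIZE = 16        # "the batch size is 16"

# Word embeddings and per-word local context units, randomly
# initialized from a Gaussian distribution as the paper states.
init = tf.random_normal_initializer(stddev=0.1)  # stddev is an assumption
word_emb = tf.Variable(init([VOCAB_SIZE, EMBED_DIM]),
                       name="word_embeddings")
context_units = tf.Variable(init([VOCAB_SIZE, EMBED_DIM, REGION_SIZE]),
                            name="local_context_units")


def region_embedding(region_ids):
    """Word-context region embedding for a batch of regions.

    region_ids: int32 tensor of shape [batch, REGION_SIZE]; the middle
    word of each region owns the local context unit. A sketch of the
    idea, not the authors' exact code.
    """
    mid = REGION_SIZE // 2
    units = tf.gather(context_units, region_ids[:, mid])   # [B, D, R]
    embs = tf.gather(word_emb, region_ids)                 # [B, R, D]
    # Project each context word's embedding through the unit column
    # for its relative position, then max-pool over the region.
    projected = tf.transpose(units, [0, 2, 1]) * embs      # [B, R, D]
    return tf.reduce_max(projected, axis=1)                # [B, D]


# Adam optimizer, with no extra regularization (no L2 penalty, no dropout),
# matching the quoted setup.
optimizer = tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE)
```

Per the quoted dataset-split row, a training loop built on this sketch would hold out 10% of the Yelp Review Full training set for hyperparameter tuning and reuse the resulting hyperparameters on all datasets.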