Bag-of-Embeddings for Text Classification

Authors: Peng Jin, Yue Zhang, Xingyuan Chen, Yunqing Xia

IJCAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on two standard document classification benchmark datasets show that our model achieves higher accuracies and macro-F1 scores compared with state-of-the-art models.
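Macro-F1, cited above alongside accuracy, averages per-class F1 with equal weight per class, which is why it matters for the imbalanced R8 corpus. A minimal stdlib sketch of the metric (the function name and example labels are ours, not from the paper):

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute F1 per class, then average with equal
    class weight, so minority classes count as much as majority ones."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)
```

On a toy 2-class example, `macro_f1(["a","a","b","b"], ["a","b","b","b"])` averages the per-class F1 of 2/3 and 4/5, giving 11/15.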
Researcher Affiliation | Collaboration | 1 School of Computer Science, Leshan Normal University, Leshan, China 614000; 2 Singapore University of Technology and Design, Singapore 487372; 3 Search Technology Center, Microsoft, Beijing, China 100087
Pseudocode | No | The paper describes methods using mathematical equations and prose, but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The source code of this paper is released at https://github.com/hiccxy/Bag-of-embedding-fortext-classification.
Open Datasets | Yes | We choose the twenty newsgroup (20NG) test [Lang, 1995] for multi-class classification. We use the bydate version, which consists of 11,314 training instances and 7,532 test instances... For imbalanced classification, Lewis [1995] introduced the Reuters-21578 corpus. R8 consists of 5,485 documents for training and 2,189 for testing.
Dataset Splits | No | The paper specifies training and test instances for 20NG (11,314 training, 7,532 test) and R8 (5,485 training, 2,189 testing), but does not explicitly mention a separate validation split for the proposed model, nor does it detail cross-validation for that model (only for a baseline SVM).
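Because only train/test sizes are reported, anyone reproducing the experiments must choose their own tuning split. A minimal stdlib sketch of one common choice, a seeded 10% holdout (the fraction, seed, and function name are assumptions, not from the paper):

```python
import random

def holdout_split(items, val_fraction=0.1, seed=0):
    """Carve a validation set out of a training set by seeded shuffle.
    Returns (train, val); the seed makes the split reproducible."""
    rng = random.Random(seed)
    indices = list(range(len(items)))
    rng.shuffle(indices)
    n_val = int(len(items) * val_fraction)
    val = [items[i] for i in indices[:n_val]]
    train = [items[i] for i in indices[n_val:]]
    return train, val
```

Applied to the 11,314 20NG bydate training instances, this would set aside 1,131 documents for validation and leave 10,183 for training.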
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using 'LIBSVM' for baselines, but does not provide specific version numbers for it or any other software dependencies crucial for reproducing the authors' own experiments.
Experiment Setup | Yes | For the parameters, we set l = 5, the iteration number to 5, the size of the context window to 10, and the dimension of the word vectors to 100.
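For a reimplementation, the reported hyperparameters can be collected in one place. The key names below are our own labels, not the paper's; l = 5 is copied as reported (see the paper for its definition), and the remaining values mirror standard embedding-training settings:

```python
# Hyperparameters as reported in the paper's experiment setup.
# Key names are our own; only the numeric values come from the paper.
BOE_PARAMS = {
    "l": 5,                # as reported; defined in the paper
    "iterations": 5,       # number of training iterations
    "context_window": 10,  # size of the context window
    "embedding_dim": 100,  # dimension of the word vectors
}
```

A reimplementation would pass these values to its embedding trainer (e.g., window and dimension settings in a word2vec-style toolkit), keeping the mapping to that toolkit's parameter names explicit.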