Bag-of-Embeddings for Text Classification
Authors: Peng Jin, Yue Zhang, Xingyuan Chen, Yunqing Xia
IJCAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on two standard document classification benchmark datasets show that our model achieves higher accuracies and macro-F1 scores than state-of-the-art models. |
| Researcher Affiliation | Collaboration | (1) School of Computer Science, Leshan Normal University, Leshan, China, 614000; (2) Singapore University of Technology and Design, Singapore 487372; (3) Search Technology Center, Microsoft, Beijing, China, 100087 |
| Pseudocode | No | The paper describes methods using mathematical equations and prose, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code of this paper is released at https://github.com/hiccxy/Bag-of-embedding-fortext-classification. |
| Open Datasets | Yes | We choose the twenty newsgroup (20NG) test [Lang, 1995] for multi-class classification. We use the bydate data, which consists of 11,314 training instances and 7,532 test instances... For imbalanced classification, Lewis [1995] introduced the Reuters-21578 corpus. R8 consists of 5,485 documents for training and 2,189 for testing. |
| Dataset Splits | No | The paper specifies training and test instances for 20NG (11,314 training, 7,532 test) and R8 (5,485 training, 2,189 testing), but does not explicitly mention a separate validation split for their proposed model, nor does it detail cross-validation for their model (only for a baseline SVM). |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'LIBSVM' for baselines, but does not provide specific version numbers for it or any other software dependencies crucial for reproducing their own model's experiments. |
| Experiment Setup | Yes | For the parameters, we set l = 5, the iteration number to 5, the size of context window to 10 and the dimensions of word vector to 100. |
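
The table cites accuracy and macro-F1 as the reported metrics, and R8 is described as an imbalanced dataset, which is where the two can diverge. The sketch below is a minimal pure-Python illustration of both metrics, not the authors' evaluation code. The function names, the toy labels, and the choice to average F1 over the classes that appear in the gold labels are all assumptions made for this example.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of documents whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over the classes present in `gold`.

    Assumption: classes that appear only in `pred` are not averaged in.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p, but the document is not p
            fn[g] += 1  # gold class g was missed
    scores = []
    for c in sorted(set(gold)):
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, made up for illustration only (not taken from the paper).
gold = ["earn", "earn", "earn", "earn", "grain", "grain"]
pred = ["earn", "earn", "earn", "earn", "earn", "grain"]

print(f"accuracy = {accuracy(gold, pred):.4f}")
print(f"macro-F1 = {macro_f1(gold, pred):.4f}")
```

On this toy input the majority class dominates accuracy, while macro-F1 weights the minority class equally and so penalizes the single missed minority document more heavily.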