reproducibilityindex.ai

Bag-of-Embeddings for Text Classification

Authors: Peng Jin, Yue Zhang, Xingyuan Chen, Yunqing Xia

IJCAI 2016

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on two standard document classification benchmark data show that our model achieve higher accuracies and macro-F1 scores compared than state-of-the-art models.
Researcher Affiliation	Collaboration	(1) School of Computer Science, Leshan Normal University, Leshan, China 614000; (2) Singapore University of Technology and Design, Singapore 487372; (3) Search Technology Center, Microsoft, Beijing, China 100087
Pseudocode	No	The paper describes methods using mathematical equations and prose, but does not include any structured pseudocode or algorithm blocks.
Open Source Code	Yes	The source code of this paper is released at https://github.com/hiccxy/Bag-of-embedding-fortext-classification.
Open Datasets	Yes	We choose the twenty newsgroup (20NG) test [Lang, 1995] for multi-class classification. We use the bydate data, which consists of 11,314 training instances and 7,532 test instances... For imbalanced classification, Lewis [1995] introduced the Reuters-21578 corpus. R8 consists of 5,485 documents for training and 2,189 for testing.
Dataset Splits	No	The paper specifies training and test instances for 20NG (11,314 training, 7,532 test) and R8 (5,485 training, 2,189 testing), but does not explicitly mention a separate validation split for their proposed model, nor does it detail cross-validation for their model (only for a baseline SVM).
Hardware Specification	No	The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies	No	The paper mentions using 'LIBSVM' for baselines, but does not provide specific version numbers for it or any other software dependencies crucial for reproducing their own model's experiments.
Experiment Setup	Yes	For the parameters, we set l = 5, the iteration number to 5, the size of context window to 10 and the dimensions of word vector to 100.
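The Research Type row reports results as accuracy and macro-F1. As a reference point for re-scoring predictions, the sketch below computes both in plain Python. It assumes the standard definitions (per-class F1 = 2TP / (2TP + FP + FN), averaged with equal weight per class) and is not taken from the released code; the function names are illustrative.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted label equals the gold label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p although the gold label was t
            fn[t] += 1  # missed one instance of class t
    labels = set(y_true) | set(y_pred)
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); guard kept for safety, denom > 0 here
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example: class "a" is the majority, class "b" the minority.
y_true = ["a", "a", "a", "b"]
y_pred = ["a", "a", "b", "b"]
print(accuracy(y_true, y_pred))            # 0.75
print(round(macro_f1(y_true, y_pred), 3))  # 0.733
```

Here accuracy is 0.75 while macro-F1 is about 0.733, because the minority class counts as much as the majority class; this is why macro-F1 is informative on the imbalanced R8 data. scikit-learn's `f1_score(y_true, y_pred, average="macro")` should give the same value for this case.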
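The Open Datasets and Dataset Splits rows quote exact split sizes. A reproducer can check a local copy of the corpora against those numbers before training. The sketch below only encodes the reported counts (the helper names are hypothetical) and derives the held-out share of each corpus.

```python
# Split sizes as quoted in the Open Datasets / Dataset Splits rows.
REPORTED_SPLITS = {
    "20NG": {"train": 11_314, "test": 7_532},
    "R8": {"train": 5_485, "test": 2_189},
}


def check_split_sizes(name, n_train, n_test):
    """Return mismatch messages for a local copy; an empty list means it matches."""
    expected = REPORTED_SPLITS[name]
    problems = []
    for split, got in (("train", n_train), ("test", n_test)):
        if got != expected[split]:
            problems.append(f"{name} {split}: expected {expected[split]}, got {got}")
    return problems


def held_out_fraction(name):
    """Share of the reported corpus that is used for testing."""
    s = REPORTED_SPLITS[name]
    return s["test"] / (s["train"] + s["test"])


print(check_split_sizes("20NG", 11_314, 7_532))  # []
print(round(held_out_fraction("20NG"), 3), round(held_out_fraction("R8"), 3))  # 0.4 0.285
```

For 20NG, scikit-learn's `fetch_20newsgroups(subset="train")` and `fetch_20newsgroups(subset="test")` load the bydate split (downloading it on first use), so the lengths of their `.data` lists can be passed to the checker. scikit-learn does not ship a Reuters-21578 loader, so the R8 counts have to come from whichever preprocessed copy is used.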
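The Dataset Splits row notes that no validation split is described for the proposed model. If hyperparameters are re-tuned during a reproduction, one option is to carve a fixed, seeded development set out of the training data so the test set stays untouched. The helper below is a hypothetical sketch; the 10% fraction and seed 0 are assumptions, not the paper's protocol.

```python
import random


def train_dev_split(n_train, dev_fraction=0.1, seed=0):
    """Deterministically partition training indices into (train, dev) index lists.

    Hypothetical helper: the paper does not describe a validation split, so the
    default fraction and seed are illustrative assumptions.
    """
    if not 0.0 < dev_fraction < 1.0:
        raise ValueError("dev_fraction must be in (0, 1)")
    indices = list(range(n_train))
    random.Random(seed).shuffle(indices)  # local RNG, global state untouched
    n_dev = round(n_train * dev_fraction)
    return sorted(indices[n_dev:]), sorted(indices[:n_dev])


train_idx, dev_idx = train_dev_split(11_314)  # 20NG training size
print(len(train_idx), len(dev_idx))  # 10183 1131
```

Using a local `random.Random(seed)` keeps the split identical across runs. For the imbalanced R8 data, a per-class (stratified) variant of the same idea would keep small classes represented in the dev set.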
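The Experiment Setup row lists four hyperparameter values. Collecting them in one immutable record makes it easy to log them alongside each run. The field names below are hypothetical labels for the quoted settings, and the meaning of `l` is left open because the excerpt does not define it.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class BoEConfig:
    """Hyperparameter values quoted in the Experiment Setup row.

    Field names are hypothetical; they are not taken from the released code.
    """
    l: int = 5                # "l = 5"; meaning not defined in this excerpt
    iterations: int = 5       # "the iteration number to 5"
    context_window: int = 10  # "the size of context window to 10"
    embedding_dim: int = 100  # "the dimensions of word vector to 100"


config = BoEConfig()
print(asdict(config))
# {'l': 5, 'iterations': 5, 'context_window': 10, 'embedding_dim': 100}
```

Making the dataclass frozen means a run cannot silently change a setting after it has been logged.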