Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

P-SIF: Document Embeddings Using Partition Averaging

Authors: Vivek Gupta, Ankit Saw, Pegah Nokhiz, Praneeth Netrapalli, Piyush Rai, Partha Talukdar
Pages: 7863-7870

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through a comprehensive set of experiments, we demonstrate P-SIF's effectiveness compared to simple weighted averaging and many other baselines. We perform a comprehensive set of experiments on several text similarity and multiclass or multilabel text classification tasks.
Researcher Affiliation | Collaboration | (1) School of Computing, University of Utah; (2) Info Edge (India) Limited; (3) Microsoft Research Lab, Bangalore; (4) Computer Science Department, IIT Kanpur; (5) Indian Institute of Science, Bangalore
Pseudocode | Yes | Algorithm 1: P-SIF Embedding
Open Source Code | Yes | We have released the source code for P-SIF embeddings.
Open Datasets | Yes | We perform our experiments on the SemEval dataset (2012-2017). We run multi-class experiments on 20NewsGroup dataset, and multi-label classification experiments on Reuters-21578 dataset.
Dataset Splits | Yes | We use 5-fold cross-validation on the F1 score to tune hyperparameters.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory specifications) used for running the experiments.
Software Dependencies | No | The paper mentions methods like 'Linear SVM' and 'Logistic regression' but does not specify any software names with version numbers for implementation details or dependencies.
Experiment Setup | Yes | We use the fixed weighting parameter a value of 10^-3, and the word frequencies p(w) are estimated from the commoncrawl dataset.
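
The Experiment Setup entry names a weighting parameter a and word frequencies p(w) but does not state how they combine. The sketch below assumes the standard smooth inverse frequency form, weight(w) = a / (a + p(w)), with a = 10^-3. The function names and the toy counts are hypothetical and stand in for frequencies estimated from a large corpus; this is an illustration of the weighting only, not the authors' released implementation.

```python
from typing import Dict


def estimate_word_probs(counts: Dict[str, int]) -> Dict[str, float]:
    """Turn raw corpus counts into unigram probabilities p(w)."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must contain at least one occurrence")
    return {word: count / total for word, count in counts.items()}


def sif_weight(word_prob: float, a: float = 1e-3) -> float:
    """Assumed smooth inverse frequency weight: a / (a + p(w))."""
    if not 0.0 <= word_prob <= 1.0:
        raise ValueError("word_prob must be a probability in [0, 1]")
    return a / (a + word_prob)


# Hypothetical toy counts, used only to show the effect of the weighting.
toy_counts = {"the": 900, "embedding": 90, "partition": 10}
probs = estimate_word_probs(toy_counts)
weights = {word: sif_weight(p) for word, p in probs.items()}

for word in toy_counts:
    print(f"{word}: p(w)={probs[word]:.3f}, weight={weights[word]:.4f}")
```

Under this assumed form, frequent words such as "the" receive weights close to zero while rare words receive weights closer to one, which is why a small value of a is typical.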