Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
P-SIF: Document Embeddings Using Partition Averaging
Authors: Vivek Gupta, Ankit Saw, Pegah Nokhiz, Praneeth Netrapalli, Piyush Rai, Partha Talukdar | pp. 7863-7870
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through a comprehensive set of experiments, we demonstrate P-SIF's effectiveness compared to simple weighted averaging and many other baselines. We perform a comprehensive set of experiments on several text similarity and multiclass or multilabel text classification tasks. |
| Researcher Affiliation | Collaboration | (1) School of Computing, University of Utah; (2) Info Edge (India) Limited; (3) Microsoft Research Lab, Bangalore; (4) Computer Science Department, IIT Kanpur; (5) Indian Institute of Science, Bangalore |
| Pseudocode | Yes | Algorithm 1: P-SIF Embedding |
| Open Source Code | Yes | We have released the source code for P-SIF embeddings. |
| Open Datasets | Yes | We perform our experiments on the SemEval dataset (2012-2017). We run multi-class experiments on 20NewsGroup dataset, and multi-label classification experiments on Reuters-21578 dataset. |
| Dataset Splits | Yes | We use 5-fold cross-validation on the F1 score to tune hyperparameters. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions methods like 'Linear SVM' and 'Logistic regression' but does not specify any software names with version numbers for implementation details or dependencies. |
| Experiment Setup | Yes | We use the fixed weighting parameter a value of 10^-3, and the word frequencies p(w) are estimated from the commoncrawl dataset. |
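
The Experiment Setup row names a weighting parameter a and word frequencies p(w). Assuming these play the same role as in smooth inverse frequency (SIF) weighting, where a word w receives the weight a / (a + p(w)), the effect of a small a can be sketched as below. This is an illustrative sketch, not the authors' released code: the function name `sif_weight` is hypothetical and the probabilities are made up rather than taken from the commoncrawl estimates.

```python
def sif_weight(word_prob: float, a: float = 1e-3) -> float:
    """Assumed SIF-style weight a / (a + p(w)) for a word with unigram probability p(w)."""
    if word_prob < 0 or a <= 0:
        raise ValueError("word_prob must be non-negative and a must be positive")
    return a / (a + word_prob)


# Made-up unigram probabilities, for illustration only.
example_probs = {"the": 5e-2, "embedding": 1e-5}

# Frequent words get a weight near 0, rare words a weight near 1.
weights = {word: sif_weight(p) for word, p in example_probs.items()}

for word, weight in weights.items():
    print(f"{word}: {weight:.4f}")
```

Under this assumption, the parameter controls how sharply frequent words are down-weighted: a word whose probability equals a receives a weight of exactly one half.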