Robust Text Classification in the Presence of Confounding Bias
Authors: Virgile Landeiro, Aron Culotta
AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On three diverse text classification tasks, we find that covariate adjustment results in higher accuracy than competing baselines over a range of confounding relationships (e.g., in one setting, accuracy improves from 60% to 81%). |
| Researcher Affiliation | Academia | Virgile Landeiro and Aron Culotta Department of Computer Science Illinois Institute of Technology Chicago, IL 60616 vlandeir@hawk.iit.edu, aculotta@iit.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any specific links or statements indicating that the source code for the methodology is openly available. |
| Open Datasets | Yes | We evaluate our approach on three diverse classification tasks: predicting the location of a Twitter user (confounded by gender), the political affiliation of a parliament member (confounded by majority party status), and the sentiment of a movie review (confounded by genre). ... IMDb data from Maas et al. (2011). ... data on the 36th and 39th Canadian Parliaments as studied previously (Hirst, Riabinin, and Graham 2010; Dahllöf 2012). |
| Dataset Splits | Yes | For each b_train, b_test pair, we sample 5 train/test splits and report the average accuracy. For Parliament, we use 5-fold cross-validation on the 39th Parliament; each fold reserves a different 20% of the 39th Parliament for testing. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'L2-regularized logistic regression' but does not specify any software names with version numbers for replication. |
| Experiment Setup | Yes | From an implementation perspective, the approach above is rather straightforward: p(z) is computed using the maximum likelihood estimate above. We compute p(y|x, z) efficiently by simply appending two additional features c_{i,0} and c_{i,1} to each instance x_i representing z = 0 and z = 1. The first (resp. second) feature is set to v_1 if z_i = 0 (resp. z_i = 1) and the second feature (resp. first) is set to 0. In the default case, we let v_1 = 1 but we revisit this decision in the next section. |
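
To make the Experiment Setup row concrete, below is a minimal sketch of how the quoted feature augmentation and adjustment could be implemented with scikit-learn's L2-regularized logistic regression (the only classifier the paper names). It assumes binary y and z coded as 0/1, treats the test-time combination of p(y|x, z) and p(z) as the adjustment p(y|x) = Σ_z p(y|x, z) p(z) that the paper's covariate-adjustment approach relies on, and uses illustrative helper names (`append_confounder_features`, `fit_adjusted`, `predict_adjusted`) that do not come from the paper.

```python
# Hedged sketch of the covariate-adjustment setup described in the
# Experiment Setup row. Helper names and defaults are illustrative,
# not taken from the paper's (unreleased) code.
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.linear_model import LogisticRegression


def append_confounder_features(X, z, v1=1.0):
    """Append columns c_{i,0} and c_{i,1}: c_{i,0} = v1 if z_i = 0,
    c_{i,1} = v1 if z_i = 1, and 0 otherwise."""
    z = np.asarray(z)
    c0 = csr_matrix((v1 * (z == 0)).reshape(-1, 1))
    c1 = csr_matrix((v1 * (z == 1)).reshape(-1, 1))
    return hstack([X, c0, c1]).tocsr()


def fit_adjusted(X_train, y_train, z_train, v1=1.0, C=1.0):
    """Fit p(y | x, z) with L2-regularized logistic regression on the
    augmented features; estimate p(z = 1) by maximum likelihood."""
    clf = LogisticRegression(penalty="l2", C=C, max_iter=1000)
    clf.fit(append_confounder_features(X_train, z_train, v1), y_train)
    p_z1 = float(np.mean(np.asarray(z_train) == 1))
    return clf, p_z1


def predict_adjusted(clf, p_z1, X_test, v1=1.0):
    """Adjusted prediction: p(y | x) = sum_z p(y | x, z) p(z), since the
    confounder z is not observed for test instances (binary 0/1 labels)."""
    n = X_test.shape[0]
    proba_z0 = clf.predict_proba(
        append_confounder_features(X_test, np.zeros(n), v1))[:, 1]
    proba_z1 = clf.predict_proba(
        append_confounder_features(X_test, np.ones(n), v1))[:, 1]
    p_y1 = (1.0 - p_z1) * proba_z0 + p_z1 * proba_z1
    return (p_y1 >= 0.5).astype(int)
```

Usage would look like `clf, p_z1 = fit_adjusted(X_train, y_train, z_train)` followed by `y_hat = predict_adjusted(clf, p_z1, X_test)`. The `v1` argument corresponds to the value the quoted passage sets to 1 by default and revisits later in the paper.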