Domain Adaptation for Learning from Label Proportions Using Self-Training

Authors: Ehsan Mohammady Ardehaly, Aron Culotta

IJCAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on five diverse tasks indicate an 11% average absolute improvement in accuracy as compared to using LLP without domain adaptation.
Researcher Affiliation | Academia | Ehsan Mohammady Ardehaly and Aron Culotta, Department of Computer Science, Illinois Institute of Technology, Chicago, IL 60616; emohamm1@hawk.iit.edu, aculotta@iit.edu
Pseudocode | Yes | Algorithm 1: Self-training for LLP.
Open Source Code | No | The paper does not provide an explicit statement or a link to open-source code for the methodology described.
Open Datasets | Yes | Blog-2004: This corpus consists of 19,320 bloggers collected from blogger.com in August 2004 with around 35 posts per person [Schler et al., 2006]; Blog-2008: This corpus contains a collection of political blogs from 2008 [Eisenstein and Xing, 2010]; IMDB reviews: This corpus provides highly polar movie reviews [Maas et al., 2011]; 20 newsgroups: This corpus contains approximately 20,000 documents corresponding to 20 different newsgroups. ... We refer to this experiment as comp-sci. http://people.cs.umass.edu/~mccallum/code-data.html
Dataset Splits | No | The paper uses various datasets but does not provide specific percentages or sample counts for training, validation, and test splits. For instance, it mentions "we only use 25K reviews in the testing set" for IMDB, but neither a general split strategy across all datasets nor a dedicated validation split is detailed.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments.
Software Dependencies | No | The paper mentions algorithms and optimizers such as L-BFGS and LDA, but does not list software dependencies with version numbers (e.g., Python, libraries, frameworks).
Experiment Setup | Yes | For simplicity, our experiments below set NT = 50 for all tasks. The number of top terms per topic to consider (Nf) places an upper bound on the total number of bags that will be created in the target data. We fix Nf = 3 in all experiments below, limiting the model to at most 150 target bags. ... In the experiments below, we set the range of Ns ∈ [5, 24] and Ni ∈ [3, 7], resulting in 100 total models. ... to avoid overfitting we use early stopping by setting the maximum number of iterations to 12.
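The Pseudocode row above refers to the paper's Algorithm 1, self-training for LLP. As a rough illustration of the generic self-training pattern only (not the authors' exact algorithm, which learns from label proportions over bags), here is a minimal stdlib-only sketch: train on labeled data, label the most confident unlabeled points, and repeat up to a fixed iteration cap (the paper caps iterations at 12 to avoid overfitting). The nearest-centroid classifier, the margin-based confidence, and the toy data are all assumptions for self-containment.

```python
# Generic self-training sketch (illustrative only; NOT the paper's
# Algorithm 1, which operates on bags with label proportions).
import math


def centroid(points):
    """Mean point of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def self_train(labeled, unlabeled, max_iter=12, per_iter=2):
    """Iteratively move the most confident unlabeled points into the
    labeled set, retraining (recomputing centroids) each round."""
    labeled = {lab: list(pts) for lab, pts in labeled.items()}
    pool = list(unlabeled)
    for _ in range(max_iter):  # iteration cap, as in the paper's early stopping
        if not pool:
            break
        cents = {lab: centroid(pts) for lab, pts in labeled.items()}
        # Confidence = margin between the two nearest class centroids.
        scored = []
        for x in pool:
            d = sorted((math.dist(x, c), lab) for lab, c in cents.items())
            margin = d[1][0] - d[0][0]
            scored.append((margin, x, d[0][1]))
        scored.sort(reverse=True)
        for _, x, lab in scored[:per_iter]:
            labeled[lab].append(x)
            pool.remove(x)
    return labeled


# Toy data: two well-separated clusters (hypothetical).
labeled = {"a": [(0.0, 0.0), (0.1, 0.2)], "b": [(5.0, 5.0), (4.8, 5.1)]}
unlabeled = [(0.2, 0.1), (4.9, 4.9), (0.0, 0.3), (5.1, 5.2)]
result = self_train(labeled, unlabeled)
print(sorted(len(v) for v in result.values()))  # [4, 4]
```

The confidence heuristic (centroid distance margin) stands in for whatever confidence measure the paper actually uses; the overall loop structure is the point.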
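The Experiment Setup row reports a grid of Ns ∈ [5, 24] and Ni ∈ [3, 7] producing 100 total models. A quick sketch confirming the count (20 values of Ns times 5 values of Ni); the variable names are illustrative, and what Ns and Ni denote is not stated in this excerpt.

```python
# Enumerate the (Ns, Ni) hyperparameter grid described in the paper.
from itertools import product

ns_values = range(5, 25)  # Ns in [5, 24] -> 20 values
ni_values = range(3, 8)   # Ni in [3, 7]  -> 5 values

configs = list(product(ns_values, ni_values))
print(len(configs))  # 100
```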