Domain Adaptation for Learning from Label Proportions Using Self-Training

Authors: Ehsan Mohammady Ardehaly, Aron Culotta

IJCAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on five diverse tasks indicate an 11% average absolute improvement in accuracy as compared to using LLP without domain adaptation.
Researcher Affiliation | Academia | Ehsan Mohammady Ardehaly and Aron Culotta, Department of Computer Science, Illinois Institute of Technology, Chicago, IL 60616; emohamm1@hawk.iit.edu, aculotta@iit.edu
Pseudocode | Yes | Algorithm 1: Self-training for LLP.
Open Source Code | No | The paper does not provide an explicit statement or a link to open-source code for the methodology described.
Open Datasets | Yes | Blog-2004: This corpus consists of 19,320 bloggers collected from blogger.com in August 2004 with around 35 posts per person [Schler et al., 2006]; Blog-2008: This corpus contains a collection of political blogs from 2008 [Eisenstein and Xing, 2010]; IMDB reviews: This corpus provides highly polar movie reviews [Maas et al., 2011]; 20 newsgroups: This corpus contains approximately 20,000 documents corresponding to 20 different newsgroups. ... We refer to this experiment as comp-sci. http://people.cs.umass.edu/~mccallum/code-data.html
Dataset Splits | No | The paper uses various datasets but does not provide specific percentages or sample counts for training, validation, and test splits. For instance, it mentions "we only use 25K reviews in the testing set" for IMDB, but neither a general split strategy across all datasets nor a dedicated validation split is detailed.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments.
Software Dependencies | No | The paper mentions algorithms and optimizers such as L-BFGS and LDA, but does not list software dependencies with version numbers (e.g., Python, libraries, frameworks).
Experiment Setup | Yes | For simplicity, our experiments below set NT = 50 for all tasks. The number of top terms per topic to consider (Nf) places an upper bound on the total number of bags that will be created in the target data. We fix Nf = 3 in all experiments below, limiting the model to at most 150 target bags. ... In the experiments below, we set the range of Ns ∈ [5, 24] and Ni ∈ [3, 7], resulting in 100 total models. ... to avoid overfitting we use early stopping by setting the maximum number of iterations to 12.
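The Pseudocode row above refers to the paper's Algorithm 1, self-training for LLP. As a rough illustration of the generic self-training pattern only (not the authors' exact algorithm, which learns from label proportions over bags), here is a minimal stdlib-only sketch: train on labeled data, label the most confident unlabeled points, and repeat up to a fixed iteration cap (the paper caps iterations at 12 to avoid overfitting). The nearest-centroid classifier, the margin-based confidence, and the toy data are all assumptions for self-containment.

```python
# Generic self-training sketch (illustrative only; NOT the paper's
# Algorithm 1, which operates on bags with label proportions).
import math


def centroid(points):
    """Mean point of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def self_train(labeled, unlabeled, max_iter=12, per_iter=2):
    """Iteratively move the most confident unlabeled points into the
    labeled set, retraining (recomputing centroids) each round."""
    labeled = {lab: list(pts) for lab, pts in labeled.items()}
    pool = list(unlabeled)
    for _ in range(max_iter):  # iteration cap, as in the paper's early stopping
        if not pool:
            break
        cents = {lab: centroid(pts) for lab, pts in labeled.items()}
        # Confidence = margin between the two nearest class centroids.
        scored = []
        for x in pool:
            d = sorted((math.dist(x, c), lab) for lab, c in cents.items())
            margin = d[1][0] - d[0][0]
            scored.append((margin, x, d[0][1]))
        scored.sort(reverse=True)
        for _, x, lab in scored[:per_iter]:
            labeled[lab].append(x)
            pool.remove(x)
    return labeled


# Toy data: two well-separated clusters (hypothetical).
labeled = {"a": [(0.0, 0.0), (0.1, 0.2)], "b": [(5.0, 5.0), (4.8, 5.1)]}
unlabeled = [(0.2, 0.1), (4.9, 4.9), (0.0, 0.3), (5.1, 5.2)]
result = self_train(labeled, unlabeled)
print(sorted(len(v) for v in result.values()))  # [4, 4]
```

The confidence heuristic (centroid distance margin) stands in for whatever confidence measure the paper actually uses; the overall loop structure is the point.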
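The Experiment Setup row reports a grid of Ns ∈ [5, 24] and Ni ∈ [3, 7] producing 100 total models. A quick sketch confirming the count (20 values of Ns times 5 values of Ni); the variable names are illustrative, and what Ns and Ni denote is not stated in this excerpt.

```python
# Enumerate the (Ns, Ni) hyperparameter grid described in the paper.
from itertools import product

ns_values = range(5, 25)  # Ns in [5, 24] -> 20 values
ni_values = range(3, 8)   # Ni in [3, 7]  -> 5 values

configs = list(product(ns_values, ni_values))
print(len(configs))  # 100
```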