Domain Adaptation for Learning from Label Proportions Using Self-Training
Authors: Ehsan Mohammady Ardehaly, Aron Culotta
IJCAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on five diverse tasks indicate an 11% average absolute improvement in accuracy as compared to using LLP without domain adaptation. |
| Researcher Affiliation | Academia | Ehsan Mohammady Ardehaly and Aron Culotta, Department of Computer Science, Illinois Institute of Technology, Chicago, IL 60616; emohamm1@hawk.iit.edu, aculotta@iit.edu |
| Pseudocode | Yes | Algorithm 1 Self-training for LLP. (A hedged sketch of this self-training loop is given below the table.) |
| Open Source Code | No | The paper does not provide an explicit statement or a link to the open-source code for the methodology described. |
| Open Datasets | Yes | Blog-2004: This corpus consists of 19,320 bloggers collected from blogger.com in August 2004, with around 35 posts per person [Schler et al., 2006]; Blog-2008: This corpus contains a collection of political blogs from 2008 [Eisenstein and Xing, 2010]; IMDB reviews: This corpus provides highly polar movie reviews [Maas et al., 2011]; 20 newsgroups: This corpus contains approximately 20,000 documents corresponding to 20 different newsgroups. ... We refer to this experiment as comp-sci. (http://people.cs.umass.edu/~mccallum/code-data.html) |
| Dataset Splits | No | The paper uses various datasets but does not explicitly provide specific percentages or sample counts for training, validation, and test splits for all of them. For instance, it mentions 'we only use 25K reviews in the testing set' for IMDB, but a general split strategy across all datasets or a dedicated validation set split is not detailed. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper mentions methods such as the L-BFGS optimizer and LDA topic modeling, but does not provide specific software dependencies with version numbers (e.g., Python, libraries, frameworks). |
| Experiment Setup | Yes | For simplicity, our experiments below set NT = 50 for all tasks. The number of top terms per topic to consider (Nf) places an upper bound on the total number of bags that will be created in the target data. We fix Nf = 3 in all experiments below, limiting the model to at most 150 target bags. ... In the experiments below, we set the range of Ns ∈ [5, 24] and Ni ∈ [3, 7], resulting in 100 total models. ... to avoid overfitting we use early stopping by setting the maximum number of iterations to 12. (The resulting parameter grid is illustrated in the second sketch below the table.) |
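
Below is a minimal, hedged Python sketch of the self-training loop named in Algorithm 1 ("Self-training for LLP"), reconstructed only from what the table quotes: source bags with known label proportions, target bags built from topic terms, and an early-stopping cap of 12 iterations. The helper names `fit_llp` and `make_target_bags`, and the use of a logistic-regression stand-in for the paper's label-proportion model, are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch of self-training for Learning from Label Proportions (LLP).
# NOTE: fit_llp below is a crude stand-in that trains on each bag's rounded
# proportion as a hard label; the paper's model fits label *proportions*
# directly (e.g. with an L-BFGS-optimized objective). make_target_bags is
# assumed to return index arrays for target bags (the paper builds them
# from the top terms of LDA topics).
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_llp(bags, proportions):
    """Stand-in LLP learner: label every instance in a bag with the bag's rounded proportion."""
    X = np.vstack(bags)
    y = np.concatenate([np.full(len(b), int(p >= 0.5)) for b, p in zip(bags, proportions)])
    return LogisticRegression(max_iter=1000).fit(X, y)

def self_train_llp(source_bags, source_props, target_X, make_target_bags, max_iter=12):
    """Train on source bags, then repeatedly re-estimate target-bag proportions and retrain."""
    model = fit_llp(source_bags, source_props)
    target_bags = make_target_bags(target_X)          # index arrays, one per target bag
    for _ in range(max_iter):                         # early stopping: at most 12 iterations
        scores = model.predict_proba(target_X)[:, 1]  # current model's positive-class scores
        bags = list(source_bags) + [target_X[idx] for idx in target_bags]
        props = list(source_props) + [scores[idx].mean() for idx in target_bags]
        model = fit_llp(bags, props)                  # retrain on source + self-labeled target bags
    return model
```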
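
A second small sketch makes the arithmetic in the Experiment Setup row explicit: 20 values of Ns times 5 values of Ni gives the stated 100 models, and 50 topics times 3 top terms bounds the target bags at 150. The variable names mirror the paper's symbols; the loop body is left empty because no reference implementation is available.

```python
# Sketch of the hyperparameter grid from the Experiment Setup row,
# reconstructed from the quoted ranges, not from released code.
from itertools import product

N_T = 50                    # number of topics, fixed for all tasks
N_f = 3                     # top terms per topic -> at most N_T * N_f = 150 target bags
MAX_ITERATIONS = 12         # early-stopping cap on self-training iterations

N_s_values = range(5, 25)   # Ns in [5, 24]: 20 values
N_i_values = range(3, 8)    # Ni in [3, 7]: 5 values

grid = list(product(N_s_values, N_i_values))
assert len(grid) == 100     # "resulting in 100 total models"

for N_s, N_i in grid:
    pass                    # one self-trained LLP model would be fit per (Ns, Ni) setting
```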