Source Free Transfer Learning for Text Classification
Authors: Zhongqi Lu, Yin Zhu, Sinno Jialin Pan, Evan Wei Xiang, Yujing Wang, Qiang Yang
AAAI 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on the 20Newsgroups dataset and a Google search snippets dataset suggest that the framework is capable of achieving comparable performance to those state-of-the-art methods with dedicated selections of auxiliary data. |
| Researcher Affiliation | Collaboration | Hong Kong University of Science and Technology, Hong Kong; Institute for Infocomm Research, Singapore 138632; Baidu Inc., China; Microsoft Research Asia, Beijing, China; Huawei Noah's Ark Lab, Hong Kong |
| Pseudocode | Yes | Algorithm 1 Source Free Transfer Learning. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | We perform text classification tasks on two datasets: the 20Newsgroups dataset (20NG) and a Google snippets dataset (GOOGLE) (Phan, Nguyen, and Horiguchi 2008). The 20NG corpus is available at http://people.csail.mit.edu/jrennie/20Newsgroups (a loading sketch follows the table). |
| Dataset Splits | Yes | We validate the model to be learned on a validation set $V = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ of the target domain. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running experiments, such as GPU/CPU models, memory, or specific computing environments. |
| Software Dependencies | No | The paper mentions general software and methods — a search engine such as Ask.com, a linear support vector machine, and logistic regression — but does not name specific software packages with version numbers (e.g., 'Python 3.8, scikit-learn 0.24'). |
| Experiment Setup | Yes | In our experiments, we randomly choose only 10 target domain training samples for each task. For both the SemiLogReg and TrAdaBoost methods, we choose about 80 samples from the original training datasets as the auxiliary data for each task. In the experiments, we varied the number of training samples from 10 to 90 for each of the tasks. The number of auxiliary samples is limited to 50...λ is the trade-off parameter that controls the effect of unlabeled auxiliary data...we propose to set an empirical bound, where the change of performance approaches an infinitely small number. (Sketches of this protocol and of the λ heuristic follow the table.) |
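
As a convenience for reproduction, here is a minimal sketch of loading the 20 Newsgroups corpus cited in the Open Datasets row. The paper points at the MIT-hosted archive and does not use scikit-learn; the `fetch_20newsgroups` loader and the two illustrative categories below are our assumptions.

```python
# Minimal sketch (not from the paper): load the 20 Newsgroups corpus with
# scikit-learn rather than the MIT-hosted archive the authors cite.
from sklearn.datasets import fetch_20newsgroups

# Any pair of categories stands in for one binary task; the paper builds
# several such tasks, but this pair is illustrative only.
categories = ["rec.sport.baseball", "sci.space"]

train = fetch_20newsgroups(subset="train", categories=categories,
                           remove=("headers", "footers", "quotes"))
test = fetch_20newsgroups(subset="test", categories=categories,
                          remove=("headers", "footers", "quotes"))

print(len(train.data), "training documents,", len(test.data), "test documents")
```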
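
The Experiment Setup row quotes a low-resource protocol: only 10 labeled target-domain training samples per task, with a linear SVM among the baselines (see the Software Dependencies row). The sketch below approximates that protocol under our own assumptions — TF-IDF features, scikit-learn's `LinearSVC`, and stratified sampling of 5 documents per class; the paper's actual feature pipeline and hyperparameters are not given in the quoted text.

```python
# Minimal sketch (assumptions ours): approximate the paper's low-resource
# setting -- 10 labeled target-domain documents per task -- with a linear
# SVM baseline on one illustrative binary 20NG task.
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
categories = ["rec.sport.baseball", "sci.space"]  # illustrative task only
train = fetch_20newsgroups(subset="train", categories=categories)
test = fetch_20newsgroups(subset="test", categories=categories)

# Randomly choose only 10 target-domain training samples (5 per class, so
# the tiny training set is guaranteed to contain both labels).
idx = np.concatenate([
    rng.choice(np.flatnonzero(train.target == c), size=5, replace=False)
    for c in np.unique(train.target)
])
X_small = [train.data[i] for i in idx]
y_small = train.target[idx]

vec = TfidfVectorizer()
clf = LinearSVC().fit(vec.fit_transform(X_small), y_small)
acc = accuracy_score(test.target, clf.predict(vec.transform(test.data)))
print(f"accuracy with 10 labeled target samples: {acc:.3f}")
```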
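
The quoted setup also says λ (the trade-off on unlabeled auxiliary data) is set by "an empirical bound, where the change of performance approaches an infinitely small number." One plausible reading of that heuristic — ours, not the paper's stated procedure — is a sweep over candidate values that stops once the validation-score delta drops below a tolerance:

```python
# Sketch of one reading (ours) of the paper's "empirical bound" heuristic:
# sweep candidate lambdas and stop once validation performance changes by
# less than a small tolerance epsilon between consecutive candidates.
def select_lambda(train_and_score, lambdas, epsilon=1e-3):
    """train_and_score(lam) returns a validation score; lambdas is sorted."""
    best_lam = lambdas[0]
    best_score = prev_score = train_and_score(best_lam)
    for lam in lambdas[1:]:
        score = train_and_score(lam)
        if score > best_score:
            best_lam, best_score = lam, score
        if abs(score - prev_score) < epsilon:
            break  # performance change is negligibly small: stop here
        prev_score = score
    return best_lam
```

A caller would pass, e.g., `lambdas=[0.01, 0.1, 1.0, 10.0]` and a closure that trains the model at a given λ and returns accuracy on the validation set V from the Dataset Splits row.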