Source Free Transfer Learning for Text Classification
Authors: Zhongqi Lu, Yin Zhu, Sinno Jialin Pan, Evan Wei Xiang, Yujing Wang, Qiang Yang
AAAI 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on the 20Newsgroups dataset and a Google search snippets dataset suggest that the framework is capable of achieving comparable performance to those state-of-the-art methods with dedicated selections of auxiliary data. |
| Researcher Affiliation | Collaboration | Hong Kong University of Science and Technology, Hong Kong; Institute for Infocomm Research, Singapore 138632; Baidu Inc., China; Microsoft Research Asia, Beijing, China; Huawei Noah's Ark Lab, Hong Kong |
| Pseudocode | Yes | Algorithm 1 Source Free Transfer Learning. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | We perform text classification tasks on two datasets: the 20Newsgroups dataset (20NG) and a Google snippets dataset (GOOGLE) (Phan, Nguyen, and Horiguchi 2008). The 20NG corpus is available at http://people.csail.mit.edu/jrennie/20Newsgroups (a loading sketch follows the table). |
| Dataset Splits | Yes | We validate the model to be learned on a validation set $V = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ of the target domain. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running experiments, such as GPU/CPU models, memory, or specific computing environments. |
| Software Dependencies | No | The paper mentions general software and methods — a search engine such as Ask.com, a linear support vector machine, and logistic regression — but does not name specific software packages with version numbers (e.g., 'Python 3.8, scikit-learn 0.24'). |
| Experiment Setup | Yes | In our experiments, we randomly choose only 10 target domain training samples for each task. For both the SemiLogReg and TrAdaBoost methods, we choose about 80 samples from the original training datasets as the auxiliary data for each task. In the experiments, we varied the number of training samples from 10 to 90 for each of the tasks. The number of auxiliary samples is limited to 50...λ is the trade-off parameter that controls the effect of unlabeled auxiliary data...we propose to set an empirical bound, where the change of performance approaches an infinitely small number. (Sketches of this protocol and of the λ heuristic follow the table.) |
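
As a convenience for reproduction, here is a minimal sketch of loading the 20 Newsgroups corpus cited in the Open Datasets row. The paper points at the MIT-hosted archive and does not use scikit-learn; the `fetch_20newsgroups` loader and the two illustrative categories below are our assumptions.

```python
# Minimal sketch (not from the paper): load the 20 Newsgroups corpus with
# scikit-learn rather than the MIT-hosted archive the authors cite.
from sklearn.datasets import fetch_20newsgroups

# Any pair of categories stands in for one binary task; the paper builds
# several such tasks, but this pair is illustrative only.
categories = ["rec.sport.baseball", "sci.space"]

train = fetch_20newsgroups(subset="train", categories=categories,
                           remove=("headers", "footers", "quotes"))
test = fetch_20newsgroups(subset="test", categories=categories,
                          remove=("headers", "footers", "quotes"))

print(len(train.data), "training documents,", len(test.data), "test documents")
```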
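
The Experiment Setup row quotes a low-resource protocol: only 10 labeled target-domain training samples per task, with a linear SVM among the baselines (see the Software Dependencies row). The sketch below approximates that protocol under our own assumptions — TF-IDF features, scikit-learn's `LinearSVC`, and stratified sampling of 5 documents per class; the paper's actual feature pipeline and hyperparameters are not given in the quoted text.

```python
# Minimal sketch (assumptions ours): approximate the paper's low-resource
# setting -- 10 labeled target-domain documents per task -- with a linear
# SVM baseline on one illustrative binary 20NG task.
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
categories = ["rec.sport.baseball", "sci.space"]  # illustrative task only
train = fetch_20newsgroups(subset="train", categories=categories)
test = fetch_20newsgroups(subset="test", categories=categories)

# Randomly choose only 10 target-domain training samples (5 per class, so
# the tiny training set is guaranteed to contain both labels).
idx = np.concatenate([
    rng.choice(np.flatnonzero(train.target == c), size=5, replace=False)
    for c in np.unique(train.target)
])
X_small = [train.data[i] for i in idx]
y_small = train.target[idx]

vec = TfidfVectorizer()
clf = LinearSVC().fit(vec.fit_transform(X_small), y_small)
acc = accuracy_score(test.target, clf.predict(vec.transform(test.data)))
print(f"accuracy with 10 labeled target samples: {acc:.3f}")
```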
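
The quoted setup also says λ (the trade-off on unlabeled auxiliary data) is set by "an empirical bound, where the change of performance approaches an infinitely small number." One plausible reading of that heuristic — ours, not the paper's stated procedure — is a sweep over candidate values that stops once the validation-score delta drops below a tolerance:

```python
# Sketch of one reading (ours) of the paper's "empirical bound" heuristic:
# sweep candidate lambdas and stop once validation performance changes by
# less than a small tolerance epsilon between consecutive candidates.
def select_lambda(train_and_score, lambdas, epsilon=1e-3):
    """train_and_score(lam) returns a validation score; lambdas is sorted."""
    best_lam = lambdas[0]
    best_score = prev_score = train_and_score(best_lam)
    for lam in lambdas[1:]:
        score = train_and_score(lam)
        if score > best_score:
            best_lam, best_score = lam, score
        if abs(score - prev_score) < epsilon:
            break  # performance change is negligibly small: stop here
        prev_score = score
    return best_lam
```

A caller would pass, e.g., `lambdas=[0.01, 0.1, 1.0, 10.0]` and a closure that trains the model at a given λ and returns accuracy on the validation set V from the Dataset Splits row.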