Cycle Self-Training for Domain Adaptation

Authors: Hong Liu, Jianmin Wang, Mingsheng Long

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We analyze CST theoretically under realistic assumptions, and provide hard cases where CST recovers target ground truth, while both invariant feature learning and vanilla self-training fail. Empirical results indicate that CST significantly improves over the state of the art on visual recognition and sentiment analysis benchmarks.
Researcher Affiliation | Academia | Hong Liu, Dept. of Electronic Engineering, Tsinghua University, hongliu9903@gmail.com; Jianmin Wang, School of Software, BNRist, Tsinghua University, jimwang@tsinghua.edu.cn; Mingsheng Long, School of Software, BNRist, Tsinghua University, mingsheng@tsinghua.edu.cn
Pseudocode | Yes | Algorithm 1: Cycle Self-Training (CST)
Open Source Code | Yes | Code is available at https://github.com/Liuhong99/CST.
Open Datasets | Yes | Datasets. We experiment on visual object recognition and linguistic sentiment classification tasks: Office-Home [64] has 65 classes from four kinds of environment with large domain gap: Artistic (Ar), Clip Art (Cl), Product (Pr), and Real-World (Rw); VisDA-2017 [45] is a large-scale UDA dataset with two domains named Synthetic and Real. The datasets consist of over 200k images from 12 categories of objects; Amazon Review [10] is a linguistic sentiment classification dataset of product reviews in four products: Books (B), DVDs (D), Electronics (E), and Kitchen (K).
Dataset Splits | No | The paper mentions running each task multiple times and reporting mean/deviation, and using a learning-rate decay schedule, but it does not give explicit train/validation/test split percentages, sample counts, or references to predefined splits that would allow the data partitioning to be reproduced beyond the general UDA problem setup.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., Python, PyTorch, specific libraries).
Experiment Setup | Yes | Implementation. We use ResNet-50 [26] (pretrained on ImageNet [53]) as the feature extractor for vision tasks, and BERT [16] for linguistic tasks. On VisDA-2017, we also provide results with ResNet-101 to include more baselines. We use cross-entropy loss for classification on the source domain. When training the target head θ̂_t and updating the feature extractor with CST, we use squared loss so that θ̂_t has an analytical solution, which avoids computing second-order derivatives as in meta-learning [18]. Details on adapting squared loss to multi-class classification are deferred to Appendix B. We adopt SGD with initial learning rate 2e-3 for image classification and 5e-4 for sentiment classification. Following the standard protocol in [26], we decay the learning rate by 0.1 every 50 epochs until 150 epochs. We run all tasks 3 times and report the mean and deviation of top-1 accuracy.
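
To make the Pseudocode and Experiment Setup rows above concrete, the following is a minimal sketch of how one CST training step could be structured, assuming PyTorch. It is not the released implementation (see the repository linked above): names such as feature_extractor, source_head, solve_target_head, and lambda_reg are illustrative assumptions, and the use of squared loss for the cycle term in step 4 is also an assumption; only the high-level structure (cross-entropy on the source domain, a closed-form target head fit under squared loss on target pseudo-labels, and an update of the feature extractor through that head) follows the description quoted in the table.

```python
# Hypothetical sketch of one Cycle Self-Training (CST) step (PyTorch).
# Function and variable names are illustrative, not from the paper's code.
import torch
import torch.nn.functional as F


def solve_target_head(features, pseudo_labels, num_classes, lambda_reg=1e-4):
    """Closed-form (ridge-regression) target head under squared loss.

    Using squared loss lets the target head be computed analytically,
    which, as the paper notes, avoids the second-order derivatives that
    meta-learning-style bilevel updates would require.
    """
    y = F.one_hot(pseudo_labels, num_classes).float()  # one-hot targets
    d = features.size(1)
    # theta_t = (X^T X + lambda * I)^{-1} X^T Y; features stay in the graph
    # so gradients flow back to the feature extractor through theta_t.
    gram = features.t() @ features + lambda_reg * torch.eye(d, device=features.device)
    theta_t = torch.linalg.solve(gram, features.t() @ y)
    return theta_t  # shape: (feature_dim, num_classes)


def cst_step(feature_extractor, source_head, optimizer, x_s, y_s, x_t, num_classes):
    """One hypothetical CST update on a source batch (x_s, y_s) and target batch x_t."""
    optimizer.zero_grad()

    # 1) Standard cross-entropy on the labeled source domain.
    f_s = feature_extractor(x_s)
    loss_src = F.cross_entropy(source_head(f_s), y_s)

    # 2) Pseudo-label the target batch with the source head.
    f_t = feature_extractor(x_t)
    pseudo_labels = source_head(f_t).argmax(dim=1).detach()

    # 3) Fit the target head analytically on (target features, pseudo-labels).
    theta_t = solve_target_head(f_t, pseudo_labels, num_classes)

    # 4) Cycle term: the target head should also predict source labels well.
    #    Squared loss against one-hot source labels is assumed here.
    loss_cycle = F.mse_loss(f_s @ theta_t, F.one_hot(y_s, num_classes).float())

    (loss_src + loss_cycle).backward()
    optimizer.step()
    return loss_src.item(), loss_cycle.item()
```

The reported training schedule (SGD with initial learning rate 2e-3 for vision tasks, decayed by 0.1 every 50 epochs) would correspond to something like torch.optim.SGD(params, lr=2e-3) combined with a StepLR(optimizer, step_size=50, gamma=0.1) scheduler; momentum and weight decay are not specified in the quoted text.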