Cycle Self-Training for Domain Adaptation
Authors: Hong Liu, Jianmin Wang, Mingsheng Long
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We analyze CST theoretically under realistic assumptions, and provide hard cases where CST recovers target ground truth, while both invariant feature learning and vanilla self-training fail. Empirical results indicate that CST significantly improves over the state of the art on visual recognition and sentiment analysis benchmarks. |
| Researcher Affiliation | Academia | Hong Liu, Dept. of Electronic Engineering, Tsinghua University (hongliu9903@gmail.com); Jianmin Wang, School of Software, BNRist, Tsinghua University (jimwang@tsinghua.edu.cn); Mingsheng Long, School of Software, BNRist, Tsinghua University (mingsheng@tsinghua.edu.cn) |
| Pseudocode | Yes | Algorithm 1 Cycle Self-Training (CST) (an illustrative sketch of one CST update appears after this table) |
| Open Source Code | Yes | Code is available at https://github.com/Liuhong99/CST. |
| Open Datasets | Yes | Datasets. We experiment on visual object recognition and linguistic sentiment classification tasks: Office-Home [64] has 65 classes from four kinds of environment with large domain gap: Artistic (Ar), Clip Art (Cl), Product (Pr), and Real-World (Rw); VisDA-2017 [45] is a large-scale UDA dataset with two domains named Synthetic and Real. The datasets consist of over 200k images from 12 categories of objects; Amazon Review [10] is a linguistic sentiment classification dataset of product reviews in four products: Books (B), DVDs (D), Electronics (E), and Kitchen (K). |
| Dataset Splits | No | The paper reports mean and deviation over multiple runs and describes a learning-rate decay schedule, but it does not provide explicit train/validation/test split percentages, sample counts, or references to predefined splits that would allow the data partitioning to be reproduced beyond the general UDA setup. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., Python, PyTorch, specific libraries). |
| Experiment Setup | Yes | Implementation. We use ResNet-50 [26] (pretrained on ImageNet [53]) as the feature extractor for vision tasks, and BERT [16] for linguistic tasks. On VisDA-2017, we also provide results with ResNet-101 to include more baselines. We use cross-entropy loss for classification on the source domain. When training the target head θ̂_t and updating the feature extractor with CST, we use squared loss to obtain the analytical solution of θ̂_t directly and avoid calculating second-order derivatives as in meta-learning [18]. Details on adapting squared loss to multi-class classification are deferred to Appendix B. We adopt SGD with initial learning rate η₀ = 2e-3 for image classification and η₀ = 5e-4 for sentiment classification. Following the standard protocol in [26], we decay the learning rate by 0.1 every 50 epochs until 150 epochs. We run all tasks 3 times and report mean and deviation in top-1 accuracy. (See the optimizer-schedule sketch after this table.) |
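
The quoted Algorithm 1 is not reproduced above, so the following is a minimal PyTorch sketch of one CST update under the paper's squared-loss trick: a target head is solved in closed form (ridge regression) on pseudo-labeled target features, and the shared feature extractor is updated so that this head also fits the source labels. The names `featurizer`, `source_head`, `xs`, `ys`, `xt`, `opt`, and the ridge coefficient `lam` are illustrative assumptions, not the authors' released code, and the paper's additional regularization terms are omitted.

```python
import torch
import torch.nn.functional as F

def cst_step(featurizer, source_head, xs, ys, xt, num_classes, opt, lam=1e-3):
    """One simplified Cycle Self-Training update (illustrative sketch)."""
    opt.zero_grad()
    fs, ft = featurizer(xs), featurizer(xt)          # shared features

    # (1) Standard cross-entropy on the labeled source domain.
    loss_s = F.cross_entropy(source_head(fs), ys)

    # (2) Pseudo-label the target batch with the source head.
    pseudo = source_head(ft).argmax(dim=1).detach()
    y_onehot = F.one_hot(pseudo, num_classes).float()

    # (3) Closed-form target head via ridge regression on (ft, pseudo):
    #     theta_t = (ft^T ft + lam * I)^{-1} ft^T Y.
    #     Gradients flow through ft via the linear solve, so no
    #     second-order derivatives are required.
    d = ft.shape[1]
    gram = ft.t() @ ft + lam * torch.eye(d, device=ft.device)
    theta_t = torch.linalg.solve(gram, ft.t() @ y_onehot)

    # (4) Cycle loss: the target head must also fit the *source* labels.
    loss_cycle = F.mse_loss(fs @ theta_t, F.one_hot(ys, num_classes).float())

    (loss_s + loss_cycle).backward()
    opt.step()
    return loss_s.item(), loss_cycle.item()
```

Because θ̂_t has an analytical form, gradients reach the feature extractor through the linear solve, which is what lets the method avoid the second-order derivatives of meta-learning that the Experiment Setup excerpt mentions.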
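For the reported training schedule (SGD, initial learning rate 2e-3 for vision, decayed by 0.1 every 50 epochs for 150 epochs), a hedged configuration sketch is below; the momentum value is an assumption, since the excerpt does not state it.

```python
import torch
from torchvision.models import resnet50

model = resnet50()  # the paper uses an ImageNet-pretrained ResNet-50; weights omitted here
optimizer = torch.optim.SGD(model.parameters(), lr=2e-3, momentum=0.9)  # momentum assumed
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)

for epoch in range(150):
    ...  # one epoch of CST updates (e.g., cst_step above)
    scheduler.step()
```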