Mixed Samples as Probes for Unsupervised Model Selection in Domain Adaptation

Authors: Dapeng Hu, Jian Liang, Jun Hao Liew, Chuhui Xue, Song Bai, Xinchao Wang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We extensively evaluate MixVal on 11 UDA methods across 4 adaptation settings, including classification and segmentation tasks. Experimental results consistently demonstrate that MixVal achieves state-of-the-art performance and maintains exceptional stability in model selection.
Researcher Affiliation | Collaboration | Dapeng Hu (1), Jian Liang (3), Jun Hao Liew (2), Chuhui Xue (2), Song Bai (2), Xinchao Wang (1). (1) National University of Singapore; (2) ByteDance Inc.; (3) CRIPAC & MAIS, Institute of Automation, Chinese Academy of Sciences.
Pseudocode | Yes | The PyTorch-style pseudocode for our validation approach MixVal is presented in Algorithm 1:

import torch

# x: A batch of real target images with shuffled order.
# lam: The mix ratio, a fixed scalar value between 0.5 and 1.0.
# net: A trained UDA model in evaluation mode.
# model_list: A list containing candidate UDA models.

# Calculate ICE scores for a mini-batch.
def ice_score(x, lam, net):
    # Random batch index.
    rand_idx = torch.randperm(x.shape[0])
    inputs_a = x
    inputs_b = x[rand_idx]
    # Obtain model predictions and hard pseudo labels.
    pred_a = net(inputs_a)
    pl_a = pred_a.max(dim=1)[1]
    pl_b = pl_a[rand_idx]
    # Intra-cluster mixup: pairs sharing the same pseudo label.
    same_idx = (pl_a == pl_b).nonzero(as_tuple=True)[0]
    # Inter-cluster mixup: pairs with different pseudo labels.
    diff_idx = (pl_a != pl_b).nonzero(as_tuple=True)[0]
    # Mix up images and hard pseudo labels.
    mix_inputs = lam * inputs_a + (1 - lam) * inputs_b
    if lam > 0.5:
        mix_labels = pl_a
    else:
        mix_labels = pl_b
    # Obtain predictions for the mixed samples.
    mix_pred = net(mix_inputs)
    mix_pred_labels = mix_pred.max(dim=1)[1]
    # Calculate ICE scores for two-dimensional probing.
    ice_same = torch.sum(
        mix_pred_labels[same_idx] == mix_labels[same_idx]) / same_idx.shape[0]
    ice_diff = torch.sum(
        mix_pred_labels[diff_idx] == mix_labels[diff_idx]) / diff_idx.shape[0]
    return ice_same, ice_diff

# Perform model selection based on ICE scores.
def mix_val(model_list, x, lam):
    # Calculate ICE scores for all candidate models.
    ice_same_list = []
    ice_diff_list = []
    for net in model_list:
        ice_same, ice_diff = ice_score(x, lam, net)
        ice_same_list.append(ice_same)
        ice_diff_list.append(ice_diff)
    # Calculate the average rank of the two types of ICE scores.
    ice_same_list = torch.tensor(ice_same_list)
    ice_diff_list = torch.tensor(ice_diff_list)
    ice_same_rank = torch.argsort(torch.argsort(ice_same_list))
    ice_diff_rank = torch.argsort(torch.argsort(ice_diff_list))
    average_rank = (ice_same_rank + ice_diff_rank) / 2
    # Choose the model with the highest average rank.
    return model_list[torch.argmax(average_rank)]
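As a usage sketch (hedged: the backbone, class count, mix ratio, and random batch below are illustrative stand-ins, not the paper's setup), the selection routine can be exercised as follows:

import torch
import torchvision.models as models

# Three hypothetical candidates; in practice these would be UDA checkpoints
# trained with different hyperparameters and loaded from disk.
model_list = [models.resnet18(num_classes=31).eval() for _ in range(3)]

# A stand-in batch of unlabeled target images (32 images at 224x224).
x = torch.randn(32, 3, 224, 224)

with torch.no_grad():
    best_model = mix_val(model_list, x, lam=0.55)  # lam=0.55 is an illustrative choice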
Open Source Code | Yes | Code is available at https://github.com/LHXXHB/MixVal.
Open Datasets | Yes | For image classification tasks, we consider 4 popular UDA benchmarks of varied scales. Office-31 [55] is a classic domain adaptation benchmark consisting of 31 object categories across 3 domains: Amazon (A), DSLR (D), and Webcam (W). Office-Home [56] is a challenging benchmark with 65 different object categories in 4 domains: Art (Ar), Clipart (Cl), Product (Pr), and Real-World (Re). VisDA [57] is a large-scale benchmark for the synthetic-to-real object recognition task, featuring 12 categories. It consists of a training (T) domain with 152k synthetic images and a validation (V) domain with 55k realistic images. DomainNet [58] is a recent large-scale benchmark comprising approximately 600k images across 345 categories in 6 distinct domains. For evaluation, we focus on a subset of 126 classes with 7 tasks [59] from 4 domains: Real (R), Clipart (C), Painting (P), and Sketch (S). For semantic segmentation tasks, we use the synthetic GTAV [60] dataset as the source domain and the real-world Cityscapes [61] dataset as the target domain.
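For concreteness, one adaptation task's domain pair can be loaded roughly as below; the directory layout and transforms are assumptions for illustration, not taken from the paper:

from torchvision import datasets, transforms

# Assumed layout: office31/<domain>/<class>/<image>.jpg
tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
source = datasets.ImageFolder("office31/amazon", transform=tfm)  # domain A
target = datasets.ImageFolder("office31/webcam", transform=tfm)  # domain W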
Dataset Splits | Yes | For source-based methods, we split 80% of the source data as the training set and the remaining 20% as the validation set.
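A minimal sketch of such an 80/20 split, assuming the source ImageFolder from above and an arbitrary seed (both illustrative, not the paper's code):

import torch
from torch.utils.data import random_split

n_total = len(source)
n_train = int(0.8 * n_total)
gen = torch.Generator().manual_seed(0)  # assumed seed, for repeatability
train_set, val_set = random_split(
    source, [n_train, n_total - n_train], generator=gen)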
Hardware Specification | Yes | We utilize the Transfer Learning Library to train UDA models on a single RTX TITAN 16GB GPU.
Software Dependencies | No | The paper mentions using the Transfer Learning Library and links to its GitHub, but it does not specify version numbers for any software dependencies such as Python, PyTorch, or other libraries used in the experiments.
Experiment Setup | Yes | The batch size is set to 32, and the total number of iterations is set to 5,000. ... For ATDOC, BNM, CDAN, PADA, SAFN, SHOT, DMRL, and DANCE, we select the loss coefficient among 7 varied candidate values. For MDD, we validate the margin factor, while for MCC, we validate the temperature. ... For semantic segmentation tasks, we likewise consider two hyperparameters, with the training iteration as an additional hyperparameter selected from the set {10,000, 12,000, 14,000, 16,000, 18,000, 20,000, 22,000, 24,000, 26,000, 28,000, 30,000}. Detailed hyperparameter settings are available in Table 11.
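Putting the pieces together, the selection loop for, e.g., the loss coefficient might look like the sketch below; the candidate values and the train_uda_model helper are hypothetical placeholders (the paper's actual grids are in its Table 11):

import torch

# Hypothetical sweep: the coefficient values and train_uda_model() are
# placeholders for illustration, not the paper's exact settings.
candidate_coeffs = [0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0]  # 7 candidates

model_list = [train_uda_model(loss_coeff=c, batch_size=32, num_iters=5000)
              for c in candidate_coeffs]

with torch.no_grad():
    # target_batch: a mini-batch of unlabeled target images, as in the
    # usage sketch above.
    best = mix_val(model_list, target_batch, lam=0.55)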