Unbalanced minibatch Optimal Transport; applications to Domain Adaptation

Authors: Kilian Fatras, Thibault Séjourné, Rémi Flamary, Nicolas Courty

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type Experimental Our experimental study shows that in challenging problems associated with domain adaptation, the use of unbalanced optimal transport leads to significantly better results, competing with or surpassing recent baselines. Finally, we design a new domain adaptation (DA) method whose performance is evaluated on several problems, where we show evidence that our strategy substantially surpasses other classical OT formulations, and is on par with or better than recent state-of-the-art competitors.
Researcher Affiliation Academia (1) Univ. Bretagne-Sud, CNRS, INRIA, IRISA, France; (2) ENS, PSL University; (3) École Polytechnique, CMAP, France.
Pseudocode No The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code Yes The experiments were designed in PyTorch (Paszke et al., 2017) and all the code can be found at https://github.com/kilianFatras/JUMBOT.
Open Datasets Yes Datasets. We start with digits datasets. Following the evaluation protocol of (Damodaran et al., 2018), we experiment on three adaptation scenarios: USPS to MNIST (U→M), MNIST to M-MNIST (M→MM), and SVHN to MNIST (S→M). MNIST (LeCun & Cortes, 2010) contains 60,000 images of handwritten digits, M-MNIST contains the 60,000 MNIST images with color patches (Ganin et al., 2016), and USPS (Hull, 1994) contains 7,291 images. Street View House Numbers (SVHN) (Netzer et al., 2011) consists of 73,257 images with digits and numbers in natural scenes. We report the evaluation results on the test target datasets. Office-Home (Venkateswara et al., 2017) is a difficult dataset for unsupervised domain adaptation (UDA): it has 15,500 images from four different domains: Artistic images (A), Clip Art (C), Product images (P), and Real-World (R). For each domain, the dataset contains images of 65 object categories that are common in office and home scenarios. We evaluate all methods in 12 adaptation scenarios. VisDA-2017 (Peng et al., 2017) is a large-scale dataset for UDA from simulation to real. VisDA contains 152,397 synthetic images as the source domain and 55,388 real-world images as the target domain.
Dataset Splits Yes VisDA-2017 (Peng et al., 2017) is a large-scale dataset for UDA from simulation to real. VisDA contains 152,397 synthetic images as the source domain and 55,388 real-world images as the target domain. 12 object categories are shared by these two domains. Following (Long et al., 2018; Chen et al., 2020), we evaluate all methods on the VisDA validation set.
Hardware Specification No The paper mentions that experiments were designed in PyTorch but does not provide any specific details about the hardware used, such as GPU or CPU models, or memory specifications.
Software Dependencies No The paper mentions the use of PyTorch, the POT package, and the Geomloss package but does not specify version numbers for these software dependencies, which are required for reproducibility.
Experiment Setup Yes The gradient flow algorithm uses a simple explicit Euler integration scheme. Formally, it starts from an initial distribution at time t = 0 and integrates an SDE at each iteration. In our case, we cannot compute the gradient directly from our minibatch OT losses: since the OT loss inputs are empirical distributions, the gradient computed from the 1/m sample weights carries an inherent bias, which we correct by multiplying the gradient by the inverse weight m. Finally, we integrate: Ẋ(t) = −m ∇_X h̄^m_k(X, Y). For α and β(0) we generate 10,000 2D points divided into 2 imbalanced clusters, with the number of samples in each cluster provided in Figure 4. We consider the (unbalanced) Sinkhorn divergence, a squared Euclidean cost, a learning rate of 0.02, 5,000 iterations, m equal to 64 or 128, and k = 1.
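The gradient-flow recipe above can be sketched in a few lines. This is a minimal, hypothetical stand-in, not the paper's code: it swaps the (unbalanced) Sinkhorn divergence for exact OT on each equal-size minibatch (a permutation found by the Hungarian algorithm), uses far fewer points and iterations than the reported 10,000 points / 5,000 steps, and keeps the key ingredients the report names: squared Euclidean cost, explicit Euler steps with learning rate 0.02, minibatches of size m = 64, and the 1/m bias correction (multiply the gradient by m).

```python
# Minibatch OT gradient flow with explicit Euler integration (sketch).
# Assumption: exact minibatch OT via the Hungarian algorithm stands in
# for the unbalanced Sinkhorn divergence used in the paper.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)

# Source points X(0) and fixed target points Y: two 2D Gaussian blobs.
X = rng.normal(loc=[-3.0, 0.0], scale=0.5, size=(500, 2))
Y = rng.normal(loc=[3.0, 0.0], scale=0.5, size=(500, 2))

m = 64          # minibatch size (paper: 64 or 128)
lr = 0.02       # learning rate of the Euler scheme
n_iters = 500   # kept small for the sketch (paper: 5000)

def minibatch_ot_step(X, Y):
    """One Euler step: move a source minibatch along the bias-corrected
    gradient of the minibatch squared-Euclidean OT cost."""
    idx = rng.choice(len(X), size=m, replace=False)
    idy = rng.choice(len(Y), size=m, replace=False)
    xb, yb = X[idx], Y[idy]
    # Squared Euclidean cost matrix; exact OT between two uniform
    # minibatches of equal size is an optimal permutation.
    C = ((xb[:, None, :] - yb[None, :, :]) ** 2).sum(-1)
    rows, cols = linear_sum_assignment(C)
    # d/dx_i of (1/m) * sum_i ||x_i - y_sigma(i)||^2 is (2/m)(x_i - y_sigma(i));
    # multiplying by m removes the 1/m empirical-weight bias, as in the text.
    grad = 2.0 * (xb[rows] - yb[cols])
    X[idx[rows]] -= lr * grad
    return X

for _ in range(n_iters):
    X = minibatch_ot_step(X, Y)
```

After the loop the source cloud has drifted onto the target cluster; each sampled point contracts toward its assigned target by a factor (1 − 2·lr) per update, so a few dozen updates per point suffice for this toy geometry.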