Statistical Optimal Transport posed as Learning Kernel Embedding
Authors: Saketha Nath Jagarlapudi, Pratik Kumar Jawanpuria
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results illustrate the efficacy of the proposed approach. Our work focuses on this challenging and important problem of statistical OT over continuous domains, and seeks consistent estimators for ϵ-optimal transport plan/map, whose sample complexity is dimension-free. To this end, we take the novel approach of equivalently re-formulating the statistical OT problem solely in terms of the relevant kernel mean embeddings [26]. More specifically, our formulation finds the (characterizing) kernel mean embedding of a joint distribution with least expected cost, and whose marginal embeddings are close to the given-sample based estimates of the marginal embeddings. |
| Researcher Affiliation | Collaboration | J. Saketha Nath Department of Computer Science and Engineering, Indian Institute of Technology Hyderabad, INDIA. saketha@cse.iith.ac.in Pratik Jawanpuria Microsoft IDC, Hyderabad, INDIA. pratik.jawanpuria@microsoft.com |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://www.iith.ac.in/~saketha/research.html. Additional details on the experiments are available in the technical report [28]. |
| Open Datasets | Yes | We experiment on the Caltech-Office dataset [17], which contains images from four domains: Amazon (online retail), the Caltech image dataset, DSLR (images taken from a high resolution DSLR camera), and Webcam (images taken from a webcam). |
| Dataset Splits | No | The paper mentions partitioning the target domain into training and test sets but does not specify a validation set or its split details. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU/CPU models or memory used for experiments. |
| Software Dependencies | No | We use the Python Optimal Transport (POT) library (https://github.com/PythonOT/POT) implementations of OTLin and OTKer in our experiments. While a software library is mentioned, specific version numbers are not provided. |
| Experiment Setup | Yes | Experimental setup: We consider mean-zero Gaussian distributions with unit-trace covariances and sample an equal number (m) of source and target data points, where m ∈ {10, 20, 50, 100, 150, 200} and d ∈ {100, 1000}. The covariance matrices are computed as Σ₁ = V₁V₁⊤/‖V₁‖²_F and Σ₂ = V₂V₂⊤/‖V₂‖²_F, where V₁ ∈ ℝ^{d×d} and V₂ ∈ ℝ^{d×d} are generated randomly from the uniform distribution. Our approach employs the Gaussian kernel, k(x, z) = exp(−‖x − z‖²/2σ²). For learning the transport plan, we randomly select ten images per class for the source domain (eight per class when DSLR is the source, due to its sample size). The remaining samples of the source domain are marked as out-of-sample source data points. The target domain is partitioned equally into training and test sets. The transport map is learned using the source-target training sets. Classification in the target domain is performed using a 1-Nearest Neighbor classifier [17, 31, 6]. We employ DeCAF6 features to represent the images [10, 31, 6]. |
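The synthetic setup described above (random unit-trace covariances, mean-zero Gaussian samples, Gaussian kernel) can be sketched in a few lines of NumPy. This is not the authors' code: the unit-trace normalization Σ = VV⊤/tr(VV⊤) is inferred from the "unit-trace covariances" statement, and the bandwidth σ = 1 is an arbitrary choice for illustration.

```python
import numpy as np

def make_covariance(d, rng):
    # Sigma = V V^T / tr(V V^T): positive definite with unit trace,
    # with V drawn from the uniform distribution as in the paper's setup
    V = rng.uniform(size=(d, d))
    S = V @ V.T
    return S / np.trace(S)

def gaussian_kernel(X, Z, sigma):
    # k(x, z) = exp(-||x - z||^2 / (2 sigma^2)), computed pairwise
    sq = (X**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2 * X @ Z.T
    return np.exp(-sq / (2 * sigma**2))

rng = np.random.default_rng(0)
d, m = 100, 50                      # one (d, m) pair from the grid above
Sigma1 = make_covariance(d, rng)
Sigma2 = make_covariance(d, rng)
X = rng.multivariate_normal(np.zeros(d), Sigma1, size=m)  # source samples
Z = rng.multivariate_normal(np.zeros(d), Sigma2, size=m)  # target samples
K = gaussian_kernel(X, Z, sigma=1.0)  # cross kernel between the two samples
```

In the paper's approach, kernel matrices like `K` feed the kernel-mean-embedding formulation; here they only illustrate the data-generation protocol.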
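The baselines OTLin and OTKer are run through the POT library. As a dependency-light sketch of the same computation: with uniform weights and equal sample sizes, discrete OT reduces to an assignment problem, so SciPy's `linear_sum_assignment` recovers the same optimal plan that POT's exact solver (`ot.emd`) would. The data here is random stand-in data, not the paper's.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(1)
m, d = 20, 5
X = rng.normal(size=(m, d))   # stand-in source samples
Z = rng.normal(size=(m, d))   # stand-in target samples

# squared-Euclidean cost matrix C[i, j] = ||x_i - z_j||^2
C = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)

# For uniform marginals of equal size, an optimal plan exists at a
# permutation (Birkhoff), so the assignment solver finds the exact optimum.
rows, cols = linear_sum_assignment(C)
plan = np.zeros((m, m))
plan[rows, cols] = 1.0 / m    # transport plan with uniform 1/m mass per pair
cost = C[rows, cols].mean()   # optimal transport cost
```

The optimal cost is never worse than the independent coupling's cost `C.mean()`, which is a quick sanity check on any OT solver's output.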