Statistical Optimal Transport posed as Learning Kernel Embedding

Authors: Saketha Nath Jagarlapudi, Pratik Kumar Jawanpuria

NeurIPS 2020

Reproducibility Variable Result LLM Response
Research Type Experimental Empirical results illustrate the efficacy of the proposed approach. Our work focuses on this challenging and important problem of statistical OT over continuous domains, and seeks consistent estimators for ϵ-optimal transport plan/map, whose sample complexity is dimension-free. To this end, we take the novel approach of equivalently re-formulating the statistical OT problem solely in terms of the relevant kernel mean embeddings [26]. More specifically, our formulation finds the (characterizing) kernel mean embedding of a joint distribution with least expected cost, and whose marginal embeddings are close to the given-sample based estimates of the marginal embeddings.
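The quoted formulation works with empirical kernel mean embeddings of the marginals. As a minimal illustrative sketch (not the authors' code), the empirical embedding of a sample {x_1, ..., x_m} under a Gaussian kernel k is the function mu_hat(.) = (1/m) * sum_i k(x_i, .), which can be evaluated at any query point from a cross-kernel matrix; all function and variable names below are our own:

```python
import numpy as np

def gaussian_kernel(X, Z, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||X[i] - Z[j]||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Z**2, 1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def empirical_mean_embedding(sample, queries, sigma=1.0):
    """Evaluate mu_hat(q) = (1/m) * sum_i k(x_i, q) at each query point q."""
    return gaussian_kernel(queries, sample, sigma).mean(axis=1)

rng = np.random.default_rng(0)
sample = rng.normal(size=(100, 5))   # m = 100 points in R^5
queries = rng.normal(size=(3, 5))    # evaluation points
vals = empirical_mean_embedding(sample, queries)
```

Since each kernel value lies in (0, 1], the embedding evaluations do as well; the paper's estimator then seeks a joint-distribution embedding whose marginal embeddings stay close to such sample-based estimates.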
Researcher Affiliation Collaboration J. Saketha Nath Department of Computer Science and Engineering, Indian Institute of Technology Hyderabad, INDIA. saketha@cse.iith.ac.in Pratik Jawanpuria Microsoft IDC, Hyderabad, INDIA. pratik.jawanpuria@microsoft.com
Pseudocode No The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code Yes Our code is available at https://www.iith.ac.in/~saketha/research.html. Additional details on the experiments are available in the technical report [28].
Open Datasets Yes We experiment on the Caltech-Office dataset [17], which contains images from four domains: Amazon (online retail), the Caltech image dataset, DSLR (images taken from a high resolution DSLR camera), and Webcam (images taken from a webcam).
Dataset Splits No The paper mentions partitioning the target domain into training and test sets but does not specify a validation set or its split details.
Hardware Specification No The paper does not specify any hardware details such as GPU/CPU models or memory used for experiments.
Software Dependencies No We use the Python Optimal Transport (POT) library (https://github.com/PythonOT/POT) implementations of OTLin and OTKer in our experiments. While a software library is mentioned, specific version numbers are not provided.
Experiment Setup Yes Experimental setup: We consider mean-zero Gaussian distributions with unit-trace covariances and sample an equal number (m) of source and target data points, where m ∈ {10, 20, 50, 100, 150, 200} and d ∈ {100, 1000}. The covariance matrices are computed as Σ1 = V1V1⊤/‖V1‖F and Σ2 = V2V2⊤/‖V2‖F, where V1 ∈ ℝ^{d×d} and V2 ∈ ℝ^{d×d} are generated randomly from the uniform distribution. Our approach employs the Gaussian kernel, k(x, z) = exp(−‖x − z‖²/(2σ²)). For learning the transport plan, we randomly select ten images per class for the source domain (eight per class when DSLR is the source, due to its sample size). The remaining samples of the source domain are marked as out-of-sample source data points. The target domain is partitioned equally into training and test sets. The transport map is learned using the source-target training sets. Classification in the target domain is performed using a 1-Nearest Neighbor classifier [17, 31, 6]. We employ DeCAF6 features to represent the images [10, 31, 6].