Partial Optimal Transport with Applications on Positive-Unlabeled Learning

Authors: Laetitia Chapel, Mokhtar Z. Alaya, Gilles Gasso

NeurIPS 2020

Reproducibility Variable | Result | Supporting Excerpt (LLM Response)

Research Type | Experimental
    "We showcase the new formulation in a positive-unlabeled (PU) learning application. To the best of our knowledge, this is the first application of optimal transport in this context, and we first highlight that partial Wasserstein-based metrics prove effective in usual PU learning settings. We then demonstrate that partial Gromov-Wasserstein metrics are efficient in scenarios in which the samples from the positive and the unlabeled datasets come from different domains or have different features." (Section 5, Experiments)

Researcher Affiliation | Academia
    Laetitia Chapel, Univ. Bretagne-Sud, CNRS, IRISA, F-56000 Vannes (laetitia.chapel@irisa.fr); Mokhtar Z. Alaya, LITIS EA4108, University of Rouen Normandy (mokhtarzahdi.alaya@gmail.com); Gilles Gasso, LITIS EA4108, INSA, University of Rouen Normandy (gilles.gasso@insa-rouen.fr)

Pseudocode | Yes
    Algorithm 1: Frank-Wolfe algorithm for partial-GW.

Open Source Code | Yes
    "Algorithm 1 has been implemented and is available in the Python Optimal Transport (POT) toolbox (Flamary and Courty, 2017)."

Open Datasets | Yes
    "We rely on six datasets (Mushrooms, Shuttle, Pageblocks, USPS, Connect-4, Spambase) from the UCI repository (https://archive.ics.uci.edu/ml/datasets.php), following Kato et al. (2019)'s setting, and colored MNIST (Arjovsky et al., 2019) to illustrate our method in the SCAR and SAR settings, respectively. We also consider the Caltech office dataset, a common benchmark for domain adaptation (Courty et al., 2017), to explore the effectiveness of our method in heterogeneous distribution settings."

Dataset Splits | No
    The paper describes how the positive (Pos) and unlabeled (Unl) sets are drawn for the PU learning problem (e.g., "we randomly draw n_P = 400 positive and n_U = 800 unlabeled points"), but it does not specify explicit training, validation, or test splits in the conventional sense for model development.

Hardware Specification | No
    The paper does not report any hardware details (GPU models, CPU types, or memory specifications) used to run the experiments.

Software Dependencies | No
    The paper mentions the Python Optimal Transport (POT) toolbox but does not give version numbers for it or for any other software dependency.

Experiment Setup | Yes
    "For both partial-W and partial-GW, we choose p = 2 and the cost matrices C are computed using Euclidean distance. ... We test 2 levels of noise in Pos: α = 0 or α = 0.025, fix ξ = 0, A = max(C) and choose a large η = 10^6. ... We perform a PCA to project the data onto d = 10 dimensions for the SURF features and d = 40 for the DECAF ones."