Linear Time Sinkhorn Divergences using Positive Features

Authors: Meyer Scetbon, Marco Cuturi

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Figures 1 and 3 we plot the deviation from ground truth, defined as $D := 100 \times |ROT - \widehat{ROT}| / |ROT|$, and show the time-accuracy tradeoff for our proposed method RF, Nyström (Nys) [3] and Sinkhorn (Sin) [16], for a range of regularization parameters
Researcher Affiliation | Collaboration | Meyer Scetbon, CREST, ENSAE, Institut Polytechnique de Paris, meyer.scetbon@ensae.fr; Marco Cuturi, Google Brain, CREST, ENSAE, cuturi@google.com
Pseudocode | Yes | Algorithm 1 Sinkhorn. Inputs: K, a, b, δ, u. Repeat: v ← b / (Kᵀu), u ← a / (Kv), until ‖v ⊙ (Kᵀu) − b‖₁ < δ. Result: u, v. (A runnable NumPy sketch of this iteration is given after the table.)
Open Source Code | Yes | The code is available at github.com/meyerscetbon/LinearSinkhorn.
Open Datasets | Yes | We train our GAN models on a Tesla K80 GPU for 84 hours on two different datasets, namely the CIFAR-10 dataset [35] and the CelebA dataset [38]
Dataset Splits | No | The paper uses datasets like CIFAR-10 and CelebA but does not specify the exact training, validation, and test splits (e.g., percentages or sample counts) needed for reproduction.
Hardware Specification | Yes | We train our GAN models on a Tesla K80 GPU for 84 hours on two different datasets, namely the CIFAR-10 dataset [35] and the CelebA dataset [38]
Software Dependencies | No | The paper mentions the use of neural networks and GANs, implying common deep learning frameworks, but does not provide specific version numbers for software dependencies such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | More precisely we take the exact same functions used in [46, 36] to define g and f_γ. Moreover, φ is the feature map associated to the Gaussian kernel defined in Lemma 1, where the random features are initialised with a normal distribution. The number of random features considered has been fixed to r = 600 in the following. The training procedure is the same as in [27, 36] and consists in alternating n_c optimisation steps to train the cost function c_{h_γ} and one optimisation step to train the generator g. (...) where we set the number of batches s = 7000, the regularization ε = 1, and the number of features r = 600. (A hedged sketch of the positive-feature kernel approximation is given after the table.)
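
For reference, here is a minimal NumPy sketch of the Sinkhorn fixed-point iteration quoted in the Pseudocode row above. The function name sinkhorn, the tolerance delta and the iteration cap are illustrative choices, not taken from the authors' code.

```python
import numpy as np

def sinkhorn(K, a, b, delta=1e-6, max_iter=10_000):
    """Sinkhorn iterations (Algorithm 1 quoted above).

    K : (n, m) positive kernel matrix, e.g. K = exp(-C / eps)
    a : (n,) source marginal, b : (m,) target marginal
    Returns scalings u, v such that diag(u) @ K @ diag(v)
    has marginals approximately equal to a and b.
    """
    u = np.full(a.shape[0], 1.0 / a.shape[0])
    for _ in range(max_iter):
        v = b / (K.T @ u)                                    # v <- b / K^T u
        u = a / (K @ v)                                      # u <- a / K v
        if np.linalg.norm(v * (K.T @ u) - b, 1) < delta:     # marginal violation
            break
    return u, v

# tiny usage example with a squared-Euclidean cost
rng = np.random.default_rng(0)
x, y = rng.normal(size=(50, 2)), rng.normal(size=(60, 2))
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
K = np.exp(-C / 1.0)                                         # regularization eps = 1
a, b = np.full(50, 1 / 50), np.full(60, 1 / 60)
u, v = sinkhorn(K, a, b)
```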
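The paper's linear-time claim comes from replacing the kernel matrix K by a low-rank factorisation built from positive random features, so that the two matrix-vector products in each Sinkhorn iteration cost O((n+m)r) instead of O(nm). The sketch below illustrates this idea for the Gaussian kernel exp(-||x-y||^2/eps) using a Monte-Carlo positive feature map derived from a standard Gaussian-integral identity; the exact feature map and constants of Lemma 1 in the paper may differ, and names such as gauss_positive_features, the anchor scale rho, and the problem sizes are illustrative assumptions.

```python
import numpy as np

def gauss_positive_features(X, anchors, eps, rho):
    """Positive features xi(X) with xi(x) @ xi(y) ~= exp(-||x - y||^2 / eps).

    Based on the identity (derived here, not copied from the paper's Lemma 1)
      exp(-||x-y||^2/eps)
        = (4/(pi*eps))^{d/2} * integral exp(-2||x-u||^2/eps) exp(-2||y-u||^2/eps) du,
    estimated by Monte Carlo with anchors u_j ~ N(0, rho^2 I_d) and
    importance-weighted by their Gaussian density.  All feature entries are
    positive, which is what the Sinkhorn iterations require.
    """
    n, d = X.shape
    r = anchors.shape[0]
    sq = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)        # (n, r)
    log_const = (d / 4.0) * (np.log(4.0 / (np.pi * eps)) + np.log(2.0 * np.pi * rho ** 2))
    log_phi = log_const + (anchors ** 2).sum(-1)[None, :] / (4.0 * rho ** 2) - 2.0 * sq / eps
    return np.exp(log_phi) / np.sqrt(r)                              # (n, r)

# usage: factorised kernel K ~= A @ B.T and linear-time Sinkhorn products
rng = np.random.default_rng(0)
n, m, d, r, eps, rho = 500, 400, 2, 600, 1.0, 1.0
X, Y = rng.normal(size=(n, d)), rng.normal(size=(m, d))
U = rng.normal(scale=rho, size=(r, d))               # anchors shared by both clouds
A = gauss_positive_features(X, U, eps, rho)
B = gauss_positive_features(Y, U, eps, rho)

# relative error of the factorisation against the exact kernel (small n, m only)
K_exact = np.exp(-((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1) / eps)
rel_err = np.linalg.norm(A @ B.T - K_exact) / np.linalg.norm(K_exact)

# Sinkhorn updates without ever forming the n x m kernel matrix
a, b = np.full(n, 1 / n), np.full(m, 1 / m)
u, v = np.full(n, 1.0), np.full(m, 1.0)
for _ in range(500):
    v = b / (B @ (A.T @ u))     # K^T u  in O((n+m) r)
    u = a / (A @ (B.T @ v))     # K v    in O((n+m) r)
```

Because the features are strictly positive, the factorised matrix A Bᵀ can be substituted for K in the iteration without breaking the positivity the updates rely on; only the order of the matrix products changes.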