Diffusion Earth Mover’s Distance and Distribution Embeddings
Authors: Alexander Y Tong, Guillaume Huguet, Amine Natik, Kincaid Macdonald, Manik Kuchroo, Ronald Coifman, Guy Wolf, Smita Krishnaswamy
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we first evaluate the Diffusion EMD on two manifolds where the ground truth EMD with a geodesic ground distance is known, a swiss roll dataset and spherical MNIST (Cohen et al., 2017). On these datasets, where we have access to the ground truth geodesic distance, we show that Diffusion EMD is both faster and closer to the ground truth than comparable methods. Then, we show an application to a large single cell dataset of COVID-19 patients where the underlying metric between cells is thought to be a manifold (Moon et al., 2018; Kuchroo et al., 2020). We show that the manifold of patients based on Diffusion EMD, by capturing the graph structure, better captures the disease state of the patients. |
| Researcher Affiliation | Academia | 1Dept. of Comp. Sci., Yale University, New Haven, CT, USA 2Dept. of Math. & Stat., Université de Montréal, Montréal, QC, Canada 3Mila Quebec AI Institute, Montréal, QC, Canada 4Dept. of Math., Yale University, New Haven, CT, USA 5Department of Genetics, Yale University, New Haven, CT, USA. |
| Pseudocode | Yes | Algorithm 1 Chebyshev embedding |
| Open Source Code | Yes | Python implementation is available at https://github.com/KrishnaswamyLab/DiffusionEMD. |
| Open Datasets | Yes | "We first evaluate the Diffusion EMD on two manifolds where the ground truth EMD with a geodesic ground distance is known, a swiss roll dataset and spherical MNIST (Cohen et al., 2017)." and "We analyzed 210 blood samples from 168 patients infected with SARS-CoV-2 measured on a myeloid-specific flow cytometry panel, an expanded iteration of a previously published dataset (Lucas et al., 2020)." |
| Dataset Splits | No | The paper does not provide specific details on training, validation, and test dataset splits with percentages or counts, or refer to predefined splits with citations for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper mentions 'Python implementation' but does not provide specific ancillary software details with version numbers (e.g., libraries, frameworks, or specific Python version) needed to replicate the experiment. |
| Experiment Setup | No | We search over and fix other parameters using a grid search as detailed in Sec. D of the Appendix. |
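
For readers unfamiliar with the method named in the table, the following is a minimal pure-Python sketch of the general idea of comparing distributions on a graph through a multiscale diffusion embedding and an L1 distance. It is not the authors' implementation and does not use the Chebyshev approximation of Algorithm 1. The path graph, the lazy random walk, the function names, the number of scales `K`, the exponent `alpha`, and the dyadic weights are all assumptions made for illustration.

```python
# Illustrative sketch only: a toy multiscale diffusion embedding on a path graph.
# All names, the graph, and the weights below are assumptions, not the paper's code.

def lazy_walk_matrix(n):
    """Row-stochastic lazy random walk on a path graph with n nodes."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        P[i][i] = 0.5                      # stay put with probability 1/2
        for j in nbrs:
            P[i][j] = 0.5 / len(nbrs)      # split the rest among neighbors
    return P

def diffuse(mu, P, steps):
    """Apply `steps` random-walk steps to the distribution mu (row vector times P)."""
    n = len(mu)
    for _ in range(steps):
        mu = [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]
    return mu

def multiscale_embedding(mu, P, K=2, alpha=0.5):
    """Concatenate weighted differences between dyadic diffusion scales 2^0..2^K."""
    scales = [diffuse(mu, P, 2 ** k) for k in range(K + 1)]
    emb = []
    for k in range(K):
        w = 2.0 ** (-(K - k - 1) * alpha)  # assumed dyadic weighting
        emb.extend(w * (b - a) for a, b in zip(scales[k], scales[k + 1]))
    emb.extend(scales[K])                  # coarsest scale kept unweighted
    return emb

def diffusion_distance_l1(mu, nu, P, K=2, alpha=0.5):
    """L1 distance between the multiscale embeddings of two distributions."""
    e_mu = multiscale_embedding(mu, P, K, alpha)
    e_nu = multiscale_embedding(nu, P, K, alpha)
    return sum(abs(a - b) for a, b in zip(e_mu, e_nu))

def dirac(n, i):
    """Point mass at node i of an n-node graph."""
    return [1.0 if j == i else 0.0 for j in range(n)]

if __name__ == "__main__":
    P = lazy_walk_matrix(12)
    near = diffusion_distance_l1(dirac(12, 0), dirac(12, 1), P)
    far = diffusion_distance_l1(dirac(12, 0), dirac(12, 11), P)
    print(near, far)
```

In this toy setting, point masses at adjacent nodes come out closer than point masses at opposite ends of the path, which is the qualitative behavior expected of a transport-like distance that respects graph structure.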