Efficient Discrete Multi Marginal Optimal Transport Regularization

Authors: Ronak Mehta, Jeffery Kline, Vishnu Suresh Lokhande, Glenn Fung, Vikas Singh

ICLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental We provide technical details about this new regularization term and its properties, and we present experimental demonstrations of faster runtimes when compared to standard Wasserstein-style methods. Finally, on a range of experiments designed to assess effectiveness at enforcing fairness, we demonstrate our method compares well with alternatives.
Researcher Affiliation Collaboration Ronak Mehta (UW-Madison), Jeffery Kline (Affirm), Vishnu Suresh Lokhande (UW-Madison), Glenn Fung (Liberty Mutual), Vikas Singh (UW-Madison)
Pseudocode Yes Algorithm 1 Our proposed method: d-Dimensional Earth Mover's Distance (DEMD)
Open Source Code Yes Code is available at https://github.com/ronakrm/demd.
Open Datasets Yes The Adult, Communities and Crime, and German datasets are all available under Creative Commons Attribution 4.0 International (CC BY 4.0) licenses via the UCI Machine Learning Dataset Repository https://archive.ics.uci.edu/ml/index.php. The CelebA dataset http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html is available for noncommercial purposes. See the website for more details. ACS Data. The American Census Survey (ACS) has recently made available a large set of demographic data. The original UCI Adult dataset Dua & Graff (2017) was curated from this data, however recent work by Ding et al. (2021) has identified temporal shifts in demographic data, and recommends using a more recent collection as a baseline when evaluating biases and adjusting for fairness. Part of their contribution includes APIs to directly interface with the data provided by the ACS, and the ability to identify and construct similar problems associated with the original UCI-provided dataset, albeit with updated data. Data for the income prediction task was downloaded from 2018, localized to Louisiana. Race is the provided group label, which we wish to be agnostic towards, over some measure of our output. Data was accessed using the folktables codebase https://github.com/zykls/folktables with MIT License. The US Census data accessed is available for use so long as it is not used in combination with other data to identify any particular respondent to a Census Bureau survey. See https://www.census.gov/data/developers/about/terms-of-service.html for more details.
Dataset Splits Yes The hyper-parameter selection has been done on a validation split obtained from the training dataset.
Hardware Specification Yes Experiments were conducted using NumPy and PyTorch on an Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz with an Nvidia Titan Xp GPU.
Software Dependencies No The paper mentions several software packages such as NumPy, PyTorch, SciPy, and the Python Optimal Transport (POT) library, but does not specify their version numbers. For example: "Experiments were conducted using NumPy and PyTorch" and "Computation of the EMD is readily available, as in the Python Optimal Transport (POT) library (Flamary et al., 2021)."
Experiment Setup Yes All experiments related to fairness use a three-layer fully-connected neural network classifier with a hidden layer size of 100. Numbers reported in Table 1 in the main paper are means and standard deviations over three replicate runs with different seeds. Hyperparameters were selected as described in the main paper, by taking the best result for each dataset over the parameter range λ ∈ [1.0, 0.1, 10, 0.01, 100, 0.001].
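The Pseudocode row above refers to Algorithm 1 (DEMD), which is not reproduced on this page. As a rough illustration only, the following is a minimal sketch of a greedy multi-marginal earth mover's computation for d histograms on a shared set of ordered bins. It is not the authors' Algorithm 1 or the code in the linked repository. The function name `greedy_demd`, the `tol` parameter, and the cost assigned to a tuple of bin indices (largest index minus smallest index) are assumptions made for this sketch, and the histograms are assumed to have equal total mass.

```python
def greedy_demd(hists, tol=1e-12):
    """Greedy transport cost for d histograms over the same n ordered bins.

    Assumption for this sketch: moving mass m onto the index tuple
    (i_1, ..., i_d) costs m * (max(i) - min(i)).
    """
    d = len(hists)
    n = len(hists[0])
    if any(len(h) != n for h in hists):
        raise ValueError("all histograms must have the same number of bins")
    total = sum(hists[0])
    if any(abs(sum(h) - total) > 1e-9 for h in hists):
        raise ValueError("all histograms must have the same total mass")

    # Work on float copies so the caller's data is left untouched.
    remaining = [[float(x) for x in h] for h in hists]
    idx = [0] * d  # current bin pointer for each histogram
    cost = 0.0

    while all(i < n for i in idx):
        # Largest mass that every current bin can still supply.
        moved = min(remaining[k][idx[k]] for k in range(d))
        cost += moved * (max(idx) - min(idx))
        for k in range(d):
            remaining[k][idx[k]] -= moved
            # Advance past any bin that is now (numerically) empty.
            if remaining[k][idx[k]] <= tol:
                idx[k] += 1
    return cost
```

With two histograms this reduces to the familiar one-dimensional earth mover's distance on unit-spaced bins: `greedy_demd([[1, 0, 0], [0, 0, 1]])` gives 2.0 and `greedy_demd([[0.25, 0.75], [0.75, 0.25]])` gives 0.5. Each loop iteration empties at least one bin, so the loop runs at most d * n times.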