Graph Optimal Transport for Cross-Domain Alignment
Authors: Liqun Chen, Zhe Gan, Yu Cheng, Linjie Li, Lawrence Carin, Jingjing Liu
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show consistent outperformance of GOT over baselines across a wide range of tasks, including image-text retrieval, visual question answering, image captioning, machine translation, and text summarization. |
| Researcher Affiliation | Collaboration | 1Duke University 2Microsoft Dynamics 365 AI Research. Correspondence to: Liqun Chen <liqun.chen@duke.edu>, Zhe Gan <zhe.gan@microsoft.com>. |
| Pseudocode | Yes | Algorithm 1 Computing Wasserstein Distance. ... Algorithm 2 Computing Gromov-Wasserstein Distance. ... Algorithm 3 Computing GOT Distance. (A hedged sketch of these computations follows the table.) |
| Open Source Code | Yes | Code is available at https://github.com/LiqunChen0606/Graph-Optimal-Transport. |
| Open Datasets | Yes | We evaluate our model on the Flickr30K (Plummer et al., 2015) and COCO (Lin et al., 2014) datasets. |
| Dataset Splits | Yes | We follow previous work (Karpathy & Fei-Fei, 2015; Faghri et al., 2018) for the data split: 29,000, 1,000 and 1,000 images are used for training, validation and test, respectively. ... We follow the data split in Faghri et al. (2018), where 113,287, 5,000 and 5,000 images are used for training, validation and test, respectively. |
| Hardware Specification | No | The paper mentions 'when using the same machine for image-text retrieval experiments' but does not specify any particular hardware components (e.g., CPU, GPU models, memory). |
| Software Dependencies | No | The paper mentions 'PyTorch and TensorFlow' for efficient implementation and the 'Texar codebase' for experiments, but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | A set of 36 features is created for each image, each feature represented by a 2048-dimensional vector. ... The text decoder is one-layer LSTM with 256 hidden units. The word embedding dimension is set to 256. ... We select λ from [0, 1]. |
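
The pseudocode row above cites the paper's Algorithms 1–3. Below is a minimal NumPy sketch of the overall recipe: entropic Sinkhorn iterations for the Wasserstein distance, projected-gradient updates for the Gromov-Wasserstein distance, and a λ-weighted sum for the GOT distance. Cosine costs, the plain Sinkhorn solver, the iteration counts, and all function names here are illustrative assumptions; the paper's exact solver variants may differ.

```python
import numpy as np

def sinkhorn(C, a, b, beta=0.5, n_iter=50):
    """Entropic-regularized OT plan via plain Sinkhorn iterations.

    A generic stand-in for the inner loop of the paper's Algorithm 1;
    the paper's exact solver variant may differ.
    """
    K = np.exp(-C / beta)                  # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]     # transport plan T

def cosine_cost(X, Y):
    """Pairwise cosine-distance cost between two sets of embeddings."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return 1.0 - Xn @ Yn.T

def wasserstein_distance(X, Y):
    """Algorithm 1 sketch: OT between node embeddings, uniform weights."""
    C = cosine_cost(X, Y)
    a = np.full(X.shape[0], 1.0 / X.shape[0])
    b = np.full(Y.shape[0], 1.0 / Y.shape[0])
    T = sinkhorn(C, a, b)
    return np.sum(T * C)

def gromov_wasserstein_distance(X, Y, n_outer=10):
    """Algorithm 2 sketch: entropic GW via projected gradient
    (Peyre et al., 2016 style), reusing sinkhorn as the inner solver."""
    Cx, Cy = cosine_cost(X, X), cosine_cost(Y, Y)   # intra-graph structure
    a = np.full(X.shape[0], 1.0 / X.shape[0])
    b = np.full(Y.shape[0], 1.0 / Y.shape[0])
    T = np.outer(a, b)
    const = (Cx**2 @ a)[:, None] + (Cy**2 @ b)[None, :]
    for _ in range(n_outer):
        grad = const - 2.0 * Cx @ T @ Cy.T          # pseudo-cost for this step
        T = sinkhorn(grad, a, b)
    return np.sum((const - 2.0 * Cx @ T @ Cy.T) * T)

def got_distance(X, Y, lam=0.5):
    """Algorithm 3 sketch: lambda-weighted mix of the two distances."""
    return lam * wasserstein_distance(X, Y) \
        + (1.0 - lam) * gromov_wasserstein_distance(X, Y)
```

With uniform node weights, `got_distance` can be called on, e.g., a (36, d) matrix of image-region embeddings and an (m, d) matrix of word embeddings; λ interpolates between the node-level and edge-level distances, matching the λ selected from [0, 1] in the setup row.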
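For the experiment-setup row, here is a minimal PyTorch sketch of the described text decoder: a one-layer LSTM with 256 hidden units and 256-dimensional word embeddings, conditioned on a 2048-dimensional image feature. The class name, vocabulary handling, and image-to-state projection are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class CaptionDecoder(nn.Module):
    """Hypothetical sketch of the setup row's decoder: one-layer LSTM,
    256 hidden units, 256-dim word embeddings, 2048-dim image features."""

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=256, feat_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.init_h = nn.Linear(feat_dim, hidden_dim)   # assumed image-to-state projection
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=1, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, img_feat):
        # tokens: (B, T) word ids; img_feat: (B, 2048) pooled image feature
        h0 = torch.tanh(self.init_h(img_feat)).unsqueeze(0)   # (1, B, H)
        c0 = torch.zeros_like(h0)
        out, _ = self.lstm(self.embed(tokens), (h0, c0))      # (B, T, H)
        return self.out(out)                                  # (B, T, vocab) logits
```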