Distilled Wasserstein Learning for Word Embedding and Topic Modeling
Authors: Hongteng Xu, Wenlin Wang, Wei Liu, Lawrence Carin
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate the feasibility and the superiority of our distilled Wasserstein learning (DWL) method, we apply it to the analysis of admission records of patients and compare it with state-of-the-art methods. We consider a subset of the MIMIC-III dataset [25], containing 11,086 patient admissions corresponding to 56 diseases and 25 procedures; each admission is represented as a sequence of ICD codes of the diseases and procedures. Experimental results show that our approach is superior to previous state-of-the-art methods in various tasks, including predicting admission type, mortality of a given admission, and procedure recommendation. |
| Researcher Affiliation | Collaboration | Hongteng Xu (Infinia ML, Inc.; Duke University), Wenlin Wang (Duke University), Wei Liu (Tencent AI Lab), Lawrence Carin (Duke University) |
| Pseudocode | Yes | Algorithm 1: Distilled Wasserstein Learning (DWL) for Joint Word Embedding and Topic Modeling (a hedged sketch of the Sinkhorn distance at the core of this algorithm appears after the table) |
| Open Source Code | No | The paper does not provide a direct link or explicit statement about the release of its own source code. It only mentions 'Morgan A. Schmitz kindly helped us by sharing his Wasserstein dictionary learning code.' |
| Open Datasets | Yes | We consider a subset of the MIMIC-III dataset [25] |
| Dataset Splits | Yes | For all the methods, we use 50% of the admissions for training, 25% for validation, and the remaining 25% for testing in each task. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for running its experiments. |
| Software Dependencies | No | The paper mentions various software components and methods (e.g., 'Word2Vec', 'GloVe', the 'Sinkhorn distance') but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | The hyperparameters of our method are set via cross validation: the batch size s = 256, β = 0.01, ϵ = 0.01, the number of topics K = 8, the embedding dimension D = 50, and the learning rate ρ = 0.05. The number of epochs I is set to 5 when the embeddings are initialized by Word2Vec, and to 50 when training from scratch. The distillation parameter τ is set to 0.5 empirically. (These values are collected in the configuration sketch after the table.) |
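
The Pseudocode row points to Algorithm 1 (DWL), which the report does not reproduce; its core cost is the entropic-regularized Wasserstein (Sinkhorn) distance noted in the Software Dependencies row, with ε = 0.01 per the Experiment Setup row. Below is a minimal NumPy sketch of that distance only, not the authors' implementation: the function name `sinkhorn_distance`, the iteration count, and the toy histograms are all our own illustrative choices.

```python
import numpy as np

def sinkhorn_distance(p, q, cost, epsilon=0.01, n_iters=200):
    """Entropic-regularized Wasserstein distance via Sinkhorn scaling."""
    # Gibbs kernel; epsilon must be scaled to the cost magnitude,
    # otherwise exp(-cost / epsilon) underflows to zero.
    K = np.exp(-cost / epsilon)
    u = np.ones_like(p)
    v = np.ones_like(q)
    for _ in range(n_iters):          # alternate projections onto the marginals
        u = p / (K @ v)
        v = q / (K.T @ u)
    transport = u[:, None] * K * v[None, :]   # estimated optimal coupling
    return float(np.sum(transport * cost))

# Toy usage: two histograms over four bins, squared ground cost
# normalized to [0, 1] so that epsilon = 0.01 stays numerically safe.
bins = np.arange(4, dtype=float)
C = (bins[:, None] - bins[None, :]) ** 2
C /= C.max()
p = np.array([0.4, 0.3, 0.2, 0.1])
q = np.array([0.1, 0.2, 0.3, 0.4])
print(sinkhorn_distance(p, q, C, epsilon=0.01))
```

In the paper this distance is computed between distributions over ICD codes and topics; the toy histograms above stand in only to make the sketch runnable.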
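
Similarly, the Experiment Setup and Dataset Splits rows can be read together as one experimental configuration. The sketch below merely collects the quoted values; the dictionary keys, the fixed random seed, and the shuffling code are our own assumptions, not the authors' code.

```python
import numpy as np

# Hyperparameters quoted in the Experiment Setup row; key names are ours.
DWL_CONFIG = {
    "batch_size": 256,            # s
    "beta": 0.01,                 # β
    "epsilon": 0.01,              # ε, Sinkhorn regularization weight
    "num_topics": 8,              # K
    "embedding_dim": 50,          # D
    "learning_rate": 0.05,        # ρ
    "epochs_word2vec_init": 5,    # I with Word2Vec initialization
    "epochs_from_scratch": 50,    # I when training from scratch
    "distillation_tau": 0.5,      # τ, set empirically
}

# 50/25/25 split of the 11,086 MIMIC-III admissions (Dataset Splits row).
# The seed is an assumption; the paper does not state one.
rng = np.random.default_rng(0)
idx = rng.permutation(11_086)
n_train = int(0.50 * len(idx))
n_val = int(0.25 * len(idx))
train_idx, val_idx, test_idx = np.split(idx, [n_train, n_train + n_val])
```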