Distilled Wasserstein Learning for Word Embedding and Topic Modeling

Authors: Hongteng Xu, Wenlin Wang, Wei Liu, Lawrence Carin

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our approach is superior to previous state-of-the-art methods in various tasks, including predicting admission type, mortality of a given admission, and procedure recommendation. To demonstrate the feasibility and the superiority of our distilled Wasserstein learning (DWL) method, we apply it to analysis of admission records of patients, and compare it with state-of-the-art methods. We consider a subset of the MIMIC-III dataset [25], containing 11,086 patient admissions, corresponding to 56 diseases and 25 procedures, and each admission is represented as a sequence of ICD codes of the diseases and the procedures.
Researcher Affiliation | Collaboration | Hongteng Xu (Infinia ML, Inc.; Duke University), Wenlin Wang (Duke University), Wei Liu (Tencent AI Lab), Lawrence Carin (Duke University)
Pseudocode | Yes | Algorithm 1: Distilled Wasserstein Learning (DWL) for Joint Word Embedding and Topic Modeling
Open Source Code | No | The paper does not provide a direct link or explicit statement about the release of its own source code. It only mentions 'Morgan A. Schmitz kindly helped us by sharing his Wasserstein dictionary learning code.'
Open Datasets | Yes | We consider a subset of the MIMIC-III dataset [25]
Dataset Splits | Yes | For all the methods, we use 50% of the admissions for training, 25% for validation, and the remaining 25% for testing in each task.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for running its experiments.
Software Dependencies | No | The paper mentions various software components and methods (e.g., 'Word2Vec', 'GloVe', 'Sinkhorn distance') but does not provide specific version numbers for any of them. (A generic Sinkhorn sketch appears after this table.)
Experiment Setup | Yes | The hyperparameters of our method are set via cross validation: the batch size s = 256, β = 0.01, ϵ = 0.01, the number of topics K = 8, the embedding dimension D = 50, and the learning rate ρ = 0.05. The number of epochs I is set to be 5 when the embeddings are initialized by Word2Vec, and 50 when training from scratch. The distillation parameter is set to τ = 0.5 empirically. (These settings, together with the data split above, are collected in the configuration sketch after this table.)
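For quick reference, the reported data split and hyperparameters can be gathered into a single configuration sketch. This is a minimal illustration only: the dictionary keys and the split_admissions helper are hypothetical names, not from the paper, and the precise roles of β and ϵ beyond their reported values are defined in the paper itself.

```python
import numpy as np

# Hyperparameters as reported in the Experiment Setup row above.
# Key names are illustrative; only the values come from the paper.
DWL_CONFIG = {
    "batch_size": 256,          # s
    "beta": 0.01,               # β
    "epsilon": 0.01,            # ϵ
    "num_topics": 8,            # K
    "embedding_dim": 50,        # D
    "learning_rate": 0.05,      # ρ
    "tau": 0.5,                 # distillation parameter
    "epochs_word2vec_init": 5,  # I when embeddings are initialized by Word2Vec
    "epochs_from_scratch": 50,  # I when training from scratch
}

def split_admissions(num_admissions, seed=0):
    """Hypothetical helper reproducing the reported 50% / 25% / 25%
    train / validation / test split over admission indices."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(num_admissions)
    n_train = int(0.50 * num_admissions)
    n_val = int(0.25 * num_admissions)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# 11,086 admissions in the MIMIC-III subset used by the paper.
train_idx, val_idx, test_idx = split_admissions(11086)
```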
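The Sinkhorn distance mentioned in the Software Dependencies row is the entropically regularized optimal-transport distance commonly used in Wasserstein learning methods. Below is a generic sketch of the standard Sinkhorn iteration, not the paper's implementation; a regularization as small as the reported ϵ = 0.01 usually calls for log-domain stabilization, which this plain version omits, so the toy example uses a larger ϵ.

```python
import numpy as np

def sinkhorn_distance(a, b, cost, epsilon=0.5, n_iters=200):
    """Standard Sinkhorn iteration for the entropically regularized
    optimal-transport distance between histograms a and b under a given
    cost matrix. Generic textbook version, not the paper's code."""
    K = np.exp(-cost / epsilon)      # Gibbs kernel
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)              # scale rows to match marginal a
        v = b / (K.T @ u)            # scale columns to match marginal b
    transport = u[:, None] * K * v[None, :]
    return float(np.sum(transport * cost))

# Toy usage: two histograms over 4 bins with a squared-distance ground cost.
a = np.array([0.25, 0.25, 0.25, 0.25])
b = np.array([0.10, 0.20, 0.30, 0.40])
cost = (np.arange(4)[:, None] - np.arange(4)[None, :]) ** 2.0
print(sinkhorn_distance(a, b, cost))
```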