Generalized Zero-Shot Text Classification for ICD Coding

Authors: Congzheng Song, Shanghang Zhang, Najmeh Sadoughi, Pengtao Xie, Eric Xing

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the effectiveness of our approach. On the public MIMIC-III dataset, our methods improve the F1 score from nearly 0 to 20.91% for the zero-shot codes, and increase the AUC score by 3% (absolute improvement) from previous state of the art.
Researcher Affiliation | Collaboration | Congzheng Song (Cornell University), Shanghang Zhang (University of California, Berkeley), Najmeh Sadoughi, Pengtao Xie and Eric Xing (Petuum Inc.)
Pseudocode | No | The paper describes its models and methods using textual descriptions and mathematical equations (e.g., Equations 1-9), but it does not include a formally structured pseudocode or algorithm block.
Open Source Code | Yes | Code is available at https://github.com/csong27/gzsl_text.
Open Datasets | Yes | We evaluate our approach using the public medical dataset MIMIC-III [Johnson et al., 2016].
Dataset Splits | Yes | We split the dataset for training, validation, and testing by patient ID. In total we have 46,157 discharge summaries for training, 3,280 for validation and 3,285 for testing.
Hardware Specification | No | The paper describes the experimental setup, including parameters and training details, but it does not specify the hardware used for running the experiments (e.g., specific GPU or CPU models, memory, or cluster configurations).
Software Dependencies | No | The paper mentions using ADAM for optimization [Kingma and Ba, 2015] and references specific models such as ZAGRNN and the Transformer architecture [Vaswani et al., 2017], but it does not provide version numbers for software libraries, programming languages (e.g., Python), or frameworks (e.g., TensorFlow, PyTorch) necessary for reproducibility.
Experiment Setup | Yes | For the ZAGRNN model, we use 100 convolution filters with a filter size of 5. We set C = 2 in L_LDAM. We use ADAM [Kingma and Ba, 2015] for optimization with batch size 8 and learning rate 0.001. The final feature size and GRNN hidden layer size are both set to 400. We train the ZAGRNN model for 40 epochs. For WGAN-GP based methods... We set the gradient penalty coefficient λ = 10. For the code-description encoder LSTM, we set the hidden size to 200. We train the discriminator for 5 iterations per generator training iteration. We optimize the WGAN-GP with ADAM [Kingma and Ba, 2015] with mini-batch size 128 and learning rate 0.0001. We train all variants of WGAN-GP for 60 epochs. We set the weight of L_CLS to 0.01 and the weights of L_CYC and L_KEY to 0.1. For L_KEY, we predict the top 30 most relevant keywords given the generated features. After the generators are trained, we synthesize 256 features for each zero-shot code l and fine-tune the classifier g_l using ADAM, setting the learning rate to 0.00001 and the batch size to 128.
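
The Dataset Splits row reports that training, validation, and test sets are divided by patient ID, so all discharge summaries from one patient land in a single split. Below is a minimal sketch of such a grouped split, assuming a pandas DataFrame with a SUBJECT_ID column; the column name, split fractions, and seed are illustrative assumptions, not values taken from the paper.

```python
import numpy as np
import pandas as pd

def split_by_patient(df, val_frac=0.05, test_frac=0.05, seed=0):
    """Assign every discharge summary of a patient to exactly one split.

    Assumes `df` has a SUBJECT_ID column (hypothetical schema); fractions
    are illustrative, not the paper's.
    """
    rng = np.random.RandomState(seed)
    patients = df["SUBJECT_ID"].unique()
    rng.shuffle(patients)
    n_test = int(len(patients) * test_frac)
    n_val = int(len(patients) * val_frac)
    test_ids = set(patients[:n_test])
    val_ids = set(patients[n_test:n_test + n_val])

    def which(pid):
        if pid in test_ids:
            return "test"
        if pid in val_ids:
            return "val"
        return "train"

    return df.assign(split=df["SUBJECT_ID"].map(which))

# notes = pd.read_csv("discharge_summaries.csv")  # hypothetical file
# notes = split_by_patient(notes)
```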
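
The Experiment Setup row quotes the hyperparameters directly from the paper. The sketch below only collects those quoted values into a single configuration object and shows how ADAM optimizers with the stated learning rates could be instantiated in PyTorch; the zagrnn, generator, discriminator, and zero_shot_classifier arguments are placeholders for the models in the authors' repository, not reproductions of them.

```python
from dataclasses import dataclass

import torch

@dataclass
class Config:
    # ZAGRNN feature extractor (values quoted from the paper's setup)
    num_filters: int = 100
    filter_size: int = 5
    feature_size: int = 400
    grnn_hidden: int = 400
    ldam_c: float = 2.0           # C in the LDAM loss L_LDAM
    zagrnn_epochs: int = 40
    zagrnn_lr: float = 1e-3
    zagrnn_batch_size: int = 8
    # WGAN-GP feature generator
    gp_lambda: float = 10.0       # gradient penalty coefficient
    desc_lstm_hidden: int = 200   # code-description encoder LSTM
    critic_iters: int = 5         # discriminator steps per generator step
    gan_lr: float = 1e-4
    gan_batch_size: int = 128
    gan_epochs: int = 60
    w_cls: float = 0.01           # weight of L_CLS
    w_cyc: float = 0.1            # weight of L_CYC
    w_key: float = 0.1            # weight of L_KEY
    top_keywords: int = 30        # keywords predicted for L_KEY
    # zero-shot classifier fine-tuning on synthesized features
    synth_per_code: int = 256
    finetune_lr: float = 1e-5
    finetune_batch_size: int = 128

def build_optimizers(cfg, zagrnn, generator, discriminator, zero_shot_classifier):
    """ADAM optimizers with the learning rates reported in the paper.

    The four model arguments are placeholders for the implementations in
    https://github.com/csong27/gzsl_text.
    """
    return {
        "zagrnn": torch.optim.Adam(zagrnn.parameters(), lr=cfg.zagrnn_lr),
        "generator": torch.optim.Adam(generator.parameters(), lr=cfg.gan_lr),
        "discriminator": torch.optim.Adam(discriminator.parameters(), lr=cfg.gan_lr),
        "finetune": torch.optim.Adam(zero_shot_classifier.parameters(), lr=cfg.finetune_lr),
    }
```

A usage note: with this configuration, the discriminator would be stepped cfg.critic_iters times for every generator step, and after GAN training each zero-shot code's classifier would be fine-tuned on cfg.synth_per_code synthesized feature vectors, matching the schedule described in the quoted setup.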