Multi-Label Few-Shot ICD Coding as Autoregressive Generation with Prompt
Authors: Zhichao Yang, Sunjae Kwon, Zonghai Yao, Hong Yu
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our Generation with Prompt (GPsoap) model on the all-code assignment benchmark (MIMIC-III-full) and the few-shot ICD code assignment benchmark (MIMIC-III-few). Experiments on MIMIC-III-few show that our model achieves a macro F1 of 30.2, substantially outperforming the previous MIMIC-III-full SOTA model (macro F1 4.3) and the model specifically designed for the few/zero-shot setting (macro F1 18.7). (A macro-F1 computation sketch follows the table.) |
| Researcher Affiliation | Academia | College of Information and Computer Sciences, University of Massachusetts Amherst; Department of Computer Science, University of Massachusetts Lowell; Center for Healthcare Organization and Implementation Research, Veterans Affairs Bedford Healthcare System; zhichaoyang@umass.edu |
| Pseudocode | No | The paper describes the methods used but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is attached in the supplementary material and will be publicly available upon publication. Our evaluation code is publicly available at https://github.com/whaleloops/KEPT. |
| Open Datasets | Yes | The fine-tuning dataset (Johnson et al. 2016) contains clinical data from real patients. It contains data instances of de-identified discharge summary note texts with expert-labeled ICD-9 codes. ... MIMIC-III, a freely accessible critical care database (Johnson et al. 2016). |
| Dataset Splits | Yes | For all codes prediction tasks (MIMIC-III-full), we used the same splits as the previous work (Mullenbach et al. 2018; Yuan, Tan, and Huang 2022). |
| Hardware Specification | Yes | Pretraining on SOAP data took about 140 hours with 4 NVIDIA RTX 6000 GPUs with 24 GB memory. Fine-tuning took about 40 hours with 4 NVIDIA RTX 6000 GPUs with 24 GB memory. Our reranker training took about 12 hours with 2 NVIDIA A100 GPUs with 40 GB memory. |
| Software Dependencies | No | The paper details hyper-parameters and training configurations such as learning rates and dropout rates, but it does not specify versions for software dependencies like programming languages or libraries (e.g., Python, PyTorch). |
| Experiment Setup | Yes | During pretraining, we used a warmup ratio of 0.1, learning rate 5e-5, dropout rate 0.1, L2 weight decay 1e-3, and a batch size of 64 with fp16. During fine-tuning, we grid searched the learning rate [1e-5, 2e-5, 3e-5] and dropout rate [0.1, 0.3, 0.5], with a batch size of 4. The best hyper-parameter set is bolded in the paper. The random seed is 42. (A hedged training-configuration sketch follows the table.) |
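
The macro F1 figures quoted in the Research Type row average the per-code F1 over all ICD codes, so rare (few-shot) codes count as much as frequent ones; this is why few-shot gains show up strongly in macro F1 even when micro F1 moves little. The following is an illustrative sketch only, assuming scikit-learn as the metric backend; the toy multi-hot label matrices are made up and are not MIMIC-III data.

```python
# Illustrative macro-F1 computation for multi-label ICD coding.
# Assumption: scikit-learn is the metric backend; the binary label
# matrices below are toy data, not MIMIC-III.
import numpy as np
from sklearn.metrics import f1_score

# Rows = discharge summaries, columns = ICD-9 codes (multi-hot).
y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0],
                   [1, 1, 0, 1]])
y_pred = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [1, 0, 0, 1]])

# Macro F1: compute F1 per code, then take the unweighted mean,
# so rare codes weigh as much as frequent ones.
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)
# Micro F1 (for contrast): pool all code decisions before computing F1.
micro_f1 = f1_score(y_true, y_pred, average="micro", zero_division=0)
print(f"macro F1 = {macro_f1:.3f}, micro F1 = {micro_f1:.3f}")
```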
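
The Experiment Setup row reports hyper-parameters but, as the Software Dependencies row notes, not the training framework or library versions. The sketch below is a minimal, hedged reconstruction assuming the HuggingFace Transformers `TrainingArguments` API; the output paths, per-device batch size, and grid-search loop are illustrative assumptions and not the authors' released code (see https://github.com/whaleloops/KEPT for that).

```python
# Hedged sketch of the reported training configuration, assuming the
# HuggingFace Transformers Trainer API; the authors' actual training
# script may differ (see https://github.com/whaleloops/KEPT).
from itertools import product
from transformers import TrainingArguments

# Pretraining settings quoted in the paper.
pretrain_args = TrainingArguments(
    output_dir="gpsoap_pretrain",    # hypothetical path
    warmup_ratio=0.1,                # warmup ratio 0.1
    learning_rate=5e-5,              # learning rate 5e-5
    weight_decay=1e-3,               # L2 weight decay 1e-3
    per_device_train_batch_size=16,  # assumption: 16 x 4 GPUs = batch size 64
    fp16=True,                       # mixed-precision training
    seed=42,                         # random seed 42
)
# Dropout rate 0.1 is a model-side setting, typically placed on the model
# config rather than in TrainingArguments.

# Fine-tuning grid search quoted in the paper: learning rate and dropout.
for lr, dropout in product([1e-5, 2e-5, 3e-5], [0.1, 0.3, 0.5]):
    finetune_args = TrainingArguments(
        output_dir=f"gpsoap_ft_lr{lr}_do{dropout}",  # hypothetical path
        learning_rate=lr,
        per_device_train_batch_size=4,               # batch size 4
        fp16=True,
        seed=42,
    )
    # A Trainer would be built here with a model whose dropout is set to
    # `dropout`; omitted because the paper quote does not name the model class.
```

Only the `TrainingArguments` fields above are taken from the quoted hyper-parameters; everything specific to GPsoap (model class, SOAP pretraining data, prompt construction, reranker) is omitted because this section does not specify it.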