A teacher-teacher framework for clinical language representation learning
Authors: Feiqing Huang, Shenghan Zhang, Sara Morini Sweet, Tianxi Cai
Venue: NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Validation and downstream tasks further demonstrate the effectiveness of the proposed framework. |
| Researcher Affiliation | Academia | Feiqing Huang Harvard T.H. Chan School of Public Health (...) Shenghan Zhang Harvard Medical School (...) Sara Morini Sweet Harvard Medical School (...) Tianxi Cai Harvard T.H. Chan School of Public Health Harvard Medical School |
| Pseudocode | No | The paper does not include a figure, block, or section explicitly labeled "Pseudocode" or "Algorithm" that details the structured steps of the entire method. |
| Open Source Code | No | Does the paper provide open access to the data and code...? Answer: [No] Justification: The code is not currently available, but we will try to work on that in the near future. |
| Open Datasets | Yes | For our training dataset, we utilized 332K discharge notes and 2.2M radiology reports from 146K patients available in the MIMIC-IV database [10]. (...) We evaluated our model on two standard biomedical named entity recognition (NER) benchmark tasks: the i2b2 2006 de-identification challenge [30] and the i2b2 2014 de-identification challenge [26]. |
| Dataset Splits | Yes | We utilized a holdout subset of patients whose clinical notes were excluded from the training set. This subset comprised 100K radiology reports from 92K patients. (...) We followed the train/validation/test splits specified in the original challenges, as detailed in Table 1 of [2]. (...) The average performance metrics from five-fold cross-validation are reported in Table 4. |
| Hardware Specification | Yes | All experiments were conducted using an NVIDIA RTX 8000 GPU with 48GB of VRAM. |
| Software Dependencies | No | The paper mentions "tqdm function" but does not specify its version number or provide version numbers for any other key software components, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | We adopted the Adam optimizer with a learning rate of 10⁻³ and a batch size of 128. The training process was divided into two stages as detailed in Section 2.4: we first trained the model for three epochs, during which we identified residual concepts not adequately captured. Then, using the model checkpoint from the first phase, we recovered the residual concepts and refined the training pairs. The model was then trained for an additional two epochs. (...) A single linear layer was added on top of each model and trained for 20 epochs. We adopted the Adam optimizer, and the learning rates for CODER, BGE, and LINE were set to 2×10⁻⁵, 2×10⁻⁵, and 5×10⁻⁴, respectively. (See the linear-probe sketch after the table.) |
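The Experiment Setup row describes a frozen-encoder evaluation protocol: a single linear layer trained for 20 epochs with Adam, at learning rates of 2×10⁻⁵ for CODER and BGE and 5×10⁻⁴ for LINE. Below is a minimal sketch of that linear-probe setup, assuming precomputed embeddings from a frozen encoder; the embedding width, label count, probe batch size, and helper names (`EMB_DIM`, `NUM_LABELS`, `train_linear_probe`) are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of the reported linear-probe evaluation; not the authors' code.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

EMB_DIM = 768      # assumed width of the frozen encoder's embeddings
NUM_LABELS = 2     # assumed number of downstream labels

# Per-encoder probe learning rates reported in the paper.
PROBE_LR = {"CODER": 2e-5, "BGE": 2e-5, "LINE": 5e-4}


def train_linear_probe(embeddings: torch.Tensor,
                       labels: torch.Tensor,
                       encoder_name: str = "LINE",
                       epochs: int = 20,
                       batch_size: int = 128) -> nn.Linear:
    """Train a single linear layer on top of frozen embeddings with Adam."""
    probe = nn.Linear(EMB_DIM, NUM_LABELS)
    optimizer = torch.optim.Adam(probe.parameters(), lr=PROBE_LR[encoder_name])
    criterion = nn.CrossEntropyLoss()
    loader = DataLoader(TensorDataset(embeddings, labels),
                        batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(probe(x), y)
            loss.backward()
            optimizer.step()
    return probe


if __name__ == "__main__":
    # Random stand-in data; a real run would use frozen encoder outputs.
    X = torch.randn(512, EMB_DIM)
    y = torch.randint(0, NUM_LABELS, (512,))
    train_linear_probe(X, y, encoder_name="LINE")
```

The probe's batch size is not stated for this stage; 128 is reused here from the pretraining description purely for illustration.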