A Multi-Pass Sieve for Name Normalization
Authors: Jennifer D'Souza
AAAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We find that even in this task, the approach retains its characteristic features of being simple, and highly modular. In addition, it also proves robust when evaluated on two different kinds of data: clinical notes and biomedical text, by demonstrating high accuracy in normalizing disorder names found in both datasets. |
| Researcher Affiliation | Academia | Jennifer D'Souza, Human Language Technology Research Institute, University of Texas at Dallas, Richardson, TX 75083-0688, jennifer.l.dsouza@utdallas.edu |
| Pseudocode | No | The paper describes the steps of its sieves in prose but does not provide any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: 'Additional data used in the normalization system is available at http://www.hlt.utdallas.edu/ jld082000/normalization/'. This link refers to 'data', not open-source code for the methodology. There is no explicit statement about code release. |
| Open Datasets | Yes | We used the following corpora in our experiments: Clinical Notes. The ShARe/CLEF eHealth Challenge (Pradhan et al. 2013) corpus contained 199 notes for training and 99 notes for testing. Concept identifiers from training data and from the UMLS Metathesaurus (Campbell, Oliver, and Shortliffe 1998) were used for normalizing names from this corpus. Biomedical Abstracts. The NCBI disease corpus (Doğan, Leaman, and Lu 2014) contained 693 abstracts for training and development, and 100 abstracts for testing. |
| Dataset Splits | Yes | The ShARe/CLEF eHealth Challenge (Pradhan et al. 2013) corpus contained 199 notes for training and 99 notes for testing. ... The NCBI disease corpus (Doğan, Leaman, and Lu 2014) contained 693 abstracts for training and development, and 100 abstracts for testing. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions using the 'Porter stemmer' and an algorithm by 'Schwartz and Hearst (2003)' but does not provide specific version numbers for any software or libraries used in the implementation. |
| Experiment Setup | No | The paper describes the logic of each sieve but does not provide specific experimental setup details such as hyperparameters, learning rates, batch sizes, or training schedules. |
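
Since the paper describes its sieves only in prose, the general shape of a multi-pass sieve can be sketched as an ordered cascade in which the first sieve to return a concept identifier decides the output. This is a minimal illustration of that idea, not the paper's method: the three sieve functions, the toy dictionary, and the `CONCEPT-…` identifiers are all hypothetical placeholders, and the trailing-`s` rule is only a crude stand-in for real stemming.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical toy dictionary with made-up concept identifiers.
# A real system would draw on training data and a terminology resource.
TOY_DICTIONARY: Dict[str, str] = {
    "heart attack": "CONCEPT-001",
    "kidney stone": "CONCEPT-002",
}

# A sieve maps a name to a concept identifier, or returns None to pass.
Sieve = Callable[[str], Optional[str]]


def exact_match_sieve(name: str) -> Optional[str]:
    """Hypothetical sieve: look the name up verbatim."""
    return TOY_DICTIONARY.get(name)


def lowercase_sieve(name: str) -> Optional[str]:
    """Hypothetical sieve: retry after lowercasing and trimming whitespace."""
    return TOY_DICTIONARY.get(name.lower().strip())


def naive_plural_sieve(name: str) -> Optional[str]:
    """Hypothetical sieve: drop a trailing 's' (crude stand-in for stemming)."""
    lowered = name.lower().strip()
    if lowered.endswith("s"):
        return TOY_DICTIONARY.get(lowered[:-1])
    return None


def normalize(name: str, sieves: List[Sieve]) -> Optional[str]:
    """Run the sieves in order and return the first match, or None."""
    for sieve in sieves:
        concept_id = sieve(name)
        if concept_id is not None:
            return concept_id
    return None


# Sieves are ordered from most to least precise (an assumed ordering).
SIEVES: List[Sieve] = [exact_match_sieve, lowercase_sieve, naive_plural_sieve]

if __name__ == "__main__":
    for mention in ["heart attack", "Heart Attack", "Kidney Stones", "migraine"]:
        print(mention, "->", normalize(mention, SIEVES))
```

Because each sieve is an independent function, adding, removing, or reordering passes only changes the `SIEVES` list, which is one way to read the modularity the paper claims.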