A Multi-Pass Sieve for Name Normalization

Authors: Jennifer D'Souza

AAAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We find that even in this task, the approach retains its characteristic features of being simple, and highly modular. In addition, it also proves robust when evaluated on two different kinds of data: clinical notes and biomedical text, by demonstrating high accuracy in normalizing disorder names found in both datasets.
Researcher Affiliation | Academia | Jennifer D'Souza, Human Language Technology Research Institute, University of Texas at Dallas, Richardson, TX 75083-0688, jennifer.l.dsouza@utdallas.edu
Pseudocode | No | The paper describes the steps of its sieves in prose but does not provide structured pseudocode or algorithm blocks (a generic sketch of the multi-pass sieve pattern is given after this table).
Open Source Code | No | The paper states: 'Additional data used in the normalization system is available at http://www.hlt.utdallas.edu/jld082000/normalization/'. This link points to supplementary data rather than to source code for the method, and there is no explicit statement about a code release.
Open Datasets | Yes | We used the following corpora in our experiments: Clinical Notes: the ShARe/CLEF eHealth Challenge (Pradhan et al. 2013) corpus contained 199 notes for training and 99 notes for testing; concept identifiers from training data and from the UMLS Metathesaurus (Campbell, Oliver, and Shortliffe 1998) were used for normalizing names from this corpus. Biomedical Abstracts: the NCBI disease corpus (Doğan, Leaman, and Lu 2014) contained 693 abstracts for training and development, and 100 abstracts for testing.
Dataset Splits | Yes | The ShARe/CLEF eHealth Challenge (Pradhan et al. 2013) corpus contained 199 notes for training and 99 notes for testing. ... The NCBI disease corpus (Doğan, Leaman, and Lu 2014) contained 693 abstracts for training and development, and 100 abstracts for testing. (These counts are also summarized in a short sketch after this table.)
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions using the 'Porter stemmer' and an algorithm by 'Schwartz and Hearst (2003)' but does not provide specific version numbers for any software or libraries used in the implementation (an illustrative Porter-stemmer example is given after this table).
Experiment Setup | No | The paper describes the logic of each sieve but does not provide specific experimental setup details such as hyperparameters, learning rates, batch sizes, or training schedules.
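
Since the sieves are described only in prose, the sketch below illustrates the generic multi-pass sieve pattern that this description implies: sieves are tried in a fixed order (typically from higher to lower precision) and the first one that returns a concept identifier wins. The sieve functions, lexicon, and concept IDs are hypothetical placeholders for illustration, not a reimplementation of the paper's actual sieves.

```python
# Illustrative sketch of a generic multi-pass sieve for name normalization.
# The individual sieves and their ordering are hypothetical placeholders.
from typing import Callable, Optional

# A sieve maps a mention string (plus a lexicon) to a concept ID, or None if it abstains.
Sieve = Callable[[str, dict], Optional[str]]

def exact_match_sieve(mention: str, lexicon: dict) -> Optional[str]:
    """Return a concept ID only when the lowercased surface form matches a lexicon entry exactly."""
    return lexicon.get(mention.lower())

def stemmed_match_sieve(mention: str, lexicon: dict) -> Optional[str]:
    """Hypothetical lower-precision sieve: match after crude trailing-'s' stripping."""
    return lexicon.get(mention.lower().rstrip("s"))

def normalize(mention: str, lexicon: dict, sieves: list) -> Optional[str]:
    """Apply sieves in order; the first non-None answer is returned."""
    for sieve in sieves:
        concept_id = sieve(mention, lexicon)
        if concept_id is not None:
            return concept_id
    return None  # unnormalizable under this sieve set

if __name__ == "__main__":
    toy_lexicon = {"myocardial infarction": "C0027051", "seizure": "C0036572"}
    sieves = [exact_match_sieve, stemmed_match_sieve]
    print(normalize("Seizures", toy_lexicon, sieves))  # resolved by the stemmed sieve
```

The ordering of the sieve list is the main design knob in this pattern; later, looser sieves only see mentions that earlier, stricter sieves declined to resolve.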
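For quick reference, the split counts quoted in the table can be recorded in a small configuration snippet. Only the document counts come from the paper; the dictionary layout and key names are an illustrative convention.

```python
# Reported document counts for the two corpora; layout is illustrative only.
CORPUS_SPLITS = {
    "ShARe/CLEF eHealth (Pradhan et al. 2013)": {"train": 199, "test": 99, "unit": "clinical notes"},
    "NCBI disease corpus (Dogan, Leaman, and Lu 2014)": {"train+dev": 693, "test": 100, "unit": "abstracts"},
}

for corpus, split in CORPUS_SPLITS.items():
    counts = ", ".join(f"{name}={n}" for name, n in split.items() if name != "unit")
    print(f"{corpus}: {counts} ({split['unit']})")
```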
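Because no library versions are reported, the snippet below only shows how such a dependency could be exercised and its version recorded, using NLTK's PorterStemmer as an assumed stand-in; the paper does not say which Porter stemmer implementation was actually used.

```python
# One possible way to exercise the Porter stemmer: NLTK's implementation.
# This is an assumption for illustration, not the paper's documented setup.
import nltk
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for term in ["disorders", "normalization", "abnormalities"]:
    print(term, "->", stemmer.stem(term))

# A reproducible setup would also pin and record the library version:
print("nltk version:", nltk.__version__)
```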