Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

End-to-End Bootstrapping Neural Network for Entity Set Expansion

Authors: Lingyong Yan, Xianpei Han, Ben He, Le Sun

Pages: 9402-9409

AAAI 2020 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate substantial improvement of our model over previous ESE approaches.
Researcher Affiliation | Academia | (1) Chinese Information Processing Laboratory, (2) State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China; (3) University of Chinese Academy of Sciences, Beijing, China. EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1 Optimization Algorithm
Open Source Code | Yes | Source code is available online (footnote 5: https://github.com/lingyongyan/bootstrapnet).
Open Datasets | Yes | Datasets: We use two datasets, CoNLL and OntoNotes, constructed by Zupon et al. (2019). CoNLL is constructed from the CoNLL 2003 shared task dataset (Tjong Kim Sang and De Meulder 2003), which contains 4 entity types. OntoNotes is constructed from the OntoNotes datasets (Pradhan et al. 2013) without numerical categories, which finally contains 11 entity types. Zupon et al. (2019) use the n-grams of the size up to 4 tokens on either side of an entity as the patterns and filter out some patterns.
Dataset Splits | Yes | To learn our model, we randomly select other 30 entities per category with their labels from each dataset as the development set, and leave the remaining entities as the test set.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments.
Software Dependencies | No | The paper mentions 'scikit-learn package' without a specific version number and does not provide other key software components with their versions.
Experiment Setup | Yes | For all baselines and our model, we manually select 10 seeds per category with the highest frequency in the datasets and run them for 20 bootstrapping iterations. At each bootstrapping iteration, we add 10 entities and 10 patterns to each category. The number of layers in Bootstrap Encoder is set to 3.
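
The Open Datasets, Dataset Splits, and Experiment Setup entries above describe a data-preparation procedure that can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names, the input shapes (a list of (entity, category) mention pairs, a dict of entities per category), the "@ENTITY" placeholder, and the fixed random seed are all assumptions made here. The paper says seeds were selected manually among the most frequent entities and that Zupon et al. (2019) filter out some patterns; the sketch automates the first by frequency ranking and omits the second.

```python
import random
from collections import Counter

# Values quoted in the entries above.
SEEDS_PER_CATEGORY = 10
DEV_PER_CATEGORY = 30
BOOTSTRAP_ITERATIONS = 20
ENTITIES_ADDED_PER_ITERATION = 10
MAX_PATTERN_TOKENS = 4


def select_seeds(mentions, k=SEEDS_PER_CATEGORY):
    """Return the k most frequent entities for each category.

    `mentions` is a hypothetical list of (entity, category) pairs,
    one pair per occurrence in the corpus.
    """
    counts = {}
    for entity, category in mentions:
        counts.setdefault(category, Counter())[entity] += 1
    seeds = {}
    for category, counter in counts.items():
        # Sort by descending frequency, then alphabetically for determinism.
        ranked = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        seeds[category] = [entity for entity, _ in ranked[:k]]
    return seeds


def split_dev_test(entities_by_category, seeds, n_dev=DEV_PER_CATEGORY, rng_seed=0):
    """Randomly pick n_dev non-seed entities per category as the
    development set; everything left over becomes the test set."""
    rng = random.Random(rng_seed)  # assumed seed, for repeatability only
    dev, test = {}, {}
    for category, entities in entities_by_category.items():
        pool = sorted(set(entities) - set(seeds.get(category, [])))
        rng.shuffle(pool)
        dev[category] = pool[:n_dev]
        test[category] = pool[n_dev:]
    return dev, test


def context_patterns(tokens, start, end, max_n=MAX_PATTERN_TOKENS):
    """Collect left and right n-gram contexts (1..max_n tokens) around
    the entity span tokens[start:end]. No pattern filtering is applied."""
    patterns = []
    for n in range(1, max_n + 1):
        if start - n >= 0:
            patterns.append(" ".join(tokens[start - n:start]) + " @ENTITY")
        if end + n <= len(tokens):
            patterns.append("@ENTITY " + " ".join(tokens[end:end + n]))
    return patterns
```

With the quoted settings, 20 iterations that each add 10 entities yield 200 expanded entities per category by the end of a run.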