Early Discovery of Emerging Entities in Microblogs

Authors: Satoshi Akasaki, Naoki Yoshinaga, Masashi Toyoda

IJCAI 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Experimental results with a large-scale Twitter archive show that the proposed method achieves 83.2% precision for the top 500 discovered emerging entities, outperforming baselines based on unseen entity recognition with burst detection." |
| Researcher Affiliation | Academia | ¹The University of Tokyo, ²Institute of Industrial Science, the University of Tokyo |
| Pseudocode | No | The paper contains no structured pseudocode or algorithm blocks, nor any clearly labeled algorithm sections or code-like formatted procedures. |
| Open Source Code | No | "We will release all the datasets (tweet IDs)¹ used in experiments to promote the reproducibility." (Footnote 1: http://www.tkl.iis.u-tokyo.ac.jp/~akasaki/ijcai-19/) The paper explicitly promises the datasets, not the source code for the methodology. |
| Open Datasets | Yes | "We collected titles of articles that were registered in the Japanese version of Wikipedia from March 11th, 2012 to December 31st, 2015 using the Wikipedia dump on June 20th, 2018." and "We will release all the datasets (tweet IDs)¹ used in experiments to promote the reproducibility." (Footnote 1: http://www.tkl.iis.u-tokyo.ac.jp/~akasaki/ijcai-19/) |
| Dataset Splits | Yes | "For model selection, we used 10% of the training data as the development data." |
| Hardware Specification | No | The paper does not report specific hardware details (e.g., exact GPU/CPU models, processor types, or memory sizes) used for its experiments. |
| Software Dependencies | Yes | "We used the implementation using MALLET (ver. 2.0.6) [McCallum, 2002]... We used the implementation using Theano (ver. 0.9.0) provided by [Lample et al., 2016]... We tokenized each example by using MeCab (ver. 0.996)³ with ipadic dictionary (ver. 2.7.0)... We used CaboCha (https://taku910.github.io/cabocha/)." |
| Experiment Setup | Yes | "We therefore empirically set the parameters to k = 5, n = 100 and k = 10." and "The hyperparameter C was tuned to 0.125 using the development data." and "We optimized the model using stochastic gradient descent and chose the model at the epoch with the highest F1 on the development data." |
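The model-selection protocol quoted in the last two rows (hold out 10% of the training data as development data, then keep the model from the epoch with the highest development F1) can be sketched as follows. This is a hedged illustration, not the authors' code; the function names and the sample F1 scores are invented for the example.

```python
# Sketch of the dev-split and epoch-selection protocol described in the paper.
# split_dev and select_best_epoch are hypothetical helpers, not from the authors.
import random

random.seed(0)

def split_dev(examples, dev_ratio=0.1):
    """Shuffle and carve off dev_ratio of the examples as development data."""
    examples = examples[:]  # copy so the caller's list is left untouched
    random.shuffle(examples)
    n_dev = int(len(examples) * dev_ratio)
    return examples[n_dev:], examples[:n_dev]  # (train, dev)

def select_best_epoch(dev_f1_per_epoch):
    """Return (epoch index, F1) for the epoch with the highest dev F1."""
    best = max(range(len(dev_f1_per_epoch)), key=lambda e: dev_f1_per_epoch[e])
    return best, dev_f1_per_epoch[best]

train, dev = split_dev(list(range(1000)))
print(len(train), len(dev))  # -> 900 100

# Invented per-epoch dev F1 scores, purely illustrative.
epoch, f1 = select_best_epoch([0.71, 0.78, 0.81, 0.79])
print(epoch, f1)  # -> 2 0.81
```

The paper reports only the selection rule, not the shuffling or seeding details, so those are assumptions here.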