Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Meimei: An Efficient Probabilistic Approach for Semantically Annotating Tables

Authors: Kunihiro Takeoka, Masafumi Oyamada, Shinji Nakadai, Takeshi Okadome (pp. 281-288)

AAAI 2019 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrated the superiority of the proposed approach over state-of-the-art approaches for semantic annotation of real data (183 human-annotated tables obtained from the UCI Machine Learning Repository).
Researcher Affiliation | Collaboration | Kunihiro Takeoka, NEC Corporation, EMAIL; Masafumi Oyamada, NEC Corporation, EMAIL; Shinji Nakadai, NEC Corporation, EMAIL; Takeshi Okadome, Kwansei Gakuin University, EMAIL
Pseudocode | Yes | Algorithm 1: Approximate prediction with Gibbs sampling
Open Source Code | No | The paper does not provide any explicit statement about releasing open-source code or a link to a code repository.
Open Datasets | Yes | "The dataset we used consists of 183 human-annotated tables (with 781 NE-columns and 4,109 literal-columns) obtained from the UCI Machine Learning repository (Dua and Karra Taniskidou 2017)."
Dataset Splits | No | The paper mentions using a 'training dataset' for optimizing parameters and for evaluation, but it does not specify any particular data splits (e.g., 80/10/10) for training, validation, and testing. It refers to 'human-annotated 183 tables'.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., CPU models, GPU types, memory).
Software Dependencies | No | The paper mentions software components like 'Poincaré embedding' and 'random forest classifiers' but does not specify their version numbers or other software dependencies with versions required for replication.
Experiment Setup | Yes | "We set the number of iterations in Gibbs sampling to 300 because we observed the convergence at that point and further iterations did not affect the accuracy of the model."
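The prediction step reported above (Algorithm 1, Gibbs sampling run for 300 iterations) can be illustrated with a generic sketch. This is not the authors' implementation: the `conditional` scoring function, the variable/label counts, and all names below are hypothetical stand-ins; only the fixed 300-iteration budget comes from the paper.

```python
import random

def gibbs_sample(n_vars, n_labels, conditional, n_iter=300, seed=0):
    """Generic Gibbs sampler over discrete label assignments.

    conditional(i, state, k) must return an unnormalized probability
    of variable i taking label k given the current state of all others.
    The 300-iteration default mirrors the setup reported in the paper.
    """
    rng = random.Random(seed)
    state = [rng.randrange(n_labels) for _ in range(n_vars)]
    counts = [[0] * n_labels for _ in range(n_vars)]
    for _ in range(n_iter):
        for i in range(n_vars):
            # Resample variable i from its conditional distribution.
            weights = [conditional(i, state, k) for k in range(n_labels)]
            r = rng.random() * sum(weights)
            acc = 0.0
            for k, w in enumerate(weights):
                acc += w
                if r <= acc:
                    state[i] = k
                    break
            counts[i][state[i]] += 1
    # Predict the most frequently sampled label for each variable.
    return [max(range(n_labels), key=lambda k: c[k]) for c in counts]
```

A toy usage: with a conditional that strongly favors one label per variable, the sampler's per-variable vote counts concentrate on that label well within 300 sweeps.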