CEMA – Cost-Efficient Machine-Assisted Document Annotations

Authors: Guowen Yuan, Ben Kao, Tien-Hsuan Wu

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on complex annotation tasks in which we compare CEMA against other document selection and annotation strategies. Our results show that CEMA is the most cost-efficient solution for those tasks. |
| Researcher Affiliation | Academia | Guowen Yuan, Ben Kao, Tien-Hsuan Wu; The University of Hong Kong; {gwyuan, kao, thwu}@cs.hku.hk |
| Pseudocode | No | The paper does not include any blocks or sections explicitly labeled as "Pseudocode" or "Algorithm". |
| Open Source Code | Yes | The code of CEMA can be found in https://github.com/gavingwyuan/cema |
| Open Datasets | Yes | The Drug Trafficking Judgments (DTJ) dataset (Wu et al. 2020) ... The German Legal (GL) dataset (Leitner, Rehm, and Schneider 2020) |
| Dataset Splits | Yes | Experiments are conducted using 5-fold cross validation in which 80% of the documents are used as the set of unlabeled documents (for which the MAA process is applied), and 20% of the documents (with their ground-truth markups) are used to evaluate the accuracy of the resulting machine annotator (after the MAA process terminates). *(see the split sketch after the table)* |
| Hardware Specification | No | The paper mentions adopting pre-trained BERT models but does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions using "pre-trained English and German BERT models" and the "AdamW (Loshchilov and Hutter 2019) optimizer" but does not specify version numbers for any libraries, frameworks, or programming languages used (e.g., PyTorch, TensorFlow, Python). |
| Experiment Setup | Yes | We use 2 human-annotated documents as seed. For this initialization, we train the machine annotator and the action predictor for 50 epochs. In each MAA iteration, the document selection module selects k = 2 documents to be verified by human workers. We use 5 epochs in re-training. The batch sizes for training the MA and the action predictor are 8 and 3, respectively. We set ρ = 0.8 and w = 0.5. In all trainings, we use the AdamW (Loshchilov and Hutter 2019) optimizer and set the learning rate to 2×10⁻⁵. Based on observing real human annotation tasks, we set the time costs (in seconds) of verifier actions to t_read = 0.3s, t_confirm = 1s, t_relabel = 4s, t_delete = 1.5s, and t_add = 10s as our default setting. *(see the configuration sketch after the table)* |
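
For concreteness, here is a minimal sketch of the 5-fold protocol quoted under "Dataset Splits". Only the fold geometry (80% unlabeled pool, 20% held-out evaluation) comes from the paper; the placeholder corpus and the `run_maa`/`evaluate` stubs are hypothetical stand-ins, not the authors' code.

```python
# Sketch of the quoted 5-fold protocol. Unlike standard cross-validation,
# the 80% split is not labeled training data: it is the unlabeled pool on
# which the machine-assisted annotation (MAA) loop itself runs.
from sklearn.model_selection import KFold

documents = [f"doc_{i}" for i in range(100)]  # placeholder corpus (DTJ or GL)

def run_maa(pool):
    # Hypothetical stand-in for the full MAA loop described in the paper.
    return f"annotator trained on {len(pool)} docs"

def evaluate(annotator, held_out):
    # Hypothetical stand-in for markup-accuracy evaluation.
    return 0.0

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (pool_idx, eval_idx) in enumerate(kf.split(documents)):
    pool = [documents[i] for i in pool_idx]      # 80%: MAA is applied here
    held_out = [documents[i] for i in eval_idx]  # 20%: ground-truth markups kept for evaluation
    annotator = run_maa(pool)
    print(f"fold {fold}: accuracy = {evaluate(annotator, held_out)}")
```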
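
Similarly, the hyperparameters quoted under "Experiment Setup" can be collected into a single configuration. The key names below are illustrative; the values are the paper's stated defaults.

```python
# Default experimental configuration as quoted from the paper.
MAA_CONFIG = {
    "seed_documents": 2,         # human-annotated documents used as seed
    "init_epochs": 50,           # initial training of the MA and action predictor
    "docs_per_iteration": 2,     # k: documents selected for human verification per iteration
    "retrain_epochs": 5,         # re-training epochs in each MAA iteration
    "batch_size_ma": 8,          # machine annotator
    "batch_size_action": 3,      # action predictor
    "rho": 0.8,
    "w": 0.5,
    "optimizer": "AdamW",        # Loshchilov and Hutter 2019
    "learning_rate": 2e-5,
    # Verifier action time costs in seconds (default setting):
    "t_read": 0.3,
    "t_confirm": 1.0,
    "t_relabel": 4.0,
    "t_delete": 1.5,
    "t_add": 10.0,
}
```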