Knowledge-Enhanced Historical Document Segmentation and Recognition

Authors: En-Hao Gao, Yu-Xuan Huang, Wen-Chao Hu, Xin-Hao Zhu, Wang-Zhou Dai

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To show the effectiveness of KESAR, we conduct extensive experiments on three datasets. The experimental results demonstrate that our method can simultaneously utilize knowledge-driven reasoning and data-driven learning, which outperforms the current state-of-the-art methods.
Researcher Affiliation | Academia | (1) National Key Laboratory for Novel Software Technology, Nanjing University, China; (2) School of Artificial Intelligence, Nanjing University, China; (3) School of Intelligence Science and Technology, Nanjing University, China
Pseudocode | Yes | Algorithm 1: Abductive Matching; Algorithm 2: Over-Segmentation and Recombination (OSR)
Open Source Code | Yes | The code is available for download at https://github.com/AbductiveLearning/ABL-HD
Open Datasets | Yes | TKH (Yang et al. 2018) is a collection of historical documents released by HCIILAB, containing 1,000 images sourced from the Tripitaka Koreana. MTH (Ma et al. 2020) is a more challenging historical document dataset. GBACHD is the most challenging dataset in our experiments, released in the 2022 Greater Bay Area (Huangpu) International Algorithm Case Competition.
Dataset Splits | No | The paper specifies training and testing splits for MTH and GBACHD (e.g., "randomly partitioned into training and testing subsets at a 7:3 ratio") and pre-training data for TKH, but does not explicitly describe a separate validation split.
Hardware Specification | Yes | All experiments are conducted on a server with 8 Nvidia V100 GPUs.
Software Dependencies | No | The paper mentions models such as ResNet50, ResNet34, CRAFT, PSENet, FCENet, Robust Scanner, and ABINet, as well as the mmocr codebase. However, it does not specify version numbers for these components or for underlying libraries (e.g., Python, PyTorch, TensorFlow), which are necessary for reproducible software dependencies.
Experiment Setup | Yes | We first employ the training data of TKH to pre-train the segmentation model for 320 epochs and then utilize MTH and GBACHD to fine-tune the model for 180 and 80 epochs, respectively. Our recognition model is ResNet34. We first employ the training data of TKH to pre-train the network for 25 epochs and then utilize character images generated by the segmentation model to fine-tune the model for another 25 epochs. The Joint Optimization stage requires only 10 epochs.
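The paper states only that MTH and GBACHD are "randomly partitioned into training and testing subsets at a 7:3 ratio"; it does not publish the split procedure or seed. A minimal sketch of such a split, assuming a seeded shuffle over image IDs (the function name and seed are our own, not from the paper):

```python
import random

def split_dataset(image_ids, train_ratio=0.7, seed=0):
    """Randomly partition image IDs into training/testing subsets
    at a 7:3 ratio. Everything beyond the ratio itself (seeding,
    shuffle-based partitioning) is an assumption for illustration."""
    rng = random.Random(seed)
    ids = list(image_ids)
    rng.shuffle(ids)
    cut = int(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]

# e.g. a 1,000-image dataset yields a 700/300 partition
train_ids, test_ids = split_dataset(range(1000))
```

Fixing the seed makes the partition reproducible across runs, which is exactly the detail a reproducibility audit would need the authors to report.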