General Detection-based Text Line Recognition
Authors: Raphael Baena, Syrine Kalleli, Mathieu Aubry
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce a general detection-based approach to text line recognition, be it printed (OCR) or handwritten (HTR), with Latin, Chinese, or ciphered characters. ... Remarkably, we demonstrate good performance on a large range of scripts, usually tackled with specialized approaches. In particular, we improve state-of-the-art performances for Chinese script recognition on the CASIA v2 dataset, and for cipher recognition on the Borg and Copiale datasets. Our code and models are available at https://github.com/raphael-baena/DTLR. |
| Researcher Affiliation | Academia | Raphael Baena, Syrine Kalleli, Mathieu Aubry LIGM, Ecole des Ponts, Univ Gustave Eiffel, CNRS Marne-la-Vallée, France firstname.lastname@enpc.fr |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Our code and models are available at https://github.com/raphael-baena/DTLR. |
| Open Datasets | Yes | We performed OCR on the Google1000 dataset [22]... We evaluate our approach on text line HTR in various languages with latin script: IAM (English) [29], RIMES [39] (French), and READ [44] (Old German). ... we also evaluate our method on CASIA v2 [27] a benchmark for handwritten Chinese text-line recognition... We evaluate our approach on the Borg and Copiale cipher [5]... |
| Dataset Splits | Yes | For IAM, ... It includes 6,161 training lines, 966 validation lines, and 2,915 testing lines. The READ 2016 dataset ... It includes 8,367 training lines, 1,043 validation lines, and 1,140 test lines. The RIMES dataset ... The training set has 10,188 lines, which we split into a training set (80%) and a validation set (20%). The test set includes 778 lines. |
| Hardware Specification | Yes | We compare our inference speed to TrOCR [25] and Faster DAN [14], for which public code is available, on text lines from the RIMES dataset [44] with a batch size of 1 and using an A6000 GPU. |
| Software Dependencies | No | The paper mentions using the 'ADAM optimizer' and the 'KenLM [23]' library, but does not specify version numbers. It also mentions 'PyLaia', again without a version. |
| Experiment Setup | Yes | We follow Zhang et al. [57], and use N_e = 6 encoder layers, N_d = 6 decoder layers, Q = 900 queries, and as hyperparameters λ_cls = 2, λ_box = 5, λ'_cls = 1 and λ'_box = 5. We generate synthetic datasets of 100k text lines, and train the networks for 225k iterations with a batch size of 4, using the ADAM optimizer with β1 = 0.9, β2 = 0.999, a fixed learning rate of 10^-4, and a weight decay of 10^-4. ... We fine-tune our networks with the same parameters as for pre-training, except the learning rate, for which we use 10^-5 for 1200k iterations and then 10^-6 for 800k iterations. |
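
The Dataset Splits row reports that the 10,188 RIMES training lines are divided 80/20 into training and validation sets. Below is a minimal Python sketch of such a split; the shuffling, fixed seed, and function name are illustrative assumptions, since the quoted text does not document how the split was drawn:

```python
import random

def split_train_val(lines, val_fraction=0.2, seed=0):
    """Shuffle line indices and carve off a validation subset.

    The 80/20 ratio follows the paper; the shuffle and seed are
    assumptions added for reproducibility, not documented choices.
    """
    indices = list(range(len(lines)))
    random.Random(seed).shuffle(indices)
    n_val = int(len(indices) * val_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return [lines[i] for i in train_idx], [lines[i] for i in val_idx]

# With the 10,188 RIMES training lines this yields 8,151 training
# and 2,037 validation lines.
train_lines, val_lines = split_train_val([f"line_{i}" for i in range(10_188)])
assert len(train_lines) == 8_151 and len(val_lines) == 2_037
```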
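The Hardware Specification row describes the timing protocol: batch size 1 on an A6000 GPU. A rough sketch of how such a per-line latency measurement is commonly taken in PyTorch follows; the warm-up and repetition counts are assumptions (the paper states only the batch size and GPU), and a CUDA device is assumed to be available:

```python
import time
import torch

@torch.no_grad()
def time_inference(model, line_batch, n_warmup=10, n_runs=100):
    """Average per-call latency at batch size 1."""
    # Warm-up runs so CUDA kernels and caches are initialized.
    for _ in range(n_warmup):
        model(line_batch)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_runs):
        model(line_batch)
    torch.cuda.synchronize()  # wait for all GPU work before stopping the clock
    return (time.perf_counter() - start) / n_runs
```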
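The Experiment Setup row pins down the optimizer precisely: ADAM with β1 = 0.9, β2 = 0.999, learning rate 10^-4 and weight decay 10^-4 for pre-training, then 10^-5 followed by 10^-6 during fine-tuning. A minimal PyTorch sketch of that configuration, assuming a generic `model`; the staging helper is an illustration of the quoted numbers, not the authors' actual training loop:

```python
import torch

def make_optimizer(model, lr=1e-4, weight_decay=1e-4):
    # ADAM with beta1 = 0.9, beta2 = 0.999, as quoted from the paper.
    return torch.optim.Adam(
        model.parameters(), lr=lr, betas=(0.9, 0.999), weight_decay=weight_decay
    )

model = torch.nn.Linear(16, 16)  # stand-in for the detection network
optimizer = make_optimizer(model)  # pre-training: fixed lr 1e-4, 225k iters, batch 4

def fine_tune_lr(iteration):
    # Fine-tuning: 1e-5 for the first 1200k iterations, then 1e-6 for 800k more.
    return 1e-5 if iteration < 1_200_000 else 1e-6

for group in optimizer.param_groups:  # e.g. at the start of fine-tuning
    group["lr"] = fine_tune_lr(0)
```

Note that the quoted fine-tuning schedule is a simple two-stage step decay at a fixed boundary, rather than a warm-up or cosine schedule.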