ADOPD: A Large-Scale Document Page Decomposition Dataset
Authors: Jiuxiang Gu, Xiangxi Shi, Jason Kuen, Lu Qi, Ruiyi Zhang, Anqi Liu, Ani Nenkova, Tong Sun
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive experimental analyses to validate our data and assess the four tasks using various models. |
| Researcher Affiliation | Collaboration | ¹Adobe Research, ²Oregon State University, ³UC Merced, ⁴Johns Hopkins University |
| Pseudocode | Yes | Alg. 1 outlines the process where we integrate outlier detection for data collection and taxonomy discovery. |
| Open Source Code | No | The paper links to a project page (https://adopd2024.github.io) but provides neither a direct link to a source code repository nor an explicit statement that source code for the methodology will be released. |
| Open Datasets | Yes | The images in ADOPD are sourced from Laion-HR (Laion High Resolution), which comprises high-resolution web images, including multilingual document images. Reference: Laion High Resolution. Laion. https://huggingface.co/datasets/laion/laion-high-resolution. 2023. (A loading sketch for this dataset follows the table.) |
| Dataset Splits | Yes | We experiment on the subset of ADOPD, with training and validation sets comprising 50k and 10k images, respectively. |
| Hardware Specification | Yes | All experiments are run on NVIDIA A100-80GB GPUs. |
| Software Dependencies | No | The paper mentions software like Detectron2, MMDetection, and Huggingface Transformers, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | Following standard practices (Ghiasi et al., 2021), we employ an input resolution of 1024 × 1024, achieved by re-scaling and padding the shorter side of the image. Doc2Mask (CropFormer and Mask2Former) and Doc2Box (Faster R-CNN, Cascade Mask R-CNN) are trained for 15 epochs with a batch size of 32 on 8 GPUs to achieve full convergence. We train Deformable-DETR for 30 epochs due to slow convergence issues. For Doc2Seq, we train it for 50 epochs on 8 GPUs with a total batch size of 800. Fine-tuning CLIP ViT-G/14 on Doc2Seq data takes 100 epochs on 8×8 GPUs. (A preprocessing sketch follows the table.) |
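As a minimal illustration of the Open Datasets row, the snippet below streams the Laion-HR metadata from the Hugging Face Hub with the `datasets` library. The dataset ID is taken from the URL quoted above; the assumptions that the dataset is still accessible, that a `train` split exists, and that records carry metadata rather than pixels are ours, not claims from the paper.

```python
# A minimal sketch (not the authors' pipeline): stream the Laion-HR
# metadata referenced in the Open Datasets row. Assumes the dataset is
# still accessible on the Hub and exposes a "train" split.
from datasets import load_dataset

laion_hr = load_dataset(
    "laion/laion-high-resolution",  # ID taken from the URL quoted above
    split="train",
    streaming=True,  # iterate without downloading the full corpus
)

for record in laion_hr.take(3):
    # Records hold image metadata (e.g., source URL, caption, size);
    # the pixels themselves are fetched separately from those URLs.
    print(record)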
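For the Experiment Setup row, the sketch below shows one common way to realize the quoted 1024 × 1024 input (re-scale, then pad the shorter side) with torchvision. The function name and the right/bottom padding convention are illustrative assumptions, not the authors' code.

```python
# A minimal sketch of the quoted 1024 x 1024 input preprocessing:
# re-scale so the longer side hits the target, then zero-pad the
# shorter side to a square. Names and the right/bottom padding
# convention are illustrative assumptions, not the authors' code.
import torch
import torchvision.transforms.functional as F

def resize_and_pad(image: torch.Tensor, target: int = 1024) -> torch.Tensor:
    """image: CxHxW tensor -> target x target tensor."""
    _, h, w = image.shape
    scale = target / max(h, w)  # keep aspect ratio
    new_h, new_w = round(h * scale), round(w * scale)
    image = F.resize(image, [new_h, new_w], antialias=True)
    # Pad on the right and bottom so the page stays top-left aligned.
    return F.pad(image, [0, 0, target - new_w, target - new_h])
```

Under this convention, a 2048 × 1536 page is re-scaled to 1024 × 768 and then padded with 256 zero-valued columns on the right to reach 1024 × 1024.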