Contrastive Attention Networks for Attribution of Early Modern Print

Authors: Nikolai Vogler, Kartik Goyal, Kishore PV Reddy, Elizaveta Pertseva, Samuel V. Lemley, Christopher N. Warren, Max G'Sell, Taylor Berg-Kirkpatrick

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method successfully improves downstream damaged type-imprint matching among printed works from this period, as validated by in-domain human experts. The results of our approach on two important philosophical works from the Early Modern period demonstrate potential to extend the extant historical research about the origins and content of these books. We evaluate our approach against other common methods for image comparison on a downstream damaged type-imprint matching dataset of English early modern (c. 1500-1800) books.
Researcher Affiliation | Academia | (1) University of California, San Diego; (2) Toyota Technological Institute at Chicago; (3) Carnegie Mellon University
Pseudocode | No | The paper includes architectural diagrams (Figure 1) and a depiction of the data generation process (Figure 4), but it does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm' with structured, code-like steps.
Open Source Code | Yes | Code is available at https://github.com/nvog/damaged-type.
Open Datasets | Yes | We obtain page image scans from 38 different English books printed from the 1650s-1690s by both known and unknown printers of historical interest. We use two different hand-curated datasets from recent bibliographical studies that manually identified and matched damaged type-imprints for attribution of two major early modern printed works (Warren et al. 2020, 2021).
Dataset Splits | Yes | Areopagitica validation set: We collect a small validation set of the manually identified type-imprint matches used in the study for printer attribution of John Milton's anonymously printed Areopagitica (Warren et al. 2020). We train each model for 60 epochs and early stop using the best Areopagitica validation set recall.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running experiments, such as GPU models, CPU specifications, or memory configurations. It mentions computational models and neural networks but no underlying hardware.
Software Dependencies | No | The paper mentions the Ocular OCR system (Berg-Kirkpatrick, Durrett, and Klein 2013a) and scikit-image morphology (van der Walt et al. 2014), but it does not provide specific version numbers for these dependencies or for any other key libraries or programming languages. (An illustrative morphology sketch appears after the table.)
Experiment Setup | Yes | We train CAML with the popular triplet loss (Weinberger and Saul 2009), which operates on an anchor/query embedding $e$ along with the embedding $e^{+}$ of a candidate image that matches the anchor and the embedding $e^{-}$ of a non-matching candidate image. This results in the following loss: $\max(\|e - e^{+}\|_2 - \|e - e^{-}\|_2 + m,\ 0)$, which focuses on minimizing the Euclidean distance between the anchor's and the positive matching image's embeddings, and maximizing the distance between the anchor's and the non-matching image's embeddings, such that the positive and negative examples are separated by a margin of at least $m$. We train each model for 60 epochs and early stop using the best Areopagitica validation set recall. We sample negative examples uniformly at random from our batch. (A sketch of this objective appears after the table.)
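
For concreteness, the quoted objective is the standard hinge-form triplet loss. Below is a minimal PyTorch sketch of that loss together with uniform in-batch negative sampling; the function names and the `margin` default are our own illustrative choices, not taken from the paper or its released code.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-form triplet loss: max(||e - e+||_2 - ||e - e-||_2 + m, 0).

    anchor, positive, negative: (batch, dim) embedding tensors.
    """
    d_pos = torch.norm(anchor - positive, p=2, dim=1)  # anchor-to-match distances
    d_neg = torch.norm(anchor - negative, p=2, dim=1)  # anchor-to-non-match distances
    return F.relu(d_pos - d_neg + margin).mean()

def in_batch_negatives(candidates):
    """Draw negatives uniformly at random from the same batch, mirroring the
    sampling strategy quoted above. (Uniform draws can occasionally pick an
    item's own match; a fuller implementation would resample such collisions.)
    """
    n = candidates.size(0)
    idx = torch.randint(n, (n,), device=candidates.device)
    return candidates[idx]

# Illustrative usage with a hypothetical embedding model:
# e, e_pos = model(anchor_crops), model(matching_crops)
# loss = triplet_loss(e, e_pos, in_batch_negatives(e_pos))
```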
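The Software Dependencies row cites scikit-image morphology within the paper's data generation process. As a purely hypothetical illustration of how binary morphological operations can imitate physical wear on a binarized glyph crop, consider the sketch below; the function name, the erosion/dilation coin flip, and the disk footprint are all our assumptions, not the paper's documented procedure.

```python
import numpy as np
from skimage.morphology import binary_dilation, binary_erosion, disk

def simulate_wear(glyph, rng=None, radius=1):
    """Randomly erode (ink loss) or dilate (ink spread) a boolean glyph image.

    An illustrative guess at how morphology could synthesize damaged
    type-imprints; not the paper's actual pipeline.
    """
    rng = rng if rng is not None else np.random.default_rng()
    op = binary_erosion if rng.random() < 0.5 else binary_dilation
    return op(glyph, footprint=disk(radius))
```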