AutomaTikZ: Text-Guided Synthesis of Scientific Vector Graphics with TikZ
Authors: Jonas Belouadi, Anne Lauscher, Steffen Eger
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We fine-tune LLaMA on DaTikZ, as well as our new model CLiMA, which augments LLaMA with multimodal CLIP embeddings. In both human and automatic evaluation, CLiMA and LLaMA outperform commercial GPT-4 and Claude 2 in terms of similarity to human-created figures, with CLiMA additionally improving text-image alignment. Our detailed analysis shows that all models generalize well and are not susceptible to memorization. |
| Researcher Affiliation | Academia | Jonas Belouadi, Natural Language Learning Group, Bielefeld University, Germany (jonas.belouadi@uni-bielefeld.de); Anne Lauscher, Data Science Group, University of Hamburg, Germany (anne.lauscher@uni-hamburg.de); Steffen Eger, Natural Language Learning Group, University of Mannheim, Germany (steffen.eger@uni-mannheim.de) |
| Pseudocode | No | The paper describes methods such as iterative resampling but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We make our framework, AutomaTikZ, along with model weights and datasets, publicly available. https://github.com/potamides/AutomaTikZ |
| Open Datasets | Yes | As part of our AutomaTikZ project, we create DaTikZ, the first large-scale TikZ dataset to our knowledge, featuring approximately 120k paired TikZ drawings and captions. We make our framework, AutomaTikZ, along with model weights and datasets, publicly available. https://github.com/potamides/AutomaTikZ |
| Dataset Splits | No | Before fine-tuning our models on DaTikZ, we extract a sample of 1k human-created items to serve as our test set. The paper does not provide explicit training/validation/test dataset splits (e.g., percentages or counts for each) or reference a specific, predefined split. |
| Hardware Specification | No | The paper mentions 'constraints of our existing GPU resources' but does not provide specific hardware details such as GPU models, CPU models, or detailed cloud/cluster specifications used for running experiments. |
| Software Dependencies | No | The paper mentions software like LLaMA, CLIP, the Moses tokenizer, and AdamW, citing their respective papers, but does not provide specific version numbers for these or other ancillary software components used in the experiments. |
| Experiment Setup | Yes | We train for 12 epochs with AdamW (Loshchilov & Hutter, 2019) and a batch size of 128, but increase the learning rate to 5e-4 as this leads to faster convergence. We introduce trainable low-rank adaptation weights (LoRA; Hu et al., 2022) while keeping the base model weights frozen and in half precision (Micikevicius et al., 2018). Following Dettmers et al. (2023), we apply LoRA to all linear layers. |
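The LoRA setup quoted in the Experiment Setup row can be sketched numerically. The mechanism below follows the standard low-rank adaptation formulation of Hu et al. (2022); the dimensions, rank `r`, and scaling factor `alpha` are illustrative assumptions, not values reported in the paper.

```python
import numpy as np

# Sketch of a LoRA-adapted linear layer: the base weight W stays frozen
# (half precision in the paper); only the low-rank factors A and B are
# trained. Effective weight: W + (alpha / r) * B @ A.
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 8, 16  # illustrative, not the paper's config

W = rng.standard_normal((d_out, d_in))      # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-init

def lora_forward(x):
    """Forward pass through the LoRA-adapted linear layer."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B zero-initialized, the adapted layer initially equals the base layer,
# so fine-tuning starts from the frozen model's behavior.
assert np.allclose(lora_forward(x), W @ x)
```

Because only `A` and `B` (here 8×64 and 64×8) receive gradients, the trainable parameter count is a small fraction of the frozen 64×64 base weight, which is what makes applying LoRA to all linear layers feasible on limited GPU resources.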