DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ

Authors: Jonas Belouadi, Simone Ponzetto, Steffen Eger

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through both automatic and human evaluation, we demonstrate that DeTikZify outperforms commercial Claude 3 and GPT-4V in synthesizing TikZ programs, with the MCTS algorithm effectively boosting its performance.
Researcher Affiliation | Academia | Natural Language Learning Group; Data and Web Science Group, University of Mannheim; University of Technology Nuremberg
Pseudocode | No | The paper describes the steps of the MCTS algorithm in Section 5.1 and illustrates them with Figure 2, but it does not provide a formally labeled 'Pseudocode' or 'Algorithm' block. (A generic MCTS sketch is given after this table.)
Open Source Code | Yes | We make our code, models, and datasets publicly available.
Open Datasets | Yes | We make our code, models, and datasets publicly available.
Dataset Splits | No | Before training on DaTikZv2, we extract 1k samples to serve as our test set for an automatic evaluation and generate corresponding synthetic sketches. [...] Next, we unfreeze the language model (keeping the vision encoder frozen) and fine-tune on examples from DaTikZv2 that fit within a 2048 token context window. We use a batch size of 128, a learning rate of 4e-5, and train for three epochs. The paper describes a test set and training procedure but does not explicitly mention a separate validation set or split. (A minimal split sketch follows the table.)
Hardware Specification | Yes | For training and inference of our local DeTikZify models, we utilize a compute node equipped with four Nvidia A40 GPUs and 448 gigabytes of RAM.
Software Dependencies | Yes | The key difference is that DaTikZv2 includes all TikZ programs that compile with TeX Live 2023, regardless of whether they have associated captions, which was a requirement for inclusion in DaTikZv1 but is not needed for DeTikZify.
Experiment Setup | Yes | We pretrain for one epoch on MetaFig with AdamW (Loshchilov and Hutter, 2019), a batch size of 256, a learning rate of 1e-3, and a cosine learning rate decay with a 3% warmup ratio. Next, we unfreeze the language model (keeping the vision encoder frozen) and fine-tune on examples from DaTikZv2 that fit within a 2048 token context window. We use a batch size of 128, a learning rate of 4e-5, and train for three epochs. Across all models, we set the temperature to 0.8 and the exploration coefficient c to 0.6. (A hedged training-configuration sketch follows the table.)
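
Since the paper presents its MCTS-based inference only in prose (Section 5.1) and a figure, the following is a minimal, generic Monte Carlo tree search skeleton with UCT-style selection using an exploration coefficient c (the paper reports c = 0.6). It is an illustrative sketch, not the authors' implementation: Node, expand, and rollout_score are hypothetical placeholders standing in for DeTikZify's program-continuation and figure-similarity scoring steps.

import math
import random

# Generic MCTS skeleton with UCT selection (illustrative only). The paper's
# actual procedure (Section 5.1) expands partial TikZ programs and scores
# rollouts against the input figure; both steps are abstracted here behind
# the hypothetical expand() and rollout_score() placeholders.

EXPLORATION_C = 0.6  # exploration coefficient c reported in the paper

class Node:
    def __init__(self, state, parent=None):
        self.state = state            # e.g., a partial program prefix
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value_sum = 0.0

    def uct(self, c=EXPLORATION_C):
        # Mean value plus exploration bonus; unvisited nodes are tried first.
        if self.visits == 0:
            return float("inf")
        exploit = self.value_sum / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore

def expand(node):
    # Toy placeholder: branch on dummy tokens up to a fixed depth. In
    # DeTikZify this would propose continuations of a partial TikZ program.
    if len(node.state) >= 5:
        return []
    return [node.state + [token] for token in (0, 1)]

def rollout_score(state):
    # Toy placeholder: return a random reward. In DeTikZify this would compile
    # the completed program and score its similarity to the input figure.
    return random.random()

def mcts(root_state, iterations=100):
    root = Node(root_state)
    for _ in range(iterations):
        # 1) Selection: descend by UCT until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=lambda n: n.uct())
        # 2) Expansion: grow the leaf (the fresh root is always expanded).
        if node.visits > 0 or node is root:
            for child_state in expand(node):
                node.children.append(Node(child_state, parent=node))
            if node.children:
                node = random.choice(node.children)
        # 3) Simulation: score a rollout from the selected node.
        reward = rollout_score(node.state)
        # 4) Backpropagation: update visit counts and values up to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += reward
            node = node.parent
    # Return the most-visited first-level choice.
    return max(root.children, key=lambda n: n.visits).state

print(mcts(root_state=[]))  # e.g., [1, 0] on this toy problem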
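
For the Dataset Splits row: the quoted procedure (hold out 1k DaTikZv2 samples as a test set before training) corresponds to a standard held-out split. The snippet below is a hedged sketch; the Hugging Face dataset identifier and seed are assumptions, not values taken from the paper.

from datasets import load_dataset

# Hedged sketch of the reported split: hold out 1k samples as a test set
# before training. The dataset id and seed below are assumptions.
datikz = load_dataset("nllg/datikz-v2", split="train")
splits = datikz.train_test_split(test_size=1000, seed=0, shuffle=True)
train_set, test_set = splits["train"], splits["test"]
print(len(test_set))  # 1000 held-out examples for automatic evaluation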
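
For the Experiment Setup row: the quoted fine-tuning hyperparameters map directly onto a standard Hugging Face TrainingArguments configuration. The sketch below only echoes the reported values (three epochs, learning rate 4e-5, effective batch size 128); it is not the authors' training script, and the per-device batch split, schedule carry-over, output path, and precision flag are assumptions.

from transformers import TrainingArguments

# Hedged mapping of the reported fine-tuning hyperparameters onto Hugging Face
# TrainingArguments; not the authors' actual configuration.
finetune_args = TrainingArguments(
    output_dir="detikzify-finetune",    # placeholder path
    num_train_epochs=3,                 # "train for three epochs"
    learning_rate=4e-5,                 # "a learning rate of 4e-5"
    per_device_train_batch_size=16,     # assumption: 16 x 2 accum x 4 GPUs = 128
    gradient_accumulation_steps=2,
    lr_scheduler_type="cosine",         # assumption: schedule carried over from pretraining
    warmup_ratio=0.03,                  # assumption: 3% warmup, as reported for pretraining
    optim="adamw_torch",                # AdamW (Loshchilov and Hutter, 2019)
    bf16=True,                          # assumption: precision is not stated in the quote
)
# The pretraining stage analogously used one epoch on MetaFig with an
# effective batch size of 256 and a learning rate of 1e-3.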