DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ
Authors: Jonas Belouadi, Simone Ponzetto, Steffen Eger
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through both automatic and human evaluation, we demonstrate that DeTikZify outperforms commercial Claude 3 and GPT-4V in synthesizing TikZ programs, with the MCTS algorithm effectively boosting its performance. |
| Researcher Affiliation | Academia | Natural Language Learning Group, Data and Web Science Group, University of Mannheim; University of Technology Nuremberg |
| Pseudocode | No | The paper describes the steps of the MCTS algorithm in Section 5.1 and illustrates them with Figure 2, but it does not provide a formally labeled 'Pseudocode' or 'Algorithm' block. (A generic MCTS loop is sketched after the table for orientation.) |
| Open Source Code | Yes | We make our code, models, and datasets publicly available. |
| Open Datasets | Yes | We make our code, models, and datasets publicly available. |
| Dataset Splits | No | Before training on DaTikZv2, we extract 1k samples to serve as our test set for an automatic evaluation and generate corresponding synthetic sketches. [...] Next, we unfreeze the language model (keeping the vision encoder frozen) and fine-tune on examples from DaTikZv2 that fit within a 2048 token context window. We use a batch size of 128, a learning rate of 4e-5, and train for three epochs. The paper describes a test set and training procedure but does not explicitly mention a separate validation set or split. |
| Hardware Specification | Yes | For training and inference of our local DeTikZify models, we utilize a compute node equipped with four Nvidia A40 GPUs and 448 gigabytes of RAM. |
| Software Dependencies | Yes | The key difference is that DaTikZv2 includes all TikZ programs that compile with TeX Live 2023, regardless of whether they have associated captions, which was a requirement for inclusion in DaTikZv1 but is not needed for DeTikZify. |
| Experiment Setup | Yes | We pretrain for one epoch on MetaFig with AdamW (Loshchilov and Hutter, 2019), a batch size of 256, a learning rate of 1e-3, and a cosine learning rate decay with a 3% warmup ratio. Next, we unfreeze the language model (keeping the vision encoder frozen) and fine-tune on examples from DaTikZv2 that fit within a 2048 token context window. We use a batch size of 128, a learning rate of 4e-5, and train for three epochs. Across all models, we set the temperature to 0.8 and the exploration coefficient c to 0.6. (A hedged configuration sketch using these values follows the table.) |
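
To make the quoted hyperparameters easier to scan, the sketch below collects them into one training configuration. It is an illustration only, not the authors' code: the `torch.nn.Linear` stand-in model, the `pretrain_steps` count, and the dictionary keys are assumptions; only the numeric values (batch sizes 256/128, learning rates 1e-3/4e-5, 3% warmup, three epochs, a 2048-token context window, temperature 0.8, and exploration coefficient c = 0.6) come from the paper.

```python
import torch
from torch.optim import AdamW
from transformers import get_cosine_schedule_with_warmup

# Toy stand-in for the DeTikZify model; the real setup fine-tunes a
# vision-language model with the vision encoder kept frozen.
model = torch.nn.Linear(16, 16)

# Pretraining on MetaFig (hyperparameter values quoted from the paper).
pretrain_steps = 1_000  # assumption: the real count depends on dataset size and batch size 256
optimizer = AdamW(model.parameters(), lr=1e-3)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.03 * pretrain_steps),  # 3% warmup ratio
    num_training_steps=pretrain_steps,            # cosine decay over one epoch
)

# Fine-tuning on DaTikZv2 (values quoted from the paper).
finetune_config = {
    "batch_size": 128,        # how this is sharded across the four A40 GPUs is not specified
    "learning_rate": 4e-5,
    "num_epochs": 3,
    "max_seq_len": 2048,      # only examples fitting this context window are used
}

# Inference-time settings shared across all evaluated models.
sampling_config = {"temperature": 0.8, "mcts_exploration_c": 0.6}
```

In the actual pipeline the optimizer would wrap the full vision-language model rather than a toy layer, with the vision encoder frozen as described in the quoted setup.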
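
Since the paper presents its MCTS procedure only in prose (Section 5.1) and in Figure 2, the following minimal, generic MCTS loop is included purely for orientation. It is not the authors' algorithm: `propose_continuations` and `score_program` are hypothetical placeholders standing in for the model's TikZ generation and the image-similarity reward, and the selection rule is the textbook UCT formulation. The only values taken from the paper are the sampling temperature of 0.8 and the exploration coefficient of 0.6.

```python
import math
import random

# Hypothetical stand-ins: in DeTikZify these would be the model proposing
# TikZ continuations and a reward comparing the rendered candidate against
# the input figure. Both are placeholders so the sketch runs on its own.
def propose_continuations(program: str, temperature: float = 0.8, k: int = 3):
    return [program + f"\\draw (0,{i});\n" for i in range(k)]

def score_program(program: str) -> float:
    return random.random()  # placeholder reward in [0, 1]

class Node:
    def __init__(self, program: str, parent=None):
        self.program = program
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0  # sum of rewards backed up through this node

    def uct(self, c: float) -> float:
        # UCT score; c is the exploration coefficient (0.6 in the paper).
        if self.visits == 0:
            return float("inf")
        exploit = self.value / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore

def mcts(root_program: str, iterations: int = 50, c: float = 0.6) -> str:
    root = Node(root_program)
    best_program, best_score = root_program, float("-inf")
    for _ in range(iterations):
        # 1. Selection: descend via UCT until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=lambda ch: ch.uct(c))
        # 2. Expansion: sample candidate continuations from the model.
        for cont in propose_continuations(node.program):
            node.children.append(Node(cont, parent=node))
        # 3. Simulation: score one child (e.g. compile and compare images).
        child = random.choice(node.children)
        reward = score_program(child.program)
        if reward > best_score:
            best_program, best_score = child.program, reward
        # 4. Backpropagation: propagate the reward back to the root.
        while child is not None:
            child.visits += 1
            child.value += reward
            child = child.parent
    return best_program
```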