Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Sketch2Diagram: Generating Vector Diagrams from Hand-Drawn Sketches

Authors: Itsumi Saito, Haruto Yoshida, Keisuke Sakaguchi

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our evaluations reveal the limitations of state-of-the-art vision and language models (VLMs), positioning SKETIkZ as a key benchmark for future research in sketch-to-diagram conversion. Along with SKETIkZ, we present IMGTIkZ, an image-to-TikZ model that integrates a 6.7B parameter code-specialized open-source large language model (LLM) with a pretrained vision encoder. Despite its relatively compact size, IMGTIkZ performs comparably to GPT-4o. This success is driven by using our two data augmentation techniques and a multi-candidate inference strategy. Our findings open promising directions for future research in sketch-to-diagram conversion and broader image-to-code generation tasks. SKETIkZ is publicly available.¹
Researcher Affiliation | Academia | Itsumi Saito*, Haruto Yoshida*, Keisuke Sakaguchi*; *Tohoku University, RIKEN AIP
Pseudocode | No | The paper provides Python code listings (Listing 1 and Listing 2) for image augmentation pipelines, but these are actual code snippets, not abstract pseudocode or algorithm blocks.
Open Source Code | No | The paper states: "SKETIkZ is publicly available.¹" with a footnote to https://sketikz.github.io/ for the dataset. However, there is no explicit statement or link providing access to the source code for the IMGTIkZ model or the methodology described in the paper.
Open Datasets | Yes | To address this gap, we introduce SKETIkZ, a new dataset designed for benchmarking sketch-to-diagram generation. SKETIkZ comprises 3,231 pairs of hand-drawn sketches and their corresponding TikZ codes. ... SKETIkZ is publicly available.¹ (Footnote 1: https://sketikz.github.io/). Datasets used in stage 2 training ... We also used existing pairs of TikZ code and images (No. 8), excluding data with arXiv IDs that overlap with our collected dataset.
Dataset Splits | Yes | We aligned sketches I_s with corresponding TikZ codes Y_r and reference images I_r, creating a dataset of 2,585 training, 323 validation, and 323 test samples.
Hardware Specification | Yes | We used 8 A100 GPUs for training IMGTIkZ, and 1 H100 GPU for inference. ... The training was conducted using four H100 80G GPUs. (for D-SigLIP) ... We trained the model using a NVIDIA A100 GPU. (for diagram image classification model)
Software Dependencies | Yes | We used pdflatex from TeX Live 2023 to compile generated TikZ code into a diagram image. ... We used the gpt-4o-2024-05-13 version for GPT-4o, the gpt-4o-mini-2024-07-18 version for GPT-4o mini, the claude-3-5-sonnet-20240620 version for Claude 3.5, and the llama3-llava-next-8b version, which is trained on the 8B Llama 3 model, for LLaVA-Next. ... We used text-embedding-3-small version. (for OpenAI's text embedding model) ... We used the google/siglip-so400m-patch14-384 version of SigLIP as the vision encoder.
Experiment Setup | Yes | We set the LoRA tuning parameters for training to r=128 and α=256. Stage 1 training was conducted with a batch size of 256 for 6,000 steps. Stage 2 training used a batch size of 128 for 1 epoch. ... The maximum number of attempts M for iterative sampling was set to 5, and the number of candidates K for multi-candidate generation was set to 20. ... The sampling temperature was set to 0.6. ... Table 8: Configuration for the IMGTIkZ model training (option: value): model max length: 4096; num train epochs: 1; batch size: 16; gradient accumulation steps: 8; mm projector lr: 2e-5.
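
The inference settings quoted in the Experiment Setup row (M = 5, K = 20, temperature 0.6) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that iterative sampling means resampling until the generated code compiles, and that multi-candidate generation means drawing K candidates and keeping the best one under some scoring function. The names `sample_compilable`, `best_of_k`, `generate`, `compiles`, and `score` are hypothetical placeholders, and the scoring function is not specified in the excerpt above.

```python
from typing import Callable, Optional

# Values quoted in the Experiment Setup row.
MAX_ATTEMPTS_M = 5
NUM_CANDIDATES_K = 20
TEMPERATURE = 0.6

# Hypothetical callables supplied by the caller (assumptions, not the paper's API):
#   generate(temperature) -> str   draws one candidate program
#   compiles(code) -> bool         reports whether the candidate compiles
#   score(code) -> float           ranks a compilable candidate (higher is better)
Generate = Callable[[float], str]
Compiles = Callable[[str], bool]
Score = Callable[[str], float]


def sample_compilable(
    generate: Generate,
    compiles: Compiles,
    max_attempts: int = MAX_ATTEMPTS_M,
) -> Optional[str]:
    """Assumed iterative sampling: resample until a candidate compiles."""
    for _ in range(max_attempts):
        code = generate(TEMPERATURE)
        if compiles(code):
            return code
    return None  # every attempt failed to compile


def best_of_k(
    generate: Generate,
    compiles: Compiles,
    score: Score,
    k: int = NUM_CANDIDATES_K,
) -> Optional[str]:
    """Assumed multi-candidate generation: draw k candidates, keep the best."""
    drawn = [generate(TEMPERATURE) for _ in range(k)]
    compilable = [code for code in drawn if compiles(code)]
    if not compilable:
        return None
    return max(compilable, key=score)
```

In practice, `generate` would wrap the model call and `compiles` would wrap a compiler invocation such as pdflatex; both are left abstract here because the report does not describe their interfaces.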