CLIPDraw: Exploring Text-to-Drawing Synthesis through Language-Image Encoders

Authors: Kevin Frans, Lisa Soros, Olaf Witkowski

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Results compare CLIPDraw with other synthesis-through-optimization methods, as well as highlight various interesting behaviors of CLIPDraw, such as satisfying ambiguous text in multiple ways, reliably producing drawings in diverse styles, and scaling from simple to complex visual representations as stroke count increases.
Researcher Affiliation | Collaboration | Kevin Frans: Massachusetts Institute of Technology, Cambridge, MA, USA; Cross Labs, Cross Compass Ltd., Tokyo, Japan. L. B. Soros: Cross Labs, Cross Compass Ltd., Tokyo, Japan. Olaf Witkowski: Cross Labs, Cross Compass Ltd., Tokyo, Japan; Earth-Life Science Institute, Tokyo Institute of Technology, Japan; College of Arts and Sciences, University of Tokyo, Japan.
Pseudocode | Yes | Algorithm 1 CLIPDraw
Input: description phrase desc; iteration count I; curve count N; augment size D; pre-trained CLIP model.
Begin:
  Encode the description phrase: EncPhr = CLIP(desc)
  Initialize curves: Curves_{1..N} = RandomCurve()
  for i = 0 to I do
    Render curves to pixels: Pixels = DiffRender(Curves)
    Augment the image: AugBatch_{1..D} = Augment(Pixels)
    Encode the image: EncImg = CLIP(AugBatch)
    Compute loss: Loss = CosineSim(EncPhr, EncImg)
    Backprop: Curves = Minimize(Loss)
  end for
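A minimal PyTorch sketch of this loop, assuming the openai/CLIP package; `diff_render` and `augment` are hypothetical stand-ins for a differentiable vector renderer (e.g. diffvg) and an augmentation pipeline, and the optimizer choice, learning rate, and the omission of CLIP's input normalization are assumptions, not details from the paper:

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

def clipdraw(desc, diff_render, augment, curves, iterations=250, num_augs=8):
    """Optimize curve parameters so the rendered drawing matches `desc`."""
    # Encode the description phrase once; it stays fixed during optimization.
    with torch.no_grad():
        enc_phr = model.encode_text(clip.tokenize(desc).to(device))
        enc_phr = enc_phr / enc_phr.norm(dim=-1, keepdim=True)

    opt = torch.optim.Adam(curves, lr=0.1)  # optimizer and lr are assumptions
    for _ in range(iterations):
        # Render curves to pixels via the assumed differentiable renderer.
        pixels = diff_render(curves)  # (1, 3, H, W), differentiable w.r.t. curves
        # Build D augmented duplicates of the rendered image.
        aug_batch = torch.cat([augment(pixels) for _ in range(num_augs)])
        enc_img = model.encode_image(aug_batch)
        enc_img = enc_img / enc_img.norm(dim=-1, keepdim=True)
        # Negative mean cosine similarity: minimizing it maximizes agreement
        # between the drawing and the description phrase.
        loss = -(enc_img @ enc_phr.T).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return curves
```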
Open Source Code | Yes | To this end, source code is available at: https://colab.research.google.com/github/kvfrans/clipdraw/blob/main/clipdraw.ipynb.
Open Datasets | No | The paper states CLIPDraw 'does not require any additional training' and uses a 'pre-trained CLIP model'. While it mentions datasets used by other methods in related work, there is no dataset used to train CLIPDraw itself, nor a benchmark dataset for direct experimental evaluation.
Dataset Splits | No | The paper does not mention validation dataset splits, as CLIPDraw is a training-free optimization method.
Hardware Specification | No | The paper mentions 'within a minute on a typical GPU' and 'on a typical Colab GPU', which is too vague to identify specific hardware models.
Software Dependencies | No | The paper mentions the 'torch.transforms.RandomPerspective and torch.transforms.RandomResizedCrop functions' but does not specify version numbers for PyTorch or any other software dependencies.
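For reference, a sketch of the augmentation pipeline these function names point to, built with torchvision; the distortion and crop parameters below are illustrative assumptions, since the paper does not report them:

```python
from torchvision import transforms

# The two augmentations named in the paper; parameter values are assumptions.
augment = transforms.Compose([
    transforms.RandomPerspective(distortion_scale=0.5, p=1.0),
    transforms.RandomResizedCrop(224, scale=(0.7, 0.9)),  # 224 matches ViT-B/32 input
])
```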
Experiment Setup | Yes | These methods are run on the same CLIP matching objective for 250 steps of gradient descent (Figure 15). In CLIPDraw, the stroke count is 256, and 8 augmented duplicates are used during image augmentation.
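Tying these settings to the sketch above, a hypothetical invocation reusing the assumed `diff_render` and `augment`; the random curve initialization is a stand-in for the paper's RandomCurve():

```python
# Hypothetical run with the reported settings: 250 gradient steps,
# 256 strokes, and 8 augmented duplicates per step. Each "curve" here is
# a random 4-point control polygon, a stand-in for the paper's initializer.
curves = [torch.randn(4, 2, device=device, requires_grad=True) for _ in range(256)]
curves = clipdraw("a drawing of a cat", diff_render, augment, curves,
                  iterations=250, num_augs=8)
```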