Modelling complex vector drawings with stroke-clouds
Authors: Alexander Ashcroft, Ayan Das, Yulia Gryaditskaya, Zhiyu Qu, Yi-Zhe Song
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 EXPERIMENTS & RESULTS |
| Researcher Affiliation | Academia | SketchX, CVSSP, University of Surrey, UK; Surrey Institute for People-Centred AI (PAI), UK |
| Pseudocode | No | The paper includes network diagrams and mathematical formulations but does not contain explicitly structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and the data are available at https://github.com/Co-do/Stroke-Cloud. |
| Open Datasets | Yes | To demonstrate the effectiveness of our stroke cloud-based sketch generation framework in generating complex vector sketches, we synthetically generate a new dataset that we name Anime-Vec10k, derived from the Danbooru2019 dataset (Branwen et al., 2019) of anime raster images. |
| Dataset Splits | No | The paper mentions using a training dataset (Anime-Vec10k) but does not provide explicit training/validation/test split details, such as percentages or absolute sample counts. |
| Hardware Specification | Yes | The model was trained for 72 hours on a single RTX 4090 with a batch size of 128 and an initial learning rate of 1e-4 that was decayed to 5e-5. |
| Software Dependencies | No | The paper mentions the use of a Set Transformer and an MLP-based diffusion model, but it does not provide specific version numbers for software dependencies or libraries used for implementation (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | The model was trained for 72 hours on a single RTX 4090 with a batch size of 128 and an initial learning rate of 1e-4 that was decayed to 5e-5. After the initial training period, we applied KL annealing for another 24 hours, increasing the KL scale factor from 0 to 1e-8. We trained our model with a linear noise schedule of βmin = 1e-4, βmax = 1e-5 and 200 time steps. LSG: Each LSG was trained on the latent data obtained by passing the training data through the trained encoder. The LSG was then trained for 12 hours with a batch size of 2048 and an initial learning rate of 1e-4 that was decayed to 5e-5. The LSG was trained on a non-conditional version of our MLP. We used a scaled-linear noise schedule of βmin = 2e-2, βmax = 1e-4 and 4000 time steps. |
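
The diffusion schedules quoted in the Experiment Setup row can be written out as a short sketch. This is a minimal illustration, assuming a standard DDPM-style linear beta schedule, a sqrt-space ("scaled-linear") interpolation as used in common diffusion libraries, and simple linear KL annealing; all function and variable names are hypothetical, and the β values are copied verbatim from the quote above.

```python
import torch

# Minimal sketch of the quoted training schedules (illustrative names, not the
# authors' code). Beta values are taken as quoted in the Experiment Setup row.

def linear_beta_schedule(beta_min: float, beta_max: float, num_steps: int) -> torch.Tensor:
    """Linearly spaced noise levels beta_t for t = 1..num_steps (DDPM-style)."""
    return torch.linspace(beta_min, beta_max, num_steps)

def scaled_linear_beta_schedule(beta_min: float, beta_max: float, num_steps: int) -> torch.Tensor:
    """'Scaled-linear' schedule: interpolate in sqrt-space, then square
    (the convention used by common diffusion libraries; assumed here)."""
    return torch.linspace(beta_min ** 0.5, beta_max ** 0.5, num_steps) ** 2

# Main model diffusion decoder: linear schedule, 200 time steps.
betas_main = linear_beta_schedule(1e-4, 1e-5, 200)

# LSG: scaled-linear schedule, 4000 time steps.
betas_lsg = scaled_linear_beta_schedule(2e-2, 1e-4, 4000)

def kl_scale(step: int, anneal_steps: int, max_scale: float = 1e-8) -> float:
    """KL annealing: grow the KL weight linearly from 0 to max_scale."""
    return max_scale * min(step / anneal_steps, 1.0)
```

The learning-rate decay from 1e-4 to 5e-5 could likewise be implemented with a standard scheduler (e.g. torch.optim.lr_scheduler), but the paper does not specify which scheduler was used.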