Latent Representation Matters: Human-like Sketches in One-shot Drawing Tasks
Authors: Victor Boutin, Rishav Mukherji, Aditya Agrawal, Sabine Muzellec, Thomas Fel, Thomas Serre, Rufin VanRullen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here, we study how different inductive biases shape the latent space of Latent Diffusion Models (LDMs). Along with standard LDM regularizers (KL and vector quantization), we explore supervised regularizations (including classification and prototype-based representation) and contrastive inductive biases (using SimCLR and redundancy reduction objectives). We demonstrate that LDMs with redundancy reduction and prototype-based regularizations produce near-human-like drawings (regarding both samples' recognizability and originality), better mimicking human perception (as evaluated psychophysically). Overall, our results suggest that the gap between humans and machines in one-shot drawings is almost closed. (A hedged sketch of a Barlow-style redundancy-reduction penalty in this spirit appears after the table.) |
| Researcher Affiliation | Academia | (1) Artificial and Natural Intelligence Toulouse Institute, Université de Toulouse, Toulouse, France; (2) Centre de Recherche Cerveau & Cognition, CNRS, Université de Toulouse, France; (3) Carney Institute for Brain Science, Brown University |
| Pseudocode | Yes | Algorithm 1: VQVAE pseudo-code; Algorithm 2: Prototype-based regularizer pseudo-code; Algorithm 3: SimCLR regularizer pseudo-code; Algorithm 4: Barlow regularizer pseudo-code. (A hedged PyTorch sketch of a prototype-based regularizer appears after the table.) |
| Open Source Code | Yes | The code to train all described models is available at http://anonymous.4open.science/r/Latent Matters-526B. |
| Open Datasets | Yes | As done in previous work [31, 30, 11], we use the Omniglot [11] and the QuickDraw-FS [30] datasets to compare humans and machines on the one-shot drawing task...The databases we use are already in open access. |
| Dataset Splits | No | The paper specifies training and test splits for the datasets but does not explicitly report a dedicated validation split, either as a percentage or as absolute counts. |
| Hardware Specification | Yes | All the experiments of this paper have been performed using Quadro-RTX600 GPUs with 16 GB memory. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer [69]' but does not provide specific version numbers for software dependencies or libraries such as Python, PyTorch, TensorFlow, or other key components required for replication. |
| Experiment Setup | Yes | We train the model using the Mean Squared Error loss with a batch size of 128 for the reconstruction, along with different regularizations to study their effects. For both datasets, we use the Adam optimizer [69] with a weight decay of 10^-5 and a learning rate of 10^-4. The RAEs were trained for 200 epochs on the QuickDraw dataset and for 300 epochs on the Omniglot dataset. Note that when trained on the Omniglot dataset, we use a learning rate scheduler in which the learning rate is divided by 4 every 70 epochs. (A hedged training-loop sketch reflecting these hyperparameters appears after the table.) |
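
The abstract row above points to a redundancy-reduction inductive bias, and the pseudocode row lists a "Barlow regularizer". The authors' implementation is not reproduced in this report; below is a minimal PyTorch sketch of a Barlow Twins-style redundancy-reduction penalty on latent codes, assuming two latent views `z1` and `z2` of shape `(batch, dim)`. The function name and the `lambd` default are illustrative, not the paper's values.

```python
import torch

def barlow_redundancy_loss(z1: torch.Tensor, z2: torch.Tensor,
                           lambd: float = 5e-3) -> torch.Tensor:
    """Barlow Twins-style redundancy-reduction penalty on two latent views.

    z1, z2: (batch, dim) latent codes of two augmented views of the same sketch.
    lambd:  illustrative weight on the off-diagonal (redundancy) terms.
    """
    batch = z1.size(0)
    # Standardize each latent dimension across the batch.
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    # Cross-correlation matrix between the two views: (dim, dim).
    c = (z1.T @ z2) / batch
    on_diag = (torch.diagonal(c) - 1.0).pow(2).sum()              # invariance term
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()   # redundancy-reduction term
    return on_diag + lambd * off_diag
```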
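Similarly, the paper's Algorithm 2 describes a prototype-based regularizer whose details are not quoted in this report. The sketch below is one plausible reading, assuming one learnable prototype per class and a distance-based cross-entropy that pulls latents toward their class prototype; the class name, the squared-distance logits, and the loss form are assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeRegularizer(nn.Module):
    """Hypothetical prototype-based latent regularizer (not the paper's exact Algorithm 2).

    Keeps one learnable prototype per class, scores each latent by its negative
    squared distance to every prototype, and applies a cross-entropy on these
    scores so latents cluster around their class prototype.
    """

    def __init__(self, num_classes: int, latent_dim: int):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_classes, latent_dim))

    def forward(self, z: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # z: (batch, latent_dim), labels: (batch,) integer class indices.
        dists = torch.cdist(z, self.prototypes)   # (batch, num_classes)
        logits = -dists.pow(2)                    # closer prototype -> higher score
        return F.cross_entropy(logits, labels)
```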
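Finally, the experiment-setup row quotes the optimizer and schedule. The loop below is a hedged sketch wiring those numbers together (Adam, learning rate 10^-4, weight decay 10^-5, batch size 128, MSE reconstruction, and the Omniglot schedule dividing the learning rate by 4 every 70 epochs); the stand-in encoder/decoder, the 50x50 input size, and the regularizer weighting are assumptions, not the paper's RAE architecture.

```python
import torch
from torch import nn, optim

latent_dim = 128
# Stand-in encoder/decoder over assumed 50x50 grayscale inputs;
# the paper's RAE backbone is not reproduced here.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(50 * 50, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 50 * 50))

params = list(encoder.parameters()) + list(decoder.parameters())
# Learning rate and weight decay as quoted from the paper.
optimizer = optim.Adam(params, lr=1e-4, weight_decay=1e-5)
# Omniglot schedule: divide the learning rate by 4 every 70 epochs (gamma = 0.25).
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=70, gamma=0.25)
recon_loss = nn.MSELoss()

def train_epoch(loader, regularizer=None, reg_weight=1.0):
    """One epoch over `loader` yielding (images, labels); batch size 128 in the paper."""
    for x, labels in loader:
        optimizer.zero_grad()
        x_flat = x.view(x.size(0), -1)
        z = encoder(x_flat)
        recon = decoder(z)
        loss = recon_loss(recon, x_flat)
        if regularizer is not None:   # e.g. the prototype penalty sketched above
            loss = loss + reg_weight * regularizer(z, labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```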