Learning from Pattern Completion: Self-supervised Controllable Generation
Authors: Zhiqiang Chen, Guofan Fan, Jinying Gao, Lei Ma, Bo Lei, Tiejun Huang, Shan Yu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that the proposed modular autoencoder effectively achieves functional specialization, including the modular processing of color, brightness, and edge detection, and exhibits brain-like features including orientation selectivity, color antagonism, and center-surround receptive fields. Through self-supervised training, associative generation capabilities spontaneously emerge in SCG, demonstrating excellent zero-shot generalization to tasks such as super-resolution, dehazing, and associative or conditional generation on paintings, sketches, and ancient graffiti. Compared to the previous representative method ControlNet, the proposed approach not only demonstrates superior robustness in more challenging high-noise scenarios but also possesses more promising scalability potential due to its self-supervised manner. |
| Researcher Affiliation | Academia | Zhiqiang Chen1,2*, Guofan Fan3*, Jinying Gao2,1,4*, Lei Ma5, Bo Lei1, Tiejun Huang1,5, and Shan Yu2,4 — 1Beijing Academy of Artificial Intelligence; 2Institute of Automation, Chinese Academy of Sciences; 3Xi'an Jiaotong University; 4University of Chinese Academy of Sciences; 5Peking University |
| Pseudocode | No | The paper describes the model architecture and equations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Codes are released on Github and Gitee. |
| Open Datasets | Yes | We trained our modular autoencoder mainly on two typical datasets. One is MNIST, a small dataset of grayscale digits. The other is ImageNet, a relatively large dataset of natural color images. ... MS-COCO (Common Objects in Context) is a large-scale dataset for object detection, segmentation, captioning, and other computer vision tasks. It contains over 200,000 images with more than 1.5 million labeled objects. We use the COCO 2017 train set of 118K images to train our SCG. |
| Dataset Splits | Yes | A quantitative analysis is performed on the validation set of MS-COCO in Table 1. |
| Hardware Specification | Yes | We trained our model for 5 epochs with a batch size of 4 on the COCO dataset using an NVIDIA A100 GPU. |
| Software Dependencies | No | The paper mentions using "SD1.5 as the pretrained diffusion model" and the "AdamW optimizer", but it does not specify version numbers for general software dependencies such as Python, PyTorch, or other libraries used in the implementation. |
| Experiment Setup | Yes | We trained our model for 5 epochs with a batch size of 4 on the COCO dataset using an NVIDIA A100 GPU. ... We train it for 40,000 steps with the AdamW optimizer under a cosine annealing schedule with a starting learning rate of 0.005. |
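For reference, the reported learning-rate schedule (cosine annealing from a start rate of 0.005 over 40,000 steps) can be sketched as a small pure-Python function. This is an illustrative reconstruction of the standard cosine-annealing formula, not the authors' code; the minimum learning rate is assumed to be 0, which the paper does not state.

```python
import math

def cosine_anneal_lr(step, total_steps=40_000, lr_start=0.005, lr_min=0.0):
    """Cosine-annealed learning rate: decays from lr_start to lr_min
    over total_steps following half a cosine period."""
    cos_factor = (1 + math.cos(math.pi * step / total_steps)) / 2
    return lr_min + (lr_start - lr_min) * cos_factor

print(cosine_anneal_lr(0))       # 0.005 at the start of training
print(cosine_anneal_lr(20_000))  # 0.0025 at the halfway point
print(cosine_anneal_lr(40_000))  # ~0.0 at the final step
```

In PyTorch this corresponds to `torch.optim.lr_scheduler.CosineAnnealingLR` with `T_max=40_000` wrapped around an `AdamW` optimizer initialized at `lr=0.005`.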