Object Segmentation Without Labels with Large-Scale Generative Models

Authors: Andrey Voynov, Stanislav Morozov, Artem Babenko

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "By extensive comparison on standard benchmarks, we outperform existing unsupervised alternatives for object segmentation, achieving new state-of-the-art." "In extensive experiments, we show that the approach often outperforms the existing unsupervised alternatives for object segmentation and saliency detection."
Researcher Affiliation | Industry | Andrey Voynov (1), Stanislav Morozov (1), Artem Babenko (1); (1) Yandex, Moscow, Russia.
Pseudocode | No | The paper describes its methods through text, equations, and diagrams, but does not include structured pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | "Our model and implementation are available online." Footnote 2: https://github.com/anvoynov/BigGANsAreWatching
Open Datasets | Yes | Caltech-UCSD Birds 200-2011 (Wah et al., 2011) contains 11,788 photographs of birds with segmentation masks. Flowers (Nilsback & Zisserman, 2007) contains 8,189 images of flowers equipped with saliency masks generated automatically via a method developed specifically for flowers. ECSSD (Shi et al., 2015) contains 1,000 images with structurally complex natural contents. DUTS (Wang et al., 2017a) contains 10,553 train and 5,019 test images. DUT-OMRON (Yang et al., 2013) contains 5,168 images of high content variety.
Dataset Splits | No | For CUB-200-2011: "We follow (Chen et al., 2019), and use 10,000 images for our training subset and 1,000 for the test subset from splits provided by (Chen et al., 2019). Unlike (Chen et al., 2019), we do not use any images for validation and simply omit the remaining 788 images." DUTS provides 10,553 train and 5,019 test images; no validation split is mentioned.
Hardware Specification | Yes | "Overall, this optimization takes a few minutes on the Nvidia-1080ti GPU card." "Training with online synthetic data generation takes approximately seven hours on two Nvidia 1080Ti cards."
Software Dependencies | No | The paper mentions software components such as U-net and the Adam optimizer, but does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | "In all our experiments, we employ a standard U-net architecture (Ronneberger et al., 2015). We train U-net on the synthetic dataset with the Adam optimizer and the binary cross-entropy objective applied on the pixel level. We perform 12 × 10^3 steps with batch 95. The initial learning rate equals 0.001 and is decreased by 0.2 on step 8 × 10^3."
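The CUB-200-2011 split described in the Dataset Splits row (10,000 training and 1,000 test images out of 11,788, with the remaining 788 omitted) can be sketched as below. The index-order partition is purely illustrative: the paper reuses the concrete split files provided by Chen et al. (2019), which this sketch does not reproduce.

```python
def split_cub(n_images=11788, n_train=10000, n_test=1000):
    """Partition image indices into train/test subsets, discarding the rest.

    Illustrative only: the paper takes its actual train/test image lists
    from the splits released by Chen et al. (2019), not from index order.
    """
    indices = list(range(n_images))
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    # The leftover images (788 for CUB-200-2011) are simply omitted;
    # no validation subset is used.
    return train, test

train_ids, test_ids = split_cub()
```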
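The training recipe quoted in the Experiment Setup row (Adam, pixel-level binary cross-entropy, 12 × 10^3 steps with batch 95, learning rate 0.001 reduced at step 8 × 10^3) can be captured in a minimal sketch. Reading "decreased by 0.2" as a multiplicative step decay (new lr = 0.001 × 0.2) is an assumption, and the U-net model itself is omitted; only the schedule and the per-pixel loss are shown.

```python
import math

def learning_rate(step, base_lr=1e-3, decay_step=8000, gamma=0.2):
    """Step-decay schedule: the learning rate is multiplied by gamma
    (assumed reading of "decreased by 0.2") once step reaches decay_step."""
    return base_lr * gamma if step >= decay_step else base_lr

def pixel_bce(p, y, eps=1e-7):
    """Binary cross-entropy for a single pixel: p is the predicted
    foreground probability, y the binary mask value; eps guards log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

TOTAL_STEPS = 12_000  # 12 x 10^3 optimizer steps, as reported
BATCH_SIZE = 95       # batch size reported in the paper
```

In a framework such as PyTorch, the same schedule would typically be expressed with a step-decay learning-rate scheduler attached to the Adam optimizer.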