ImaginaryNet: Learning Object Detectors without Real Images and Annotations

Authors: Minheng Ni, Zitong Huang, Kailai Feng, Wangmeng Zuo

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments show that IMAGINARYNET can (i) obtain about 75% performance in ISOD compared with the weakly supervised counterpart of the same backbone trained on real data, (ii) significantly improve the baseline while achieving state-of-the-art or comparable performance by incorporating IMAGINARYNET with other supervision settings. Our code will be publicly available at https://github.com/kodenii/ImaginaryNet."
Researcher Affiliation | Academia | "Minheng Ni, Zitong Huang, Kailai Feng & Wangmeng Zuo, Faculty of Computing, Harbin Institute of Technology, {mhni, zthuang, klfeng}@stu.hit.edu.cn, wmzuo@hit.edu.cn"
Pseudocode | No | The paper describes the methodology in prose and block diagrams (Figure 1, Figure 2) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Our code will be publicly available at https://github.com/kodenii/ImaginaryNet."
Open Datasets | Yes | "We first compare IMAGINARYNET with the ISOD model to verify whether it is feasible to learn object detectors without real images and manual annotations... Unless otherwise specified, we generate 5,000 imaginary images during training, which has the similar amount of images in comparison to PASCAL VOC2007 trainval set."
Dataset Splits | Yes | "Unless otherwise specified, we generate 5,000 imaginary images during training, which has the similar amount of images in comparison to PASCAL VOC2007 trainval set. ... We selected classes that the performance is higher than baseline model. Then we re-trained the model with imaginary samples of selected classes to obtain the best performance. We explained this as the gap among Imaginary samples and real samples in some classes, such as boat or train."
Hardware Specification | No | The paper does not explicitly specify the hardware used for training or experiments (e.g., specific GPU models, CPU types, or cloud computing instances).
Software Dependencies | No | "We use GPT-2 (Radford et al., 2019) as the language model and DALLE-mini... We implement the image encoder with ResNet50 pretrained on ImageNet dataset." While software components are mentioned, specific version numbers for libraries, frameworks (like PyTorch/TensorFlow), or the Python environment are not provided.
Experiment Setup | Yes | "We use GPT-2 (Radford et al., 2019) as the language model and DALLE-mini, which can better follow the language guidance, as the text-to-image synthesis model. We implement the image encoder with ResNet50 pretrained on ImageNet dataset. For Proposal Generator, we use Selective Search following W2N (Huang et al., 2022) for obtaining no real images and manual annotations. All training hyper-parameters follow OICR (Tang et al., 2017) and W2N (Huang et al., 2022) for a fair comparison. Unless otherwise specified, we generate 5,000 imaginary images during training, which has the similar amount of images in comparison to PASCAL VOC2007 trainval set."
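The quoted setup implies a prompt-driven generation loop: text prompts per object class are expanded by the language model (GPT-2) and rendered by the text-to-image model (DALLE-mini) to produce the 5,000 imaginary training images. The paper excerpt above does not give the exact prompt template, so the sketch below is a minimal, hypothetical illustration: it assumes a simple "a photo of a {class}" template and class-balanced sampling over the 20 PASCAL VOC categories (5,000 / 20 = 250 prompts per class). The function name `make_prompts` and the template are assumptions, not the paper's implementation.

```python
import random

# The 20 PASCAL VOC object classes used in the paper's ISOD experiments.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]

def make_prompts(num_images=5000, seed=0):
    """Build class-balanced seed prompts for the imaginary-image pipeline.

    In the full pipeline each prompt would be expanded by a language model
    (GPT-2 in the paper) and then rendered by a text-to-image model
    (DALLE-mini); here we only construct the seed prompts.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    prompts = []
    for i in range(num_images):
        # Round-robin over classes gives an even class distribution.
        cls = VOC_CLASSES[i % len(VOC_CLASSES)]
        prompts.append(f"a photo of a {cls}")
    rng.shuffle(prompts)  # avoid long runs of the same class during training
    return prompts

prompts = make_prompts()
```

With the default arguments this yields exactly 250 prompts per class; downstream, each generated image would be pseudo-labeled with the class name embedded in its prompt, which is what lets the detector train without manual annotations.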