Learning Subject-Aware Cropping by Outpainting Professional Photos

Authors: James Hong, Lu Yuan, Michaël Gharbi, Matthew Fisher, Kayvon Fatahalian

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that GenCrop yields competitive results against fully supervised approaches (Zhang et al. 2022) on the existing datasets (Fang et al. 2014; Chen et al. 2017a; Yang et al. 2023) (under quantitative metrics such as Intersection-over-Union and boundary displacement (Zhang et al. 2022)), while being superior to the best weakly/unsupervised method (Chen et al. 2017b). We also evaluate GenCrop on additional subject categories such as cats, dogs, etc. to test the generalization of our approach beyond just humans. On qualitative evaluation, GenCrop is comparable to or better than supervised prior work on the rate of cropping errors, while prior weakly-supervised/unsupervised baselines fall substantially short. Lastly, we conduct additional analysis and ablations to assess the effectiveness and limitations of learning to crop from our generated data.
Researcher Affiliation | Collaboration | James Hong¹, Lu Yuan¹, Michaël Gharbi², Matthew Fisher², Kayvon Fatahalian¹; ¹Stanford University, ²Adobe Research; {james.hong, luyuan, kayvonf}@cs.stanford.edu, {mgharbi, matfishe}@adobe.com
Pseudocode | No | No pseudocode or clearly labeled algorithm block was found in the paper.
Open Source Code | Yes | Our code is publicly available.
Open Datasets | Yes | Unsplash contains over three million curated images and is publicly accessible for research use (Unsplash 2023). Prior cropping datasets such as FLMS (Fang et al. 2014), FCDB (Chen et al. 2017a), and SACD (Yang et al. 2023) lack the quantity of images in any particular subject category needed to serve as evaluation... The annotations for this data and the images are all publicly available.
Dataset Splits | Yes | Table 1: Dataset statistics and splits for each subject class. # outpainted is the number of synthetic training images that pass the automatic quality filters in 3.1. # labeled is the size of our hand-labeled evaluation subset (of the test split).
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions software like Stable Diffusion (SD) (Rombach et al. 2022), an image captioner (Li et al. 2023), and an instance segmenter (YOLOv8; Ultralytics 2023). However, it does not provide specific version numbers for these or other ancillary software components.
Experiment Setup | No | See supplemental C.2 for hyper-parameters and details.
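The Research Type row above cites Intersection-over-Union and boundary displacement as the quantitative cropping metrics. Below is a minimal sketch of how these two box-level metrics are commonly computed between a predicted and a ground-truth crop; the function names and exact normalization conventions are illustrative assumptions and may differ from the evaluation protocol of Zhang et al. (2022) used in the paper.

```python
# Sketch of the two crop metrics: IoU and normalized boundary displacement.
# Boxes are (x1, y1, x2, y2) in pixels. Not the authors' evaluation code.

def crop_iou(pred, gt):
    """Intersection-over-Union of two axis-aligned boxes."""
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_pred + area_gt - inter
    return inter / union if union > 0 else 0.0

def boundary_displacement(pred, gt, img_w, img_h):
    """Mean displacement of the four crop edges, normalized by image size."""
    dx = (abs(pred[0] - gt[0]) + abs(pred[2] - gt[2])) / img_w
    dy = (abs(pred[1] - gt[1]) + abs(pred[3] - gt[3])) / img_h
    return (dx + dy) / 4.0

# Example on a 1000x800 image: a near-miss crop scores high IoU and low displacement.
pred, gt = (100, 50, 900, 750), (120, 60, 880, 760)
print(crop_iou(pred, gt))                          # ~0.92
print(boundary_displacement(pred, gt, 1000, 800))  # ~0.016
```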
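The Software Dependencies row notes that the paper names Stable Diffusion, an image captioner, and a YOLOv8 instance segmenter without version pins. The sketch below shows one plausible way to wire such a stack together with current off-the-shelf libraries (diffusers, transformers, ultralytics). The checkpoint identifiers, the use of an inpainting pipeline as a stand-in for outpainting, and all glue code are assumptions for illustration, not the authors' released implementation.

```python
# Illustrative dependency-stack sketch (not the paper's pipeline): a Stable
# Diffusion inpainting model used for outpainting the masked border region,
# a BLIP-2-style captioner, and a YOLOv8 instance segmenter.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline
from transformers import Blip2Processor, Blip2ForConditionalGeneration
from ultralytics import YOLO

device = "cuda" if torch.cuda.is_available() else "cpu"

# Captioner: produces a text prompt describing the source photo.
cap_proc = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
captioner = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b").to(device)

# Instance segmenter: isolates the subject (e.g., a person) in the photo.
segmenter = YOLO("yolov8x-seg.pt")

# Diffusion model: synthesizes new content in the masked (border) region.
outpainter = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting"
).to(device)

def subject_masks(image: Image.Image):
    """Per-instance segmentation masks for the subject category."""
    return segmenter(image)[0].masks

def outpaint(image: Image.Image, border_mask: Image.Image) -> Image.Image:
    """Extend the photo; border_mask is white where new pixels should be synthesized."""
    inputs = cap_proc(images=image, return_tensors="pt").to(device)
    caption = cap_proc.decode(captioner.generate(**inputs)[0], skip_special_tokens=True)
    return outpainter(prompt=caption, image=image, mask_image=border_mask).images[0]
```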