Spatial-Semantic Collaborative Cropping for User Generated Content

Authors: Yukun Su, Yiwen Cao, Jingliang Deng, Fengyun Rao, Qingyao Wu

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on the proposed UGCrop5K and other public datasets demonstrate the superiority of our approach over state-of-the-art counterparts.
Researcher Affiliation | Collaboration | Yukun Su 1,2, Yiwen Cao 1, Jingliang Deng 1, Fengyun Rao 2, Qingyao Wu 1,3*; 1 School of Software Engineering, South China University of Technology; 2 WeChat, Tencent Inc.; 3 Key Laboratory of Big Data and Intelligent Robot, Ministry of Education
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The complete dataset and results will be released on GitHub.
Open Datasets | Yes | Specifically, we first collect parts of the user-generated content from open-source databases, including the KoNViD-1k (Hosu et al. 2017), LIVE-VQC (Sinno and Bovik 2018a,b), and YouTube UGC (Wang, Inguva, and Adsumilli 2019) datasets and the Bilibili (Ma et al. 2019) social video website. To verify the generalization of our model, we also conduct experiments on other public image cropping benchmarks: the GAICv1 (Zeng et al. 2019) and GAICv2 (Zeng et al. 2020) datasets.
Dataset Splits | No | In general, we have 5,000 images with 450,000 high-quality annotated candidate crops in the dataset, and we split 4,200 images for training and 800 images for testing. This does not specify a validation split (see the split sketch after the table).
Hardware Specification | Yes | All the experiments are executed on a single NVIDIA RTX 2080Ti GPU.
Software Dependencies | No | The paper mentions using MobileNetV2 and AdamW, but does not provide specific version numbers for software dependencies or libraries.
Experiment Setup | Yes | The short side of the input sample is resized to 256 while maintaining the aspect ratio. The aligned size of RoIAlign is set to 15×15 and the proposal number is set to 10 empirically. We stack 2 layers of the adaptive attention graphs with a multi-head number of 4. The network is optimized by AdamW with a learning rate of 1e-4 for 80 epochs. Data augmentations similar to prior works (Zeng et al. 2020; Li et al. 2020) are adopted, including random flipping, saturation, and lighting noise. (A configuration sketch follows the table.)
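
The reported 4,200/800 train/test partition can be reproduced deterministically. Below is a minimal sketch in Python; since UGCrop5K is not yet released, the image IDs are hypothetical stand-ins rather than real dataset entries.

```python
import random

# Hypothetical stand-ins: UGCrop5K is not yet released, so the 5,000
# image IDs are stubbed here rather than loaded from disk.
image_ids = [f"ugc_{i:05d}" for i in range(5000)]

# Fixed seed so the 4,200/800 train/test split is reproducible.
rng = random.Random(0)
shuffled = image_ids[:]
rng.shuffle(shuffled)

train_ids, test_ids = shuffled[:4200], shuffled[4200:]
assert len(train_ids) == 4200 and len(test_ids) == 800

# Note: the paper specifies no validation split; carving one out of the
# 4,200 training images would be an additional assumption.
```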
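The reported experiment setup translates into roughly the following PyTorch sketch. This is a minimal reconstruction, not the authors' released code: the ColorJitter magnitudes, the RoIAlign spatial scale, and the commented-out CroppingModel/AdaptiveAttentionGraph names are assumptions, since the paper publishes no implementation.

```python
import torch
import torchvision
from torchvision import transforms
from torchvision.ops import RoIAlign

# Hyper-parameters as reported in the paper.
SHORT_SIDE     = 256    # input short side; aspect ratio preserved
ALIGN_SIZE     = 15     # RoIAlign output resolution (15x15)
NUM_PROPOSALS  = 10     # candidate crops per image
NUM_GNN_LAYERS = 2      # stacked adaptive attention graph layers
NUM_HEADS      = 4      # multi-head number
LR             = 1e-4
EPOCHS         = 80

# Augmentations follow the paper's description (random flipping,
# saturation, lighting noise); the exact magnitudes are assumptions.
train_transform = transforms.Compose([
    transforms.Resize(SHORT_SIDE),        # short side -> 256, keeps aspect ratio
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(saturation=0.4, brightness=0.4),
    transforms.ToTensor(),
])

# MobileNetV2 backbone, as named in the paper.
backbone = torchvision.models.mobilenet_v2(weights="IMAGENET1K_V1").features

# RoI features for the candidate crops; spatial_scale must match the
# backbone's output stride (1/32 for MobileNetV2's final feature map).
roi_align = RoIAlign(output_size=(ALIGN_SIZE, ALIGN_SIZE),
                     spatial_scale=1.0 / 32, sampling_ratio=2)

# The adaptive attention graph module is described only in prose, so a
# hypothetical stand-in is indicated here rather than implemented:
# model = CroppingModel(backbone, roi_align,
#                       layers=NUM_GNN_LAYERS, heads=NUM_HEADS)

# AdamW with lr = 1e-4, trained for 80 epochs per the paper.
optimizer = torch.optim.AdamW(backbone.parameters(), lr=LR)
```

Only the pieces the paper names explicitly (MobileNetV2, RoIAlign at 15×15, 10 proposals, 2 attention-graph layers with 4 heads, AdamW at lr 1e-4 for 80 epochs) are pinned here; everything else would need the released code to confirm.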