DeS3: Adaptive Attention-Driven Self and Soft Shadow Removal Using ViT Similarity

Authors: Yeying Jin, Wei Ye, Wenhan Yang, Yuan Yuan, Robby T. Tan

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method outperforms state-of-the-art methods on the SRD, AISTD, LRSS, USR, and UIUC datasets, removing hard, soft, and self shadows robustly. Specifically, our method outperforms the SOTA method by 16% in whole-image RMSE on the LRSS dataset. Comprehensive experiments on the SRD, AISTD, LRSS, UIUC, and USR datasets demonstrate that DeS3 outperforms the state-of-the-art methods, particularly on self and soft shadows. Experiments, Implementation: To ensure fair comparisons, all the baselines, including ours, are trained and tested on the same datasets. Ablation Studies: Fig. 12 and Table 4 show the effectiveness of the ViT similarity loss used in DeS3 (a hedged sketch of one common formulation of such a loss appears after the table).
Researcher Affiliation | Collaboration | Yeying Jin1, Wei Ye2, Wenhan Yang3, Yuan Yuan2, Robby T. Tan1; 1National University of Singapore, 2Huawei International Pte Ltd, 3Peng Cheng Laboratory; e0178303@u.nus.edu, yewei10@huawei.com, yangwh@pcl.ac.cn, yuanyuan10@huawei.com, robby.tan@nus.edu.sg
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our data and code are available at: https://github.com/jinyeying/DeS3_Deshadow
Open Datasets | Yes | Our method outperforms state-of-the-art methods on the SRD, AISTD, LRSS, USR, and UIUC datasets. We trained our DeS3 on each dataset and tested on the corresponding dataset, e.g., for SRD (Qu et al. 2017), we used 2680 SRD images for training and 408 for testing. LRSS (Gryka, Terry, and Brostow 2015) is a soft shadow dataset with 134 shadow images (obtained from the project website: http://visual.cs.ucl.ac.uk/pubs/softshadows/); we followed (Jin, Sharma, and Tan 2021; Gryka, Terry, and Brostow 2015), using the same 34 LRSS images with their corresponding shadow-free images for evaluation, as shown in Table 3. The UCF (Zhu et al. 2010) and UIUC (Guo, Dai, and Hoiem 2011) datasets contain 245 and 108 images, respectively.
Dataset Splits | No | The paper specifies 2680 SRD images for training and 408 for testing, and 34 LRSS images for evaluation, but it does not mention a separate validation set or a cross-validation methodology.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instances) used for running the experiments.
Software Dependencies | No | The paper mentions using Denoising Diffusion Implicit Models (DDIM) (Song, Meng, and Ermon 2021) and DINO-ViT (Caron et al. 2021), but does not specify any software libraries, frameworks, or their version numbers (e.g., PyTorch 1.9, TensorFlow 2.x, Python 3.x).
Experiment Setup | Yes | We use 1000 steps for training and 25 steps for inference, with a noise schedule β_t linear from 0.0001 to 0.02. α and β are empirically set to 0.5 in training. (A minimal sketch of this schedule appears after the table.)
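The Experiment Setup row is the only quantitative configuration the report reproduces, so here is a minimal sketch of that diffusion schedule: 1000 training timesteps with β_t linear from 0.0001 to 0.02, and 25 inference steps obtained by uniformly subsampling the training timesteps (DDIM-style). The uniform-stride subsampling and all variable names are illustrative assumptions; the paper excerpt only states the step counts and schedule endpoints.

```python
import numpy as np

T_TRAIN = 1000  # training timesteps, as reported
T_INFER = 25    # inference steps, as reported

# Linear noise schedule beta_t from 1e-4 to 0.02 over the training timesteps.
betas = np.linspace(1e-4, 0.02, T_TRAIN)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # cumulative product \bar{alpha}_t used by DDPM/DDIM

# DDIM-style inference visits a strided subset of the 1000 training timesteps.
# Uniform striding is an assumption; other subsampling rules are possible.
infer_steps = np.linspace(0, T_TRAIN - 1, T_INFER, dtype=int)

print(betas[0], betas[-1])   # 0.0001 0.02, matching the reported endpoints
print(infer_steps)           # the 25 timesteps visited at inference
```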
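The ablation row above credits the ViT similarity loss, but the report does not reproduce its definition. The following is a hedged sketch of one common formulation of a DINO-ViT similarity term: comparing cosine self-similarity matrices of patch tokens between the restored image and the shadow-free reference, with a frozen pretrained ViT. The `vit` callable, the choice of tokens, and the MSE reduction are assumptions for illustration, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def vit_similarity_loss(vit, pred, target):
    """Penalize differences between DINO-ViT patch-token self-similarity
    matrices of the restored image and the shadow-free reference.

    `vit` is a hypothetical stand-in for a frozen, pretrained DINO-ViT that
    maps (B, 3, H, W) images to (B, N, C) patch tokens.
    """
    with torch.no_grad():
        feat_t = vit(target)   # reference tokens; no gradients through the target
    feat_p = vit(pred)         # gradients flow through the prediction only

    # Cosine self-similarity among patch tokens captures image structure.
    sim_p = F.normalize(feat_p, dim=-1) @ F.normalize(feat_p, dim=-1).transpose(1, 2)
    sim_t = F.normalize(feat_t, dim=-1) @ F.normalize(feat_t, dim=-1).transpose(1, 2)
    return F.mse_loss(sim_p, sim_t)

if __name__ == "__main__":
    # Toy stand-in: any callable mapping (B,3,H,W) images to (B,N,C) tokens works.
    toy_vit = lambda x: x.flatten(2).transpose(1, 2)[:, :16, :]
    pred = torch.rand(1, 3, 8, 8, requires_grad=True)
    target = torch.rand(1, 3, 8, 8)
    loss = vit_similarity_loss(toy_vit, pred, target)
    loss.backward()  # gradients reach `pred` as expected
```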