ZOOM: Learning Video Mirror Detection with Extremely-Weak Supervision
Authors: Ke Xu, Tsun Wai Siu, Rynson W.H. Lau
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results under new and standard metrics show that ZOOM performs favorably against existing fully-supervised mirror detection methods. |
| Researcher Affiliation | Academia | Ke Xu*, Tsun Wai Siu*, Rynson W.H. Lau Department of Computer Science, City University of Hong Kong |
| Pseudocode | No | The paper describes the proposed method using textual descriptions and figures, but it does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include an explicit statement about releasing its source code nor does it provide a direct link to a code repository for the proposed method. |
| Open Datasets | Yes | To facilitate weakly-supervised training and evaluation, we first construct a video mirror detection dataset, which contains 200 videos (12,490 frames). Figure 2 shows some examples in our dataset. We discuss the details below. Video Collection. We collect 140 videos from two public datasets: 70 videos from the Charades (Sigurdsson et al. 2016) and 70 videos from the Charades-Ego (Sigurdsson et al. 2018), which record daily indoor activities. We capture 60 videos by ourselves using smartphones. We trim each video to have a duration of 5-8 seconds at 10 FPS. The total duration of our videos is 1,252 seconds. Dataset Annotation. We randomly split our dataset into a training set of 150 videos (9,398 images) and a test set of 50 videos (3,092 images). We assign frame-level binary mirror indicators to the training set and annotate pixel-level mirror masks for the test set. We uniformly sample 20% of the frames from the training set to annotate mirror masks, for collecting dataset statistics and finetuning existing methods. Dataset link: https://drive.google.com/drive/folders/199OpHuHkmbY4ib5TJKVm7rxN1JgmHWI?usp=sharing |
| Dataset Splits | No | The paper states: 'We randomly split our dataset into a training set of 150 videos (9,398 images) and a test set of 50 videos (3,092 images).' It does not explicitly mention a separate validation set split or how data for validation was partitioned. |
| Hardware Specification | Yes | We have implemented the proposed model under Pytorch (Paszke et al. 2017), and tested it on a PC with an i7 4GHz CPU and a GTX4090 GPU. |
| Software Dependencies | No | The paper mentions 'Pytorch (Paszke et al. 2017)' as the implementation framework. However, it does not specify a version number for PyTorch itself, nor does it list other software dependencies (e.g., Python, CUDA) with their specific version numbers. |
| Experiment Setup | Yes | The base learning rate, batch size, and the number of training epochs are 2e-4, 8, and 120, respectively, while the learning rate is reduced by a factor of 10 at the 90th epoch. Input frames are resized to 352×352. A minimal configuration sketch reflecting these settings follows the table. |
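
For concreteness, below is a minimal PyTorch sketch of the reported training configuration (base learning rate 2e-4, batch size 8, 120 epochs, a 10× learning-rate drop at epoch 90, and 352×352 input frames). The optimizer choice (Adam), the normalization statistics, and all identifier names are assumptions made for illustration only; the paper does not release code or state these details.

```python
import torch.nn as nn
from torch.optim import Adam
from torch.optim.lr_scheduler import MultiStepLR
from torchvision import transforms

# Hyperparameters quoted from the paper.
BASE_LR = 2e-4
BATCH_SIZE = 8            # would be passed to the training DataLoader
NUM_EPOCHS = 120
LR_DROP_EPOCH = 90        # learning rate reduced by a factor of 10 here
INPUT_SIZE = (352, 352)   # frames resized to 352x352

# Frame preprocessing: resize to the training resolution.
# The paper does not state normalization statistics; ImageNet values
# are a common default and are assumed here.
preprocess = transforms.Compose([
    transforms.Resize(INPUT_SIZE),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def build_training_schedule(model: nn.Module):
    """Optimizer and LR schedule matching the reported settings.

    The paper does not name the optimizer; Adam is an assumption.
    """
    optimizer = Adam(model.parameters(), lr=BASE_LR)
    # Drop the learning rate by 10x at epoch 90 of 120.
    scheduler = MultiStepLR(optimizer, milestones=[LR_DROP_EPOCH], gamma=0.1)
    return optimizer, scheduler
```

This sketch only encodes the hyperparameters the paper reports; the model architecture, loss, and data pipeline of ZOOM are not reproduced here.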