DenseDINO: Boosting Dense Self-Supervised Learning with Token-Based Point-Level Consistency
Authors: Yike Yuan, Xinghe Fu, Yunlong Yu, Xi Li
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "4 Experiment: We evaluate our method on both classification and semantic segmentation tasks. ... Table 1: Comparison (%) with other methods. ... 4.3 Ablation Study" |
| Researcher Affiliation | Academia | Yike Yuan¹, Xinghe Fu¹, Yunlong Yu² and Xi Li¹,³ — ¹College of Computer Science and Technology, Zhejiang University; ²College of Information Science and Electronic Engineering, Zhejiang University; ³Zhejiang Singapore Innovation and AI Joint Research Lab, Hangzhou. {yuanyike, xinghefu, yuyunlong, xilizju}@zju.edu.cn |
| Pseudocode | No | The paper does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its own source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | "We choose ViT [Dosovitskiy et al., 2021] as the backbone and ImageNet [Russakovsky et al., 2015] as the training dataset. ... For semantic segmentation, we adopt the linear probing protocol following [Ziegler and Asano, 2022]. We train a 1×1 convolutional layer on the frozen patch tokens on Pascal VOC 2012 [Everingham et al., 2010] train + aug split and report mIoU on the valid split." |
| Dataset Splits | Yes | "We train a 1×1 convolutional layer on the frozen patch tokens on Pascal VOC 2012 [Everingham et al., 2010] train + aug split and report mIoU on the valid split." |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for the experiments. |
| Software Dependencies | No | The paper mentions various models and frameworks (e.g., ViT, DINO, Leopart) but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch versions). |
| Experiment Setup | Yes | "We train ViT-Small with 4 views and 4 reference points in each pair of views for 300 epochs for the performance comparison in main results, and ViT-Tiny with 6 views and 4 reference points for 100 epochs for the ablation study. The loss weight α is set as 0.5. ... All models in the experiment are with patch size 16 and trained from scratch unless specified otherwise. Other training parameters are kept the same with the setting of DINO." |
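The linear-probing segmentation protocol quoted in the table (a single 1×1 convolution trained on frozen patch tokens, evaluated by mIoU) can be sketched as follows. This is a minimal illustration under assumed values, not the authors' code: embed_dim=384 (ViT-Small), patch size 16, a 224×224 input, and 21 Pascal VOC classes; the random tokens stand in for a frozen ViT backbone's patch-token output.

```python
import numpy as np

# Hypothetical sketch of the linear-probing protocol: with patch size 16,
# a 224x224 image yields a 14x14 grid of patch tokens; a single trainable
# 1x1 convolution maps each frozen token to per-class logits.
rng = np.random.default_rng(0)
B, H, W, patch, embed_dim, num_classes = 2, 224, 224, 16, 384, 21
h, w = H // patch, W // patch  # 14 x 14 token grid

# Stand-in for frozen patch tokens; in practice these come from the
# ViT backbone with gradients disabled.
tokens = rng.standard_normal((B, h * w, embed_dim))

# A 1x1 convolution is a per-location linear map: one (embed_dim,
# num_classes) weight matrix shared across all spatial positions.
weight = rng.standard_normal((embed_dim, num_classes)) * 0.01
bias = np.zeros(num_classes)

logits = tokens @ weight + bias           # (B, h*w, num_classes)
logits = logits.reshape(B, h, w, num_classes)
print(logits.shape)                       # (2, 14, 14, 21)
```

For mIoU evaluation the per-patch logits would be upsampled to the full image resolution and compared against per-pixel ground truth; only `weight` and `bias` are updated during probing, which is what makes the protocol a test of the frozen representation rather than of the decoder.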