Sub-token ViT Embedding via Stochastic Resonance Transformers
Authors: Dong Lao, Yangchao Wu, Tian Yu Liu, Alex Wong, Stefano Soatto
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 3. Experiments |
| Researcher Affiliation | Academia | Dong Lao¹, Yangchao Wu¹, Tian Yu Liu¹, Alex Wong², Stefano Soatto¹. ¹UCLA Vision Lab, ²Yale Vision Lab. |
| Pseudocode | No | The paper describes the method and formalizes it with equations, but does not include structured pseudocode or an algorithm block. |
| Open Source Code | Yes | Code: https://github.com/donglao/srt. |
| Open Datasets | Yes | We apply SRT to evaluate its performance using the DAVIS2017 video instance segmentation benchmark (Pont-Tuset et al., 2017). |
| Dataset Splits | No | We apply SRT to evaluate its performance using the DAVIS2017 video instance segmentation benchmark (Pont-Tuset et al., 2017). Our assessment utilizes the NYU-V2 dataset (Nathan Silberman & Fergus, 2012) under its original 640×480 resolution. Results on Cifar-10 classification with ResNet. |
| Hardware Specification | Yes | With ViT-16/S architecture, on DAVIS-2017 (Pont-Tuset et al., 2017) our implementation of SRT runs at 1.0 seconds per image on a Nvidia 3090 GPU using a perturbation level of 3 pixels. |
| Software Dependencies | No | The paper mentions various models (e.g., CLIP, DINO, SAM, MAE) and frameworks (DINOv2, TokenCut, DPT) used or compared against, but does not provide specific version numbers for any software libraries, dependencies, or programming languages. |
| Experiment Setup | Yes | We employ SRT with a turbulence level of 7 pixels to traverse non-overlapping augmented tokens extensively. |
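
Since the paper includes no pseudocode, the following is a minimal sketch of what a "perturbation level" of a few pixels could mean in practice: shift the image by sub-token offsets, encode each shifted copy, map the coarse token features back to pixel resolution, undo the shift, and average. This is an illustration under stated assumptions, not the authors' implementation. `toy_patch_encoder` is a hypothetical stand-in for a ViT (it only averages each patch), `sub_token_embedding` is a hypothetical name, and the wrap-around shifting via `np.roll` and nearest-neighbor upsampling are simplifications chosen for brevity.

```python
import numpy as np


def toy_patch_encoder(image, patch=16):
    """Hypothetical stand-in for a ViT: one mean value per non-overlapping patch."""
    h, w = image.shape
    gh, gw = h // patch, w // patch
    cropped = image[: gh * patch, : gw * patch]
    return cropped.reshape(gh, patch, gw, patch).mean(axis=(1, 3))


def sub_token_embedding(image, patch=16, level=3, encoder=toy_patch_encoder):
    """Average token features over all integer shifts in [-level, level]^2.

    Assumption: image height and width are multiples of `patch`, and shifts
    wrap around the border (np.roll), which a real pipeline would likely avoid.
    """
    h, w = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile into patches"
    acc = np.zeros((h, w), dtype=np.float64)
    count = 0
    for dy in range(-level, level + 1):
        for dx in range(-level, level + 1):
            # Perturb the input by a sub-token translation.
            shifted = np.roll(image, shift=(dy, dx), axis=(0, 1))
            # Encode at token resolution (one feature per patch).
            tokens = encoder(shifted, patch)
            # Nearest-neighbor upsample back to pixel resolution.
            dense = np.kron(tokens, np.ones((patch, patch)))
            # Undo the translation so features align with the original image.
            acc += np.roll(dense, shift=(-dy, -dx), axis=(0, 1))
            count += 1
    return acc / count
```

With `level=0` the result is just the blocky, token-resolution feature map; larger levels blend features from differently aligned token grids, so the output varies within what was originally a single token. The cost grows with the number of shifts, `(2 * level + 1) ** 2` encoder passes in this sketch.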