Learning 1D Causal Visual Representation with De-focus Attention Networks
Authors: Chenxin Tao, Xizhou Zhu, Shiqian Su, Lewei Lu, Changyao Tian, Xuan Luo, Gao Huang, Hongsheng Li, Yu Qiao, Jie Zhou, Jifeng Dai
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments validate the efficacy of our approach, demonstrating that 1D causal visual representation can perform comparably to 2D non-causal representation in tasks such as global perception, dense prediction, and multi-modal understanding. |
| Researcher Affiliation | Collaboration | Chenxin Tao (1,3), Xizhou Zhu (1,2)*, Shiqian Su (1,3)*, Lewei Lu (3), Changyao Tian (4), Xuan Luo (1), Gao Huang (1), Hongsheng Li (4), Yu Qiao (2), Jie Zhou (1), Jifeng Dai (1,2). Affiliations: 1 Tsinghua University; 2 Shanghai Artificial Intelligence Laboratory; 3 SenseTime Research; 4 The Chinese University of Hong Kong |
| Pseudocode | No | The paper does not include a pseudocode block or an explicitly labeled algorithm section. |
| Open Source Code | No | The paper states only that "Code shall be released"; no repository link is provided. |
| Open Datasets | Yes | ImageNet-1k [13] is used, which contains 1.28M images for training and 50K images for validation. |
| Dataset Splits | Yes | ImageNet-1k [13] is used, which contains 1.28M images for training and 50K images for validation. |
| Hardware Specification | Yes | These models are trained on 32 Nvidia 80GB A100 GPUs for 30 hours. |
| Software Dependencies | No | The paper mentions optimizers such as AdamW but does not specify software versions for libraries or frameworks (e.g., PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | The AdamW optimizer [40] is used with a peak learning rate of 5e-4, a total batch size of 1024, a momentum of 0.9, and a weight decay of 0.05. These models are trained on 32 Nvidia 80GB A100 GPUs for 30 hours. |
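
For reference, the reported hyperparameters map directly onto a standard PyTorch training configuration. The sketch below is a minimal illustration, not the paper's released code: the model is a hypothetical placeholder (not the De-focus Attention Network), the beta2 value and learning-rate schedule are assumptions since the excerpt only states the peak rate, and the per-GPU batch split assumes plain data parallelism.

```python
import torch

# Hypothetical placeholder module; the actual De-focus Attention Network
# architecture is not reproduced here.
model = torch.nn.Linear(768, 1000)

# Optimizer settings quoted from the paper's experiment setup.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=5e-4,             # peak learning rate (schedule not given in the excerpt)
    betas=(0.9, 0.999),  # beta1 = 0.9 matches the reported momentum;
                         # beta2 is the PyTorch default (assumption)
    weight_decay=0.05,
)

# A total batch size of 1024 across 32 A100 GPUs implies 32 samples per GPU,
# assuming standard data parallelism (not stated explicitly in the paper).
per_gpu_batch_size = 1024 // 32  # = 32
```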