On the Role of Discrete Tokenization in Visual Representation Learning
Authors: Tianqi Du, Yifei Wang, Yisen Wang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we first present the main empirical results of our proposed ClusterMIM methods on different real-world datasets with different backbones. Then we conduct a series of ablation experiments to discuss the selection of hyperparameters in ClusterMIM. |
| Researcher Affiliation | Academia | 1 National Key Lab of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University 2 School of Mathematical Sciences, Peking University 3 Institute for Artificial Intelligence, Peking University |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/PKU-ML/ClusterMIM. |
| Open Datasets | Yes | Extensive experiments are conducted on ImageNet-100 (Deng et al., 2009) and ImageNet-1K (Deng et al., 2009). |
| Dataset Splits | No | The paper mentions conducting "linear evaluation and non-linear fine-tuning" on the pretrained encoder and reports "fine-tuning accuracies" and "linear probing accuracies." However, it does not explicitly state the training/validation/test splits used, nor whether a separate validation set was held out for model development or hyperparameter tuning; it reports only final evaluation results. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., CPU or GPU models, memory) used to run the experiments. It mentions training time but gives no hardware details. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | The mask ratio is set to 0.75. On both datasets, we pretrain the model for 200 epochs with batch size 4096 and weight decay 0.05. For the K-Means algorithm used in the tokenizer pretraining stage, we use K-Means++ initialization (Arthur & Vassilvitskii, 2007). We train K-Means for 100 epochs on ImageNet-100 and 10 epochs on ImageNet-1K. |
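
The reported setup can be summarized as a minimal configuration sketch. Only the numeric values (mask ratio 0.75, 200 pretraining epochs, batch size 4096, weight decay 0.05, K-Means++ initialization, 100/10 K-Means epochs) come from the paper; the names `pretrain_config` and `fit_kmeans_tokenizer`, the use of scikit-learn's `MiniBatchKMeans`, and the treatment of K-Means "epochs" as clustering iterations are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of the reported pretraining configuration (assumptions noted inline).
from sklearn.cluster import MiniBatchKMeans

# Reported MIM pretraining hyperparameters, shared by both datasets.
pretrain_config = {
    "mask_ratio": 0.75,
    "epochs": 200,
    "batch_size": 4096,
    "weight_decay": 0.05,
}

# Reported K-Means tokenizer training budgets per dataset.
kmeans_epochs = {"imagenet100": 100, "imagenet1k": 10}


def fit_kmeans_tokenizer(patch_features, n_clusters, epochs):
    """Fit a discrete tokenizer by clustering patch features.

    `patch_features` is assumed to be an (N, D) array of patch embeddings;
    `n_clusters` (the codebook size) is not stated in this section and is
    left as an argument.
    """
    kmeans = MiniBatchKMeans(
        n_clusters=n_clusters,
        init="k-means++",   # K-Means++ initialization, as reported
        batch_size=4096,    # assumption: reuse the pretraining batch size
        max_iter=epochs,    # assumption: treat K-Means "epochs" as iterations
        n_init=1,
    )
    kmeans.fit(patch_features)
    # Cluster centers act as the codebook that assigns discrete tokens to patches.
    return kmeans.cluster_centers_
```
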