Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
On the Role of Discrete Tokenization in Visual Representation Learning
Authors: Tianqi Du, Yifei Wang, Yisen Wang
ICLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we first present the main empirical results of our proposed ClusterMIM methods on different real-world datasets with different backbones. Then we conduct a series of ablation experiments to discuss the selection of hyperparameters in ClusterMIM. |
| Researcher Affiliation | Academia | 1 National Key Lab of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; 2 School of Mathematical Sciences, Peking University; 3 Institute for Artificial Intelligence, Peking University |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/PKU-ML/ClusterMIM. |
| Open Datasets | Yes | extensive experiments are conducted on ImageNet-100 (Deng et al., 2009) and ImageNet-1K (Deng et al., 2009). |
| Dataset Splits | No | The paper mentions conducting "linear evaluation and non-linear fine-tuning" on the pretrained encoder and reports "fine-tuning accuracies" and "linear probing accuracies." However, it does not explicitly state the use of a separate validation set or specific splits for training/validation/test data to reproduce the experiment's data partitioning during model development or hyperparameter tuning. It only focuses on evaluation results. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., CPU, GPU models, memory) used for running the experiments. It mentions training time but no hardware details. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | The mask ratio is set to 0.75. On both datasets, we pretrain the model for 200 epochs with batch size 4096 and weight decay 0.05. For the K-Means algorithm used in the tokenizer pretraining stage, we use K-Means++ initialization (Arthur & Vassilvitskii, 2007). We train K-Means for 100 epochs on ImageNet-100 and 10 epochs on ImageNet-1K. |
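
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. The Python below is illustrative only and is not taken from the authors' repository: the names `PretrainConfig`, `random_mask`, and `kmeans_pp_init` are hypothetical, the 196-patch count assumes a 224×224 image split into 16×16 patches (not stated in the table), and the seeding routine is a generic K-Means++ initialization in the sense of Arthur & Vassilvitskii (2007), not necessarily how the paper's tokenizer implements it.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PretrainConfig:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    mask_ratio: float = 0.75
    epochs: int = 200
    batch_size: int = 4096
    weight_decay: float = 0.05
    kmeans_epochs_imagenet100: int = 100
    kmeans_epochs_imagenet1k: int = 10


def random_mask(num_patches, mask_ratio, rng):
    """Return the sorted indices of the patches selected for masking."""
    num_masked = int(num_patches * mask_ratio)
    return sorted(rng.sample(range(num_patches), num_masked))


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_init(points, k, rng):
    """Generic K-Means++ seeding.

    The first center is drawn uniformly; each later center is drawn with
    probability proportional to its squared distance to the nearest
    center chosen so far.
    """
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # All points coincide with existing centers: fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


if __name__ == "__main__":
    cfg = PretrainConfig()
    rng = random.Random(0)
    # Assumption: 14 x 14 = 196 patches per image.
    mask = random_mask(196, cfg.mask_ratio, rng)
    print(len(mask), "of 196 patches masked")
    toy_patches = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans_pp_init(toy_patches, 2, rng))
```

With a mask ratio of 0.75 and the assumed 196 patches, 147 patches are masked and 49 remain visible. In practice the seeding step would usually be delegated to an established K-Means implementation rather than written by hand.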