On the Role of Discrete Tokenization in Visual Representation Learning

Authors: Tianqi Du, Yifei Wang, Yisen Wang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we first present the main empirical results of our proposed ClusterMIM methods on different real-world datasets with different backbones. Then we conduct a series of ablation experiments to discuss the selection of hyperparameters in ClusterMIM."
Researcher Affiliation | Academia | 1 National Key Lab of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; 2 School of Mathematical Sciences, Peking University; 3 Institute for Artificial Intelligence, Peking University
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/PKU-ML/ClusterMIM.
Open Datasets | Yes | "extensive experiments are conducted on ImageNet-100 (Deng et al., 2009) and ImageNet-1K (Deng et al., 2009)."
Dataset Splits | No | The paper mentions conducting "linear evaluation and non-linear fine-tuning" on the pretrained encoder and reports "fine-tuning accuracies" and "linear probing accuracies." However, it does not explicitly state a training/validation/test split or describe how the data were partitioned for model development or hyperparameter tuning; it reports only final evaluation results.
Hardware Specification | No | The paper does not specify the hardware (e.g., CPU or GPU models, memory) used to run the experiments. It mentions training time but gives no hardware details.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | "The mask ratio is set to 0.75. On both datasets, we pretrain the model for 200 epochs with batch size 4096 and weight decay 0.05. For the K-Means algorithm used in the tokenizer pretraining stage, we use K-Means++ initialization (Arthur & Vassilvitskii, 2007). We train K-Means for 100 epochs on ImageNet-100 and 10 epochs on ImageNet-1K."
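To make the quoted setup concrete, below is a minimal sketch of the tokenizer-pretraining step it describes: fitting K-Means with K-Means++ initialization over patch features to obtain discrete token indices. The feature shapes, sample counts, vocabulary size, and the use of scikit-learn's MiniBatchKMeans are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): discrete tokenization via K-Means
# with K-Means++ initialization, as quoted in the Experiment Setup row.
# Feature dimensions, sample count, and vocabulary size are assumptions.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Hyperparameters reported in the quoted setup (for the MIM pretraining stage).
MASK_RATIO = 0.75      # fraction of patches masked during pretraining
PRETRAIN_EPOCHS = 200  # MIM pretraining epochs
BATCH_SIZE = 4096      # MIM pretraining batch size
WEIGHT_DECAY = 0.05    # MIM pretraining weight decay

# Stand-in for patch features extracted from images (N patches, D dimensions).
rng = np.random.default_rng(0)
patch_features = rng.standard_normal((20_000, 768)).astype(np.float32)

# Tokenizer pretraining: K-Means with K-Means++ initialization
# (init="k-means++" is also scikit-learn's default). The vocabulary size
# below is a placeholder, not a value taken from the paper.
tokenizer = MiniBatchKMeans(
    n_clusters=512,
    init="k-means++",
    batch_size=BATCH_SIZE,
    n_init=1,
    random_state=0,
)
tokenizer.fit(patch_features)

# Each patch's discrete token is the index of its nearest centroid; in a
# ClusterMIM-style pipeline such tokens would serve as prediction targets
# for the masked patches.
tokens = tokenizer.predict(patch_features)
print(tokens.shape, tokens.min(), tokens.max())
```

In this sketch the cluster assignments play the role of the discrete tokenizer output; how the tokens are consumed by the masked-image-modeling objective follows the paper, not the code above.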