Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Switchable Token-Specific Codebook Quantization For Face Image Compression

Authors: Yongbo Wang, Haonan Wang, Guodong Mu, Ruixin Zhang, Jiaqi Chen, Jingyun Zhang, Jun Wang, Yuan Xie, Zhizhong Zhang, Shouhong Ding

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 Experiments. Dataset. We train our models on the CASIA-WebFace dataset (48), a large-scale face image dataset widely used in the field of face recognition research, which contains approximately 500K images of 10,575 individuals, collected from the Internet. In the test stage, we evaluate the reconstruction quality of our method on five face recognition datasets: LFW (2), CFP-FP (49), AgeDB (50), CPLFW (51), CALFW (52). They all have 6-7K pairs of images that are used to determine whether they belong to the same person. 4.2 Main Results. We evaluate our proposed method on two representative baselines, TiTok (7) and VQGAN (36). For TiTok, we conduct experiments under two different scales, where each image is represented by either 128 or 32 discrete indices.
Researcher Affiliation | Collaboration | Yongbo Wang (East China Normal University, Shanghai, China); Haonan Wang (Tencent Youtu Lab, Shanghai, China); Guodong Mu (Tencent Youtu Lab, Shanghai, China); Ruixin Zhang (Tencent Youtu Lab, Shanghai, China); Jiaqi Chen (East China Normal University, Shanghai, China); Jingyun Zhang (Tencent WeChat Pay Lab33, Shenzhen, China); Jun Wang (Tencent WeChat Pay Lab33, Shenzhen, China); Yuan Xie (East China Normal University, Shanghai, China); Zhizhong Zhang (East China Normal University, Shanghai, China); Shouhong Ding (Tencent Youtu Lab, Shanghai, China)
Pseudocode | No | The paper describes methods using mathematical formulations (e.g., equations 1-16) and flow diagrams (Figures 2, 3, 5, 6, 7), but it does not contain any explicit pseudocode or algorithm blocks with structured steps.
Open Source Code | No | Justification: We have thoroughly introduced our method in the Method section and detailed our training process in the Experiments section. Additionally, we plan to release our code and checkpoints. Justification: We will submit our codes.
Open Datasets | Yes | Dataset. We train our models on the CASIA-WebFace dataset (48), a large-scale face image dataset widely used in the field of face recognition research, which contains approximately 500K images of 10,575 individuals, collected from the Internet. In the test stage, we evaluate the reconstruction quality of our method on five face recognition datasets: LFW (2), CFP-FP (49), AgeDB (50), CPLFW (51), CALFW (52). They all have 6-7K pairs of images that are used to determine whether they belong to the same person.
Dataset Splits | No | The paper mentions training on the CASIA-WebFace dataset and evaluating on LFW, CFP-FP, AgeDB, CPLFW, and CALFW. It states that the evaluation datasets have "6-7K pairs of images that are used to determine whether they belong to the same person," which indicates how they are used for testing, but it does not specify explicit training/validation/test splits or percentages for the CASIA-WebFace dataset, or how the evaluation datasets are partitioned beyond their test usage.
Hardware Specification | Yes | Our methods are implemented on eight NVIDIA V100 GPUs with nearly 2 days for training.
Software Dependencies | No | The paper mentions implementing methods and using AdamW for optimization but does not provide specific version numbers for any software libraries, frameworks, or programming languages used (e.g., Python, PyTorch, TensorFlow, CUDA versions).
Experiment Setup | Yes | Training Details. We adopt our Switchable Token-Specific Codebook Quantization on both CNN-based and ViT-based VQ-tokenizers. In our training pipeline, the encoder remains fixed throughout the process. During stage 1 and stage 2, only the codebook is learnable, with its initial size set to 4096. In stage 3, only the decoder is trained to adapt to the quantized representations produced by the updated codebook. For the training dataset CASIA-WebFace, we train 100K steps for stage 1, 400K steps for stage 2, and 100K steps for stage 3. Our models are optimized by AdamW with the initial learning rate of 1e-4. Our methods are implemented on eight NVIDIA V100 GPUs with nearly 2 days for training.
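
The figures quoted in the Experiment Setup and Hardware Specification rows can be collected into a small configuration sketch. The following is an illustration only, not the authors' code: the names `Stage`, `STAGES`, `total_steps`, `gpu_days`, and `fixed_length_bits` are hypothetical, and the module labels "codebook" and "decoder" are just tags. Two assumptions are worth flagging. First, "nearly 2 days" is treated as an upper bound of 2 days, and the source does not say whether it covers all three stages. Second, the bit budget assumes a single fixed codebook of 4096 entries with fixed-length index coding; the paper's switchable, token-specific codebooks may change the effective size or add signalling bits, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    steps: int
    trainable: tuple  # illustrative labels for the modules updated in this stage


# Values quoted in the table; the encoder stays frozen in every stage.
STAGES = (
    Stage("stage1", 100_000, ("codebook",)),
    Stage("stage2", 400_000, ("codebook",)),
    Stage("stage3", 100_000, ("decoder",)),
)
OPTIMIZER = {"name": "AdamW", "initial_lr": 1e-4}
NUM_GPUS = 8      # eight NVIDIA V100 GPUs
TRAIN_DAYS = 2    # assumption: "nearly 2 days" taken as an upper bound


def total_steps(stages=STAGES):
    """Sum of optimizer steps across all training stages."""
    return sum(stage.steps for stage in stages)


def gpu_days(num_gpus=NUM_GPUS, days=TRAIN_DAYS):
    """Upper-bound compute budget in GPU-days."""
    return num_gpus * days


def fixed_length_bits(num_tokens, codebook_size=4096):
    """Bits per image if every index is stored with a fixed-length code.

    Assumption: one shared codebook, no entropy coding, and no overhead
    for signalling which codebook was selected.
    """
    bits_per_index = (codebook_size - 1).bit_length()  # ceil(log2(codebook_size))
    return num_tokens * bits_per_index


if __name__ == "__main__":
    print("total steps:", total_steps())
    print("GPU-days (upper bound):", gpu_days())
    for tokens in (128, 32):  # the two TiTok scales mentioned in the table
        print(tokens, "tokens:", fixed_length_bits(tokens), "bits")
```

Under these assumptions the schedule totals 600K steps, the budget is at most 16 GPU-days, and the two TiTok scales correspond to 1536 and 384 bits per image before any codebook-switching overhead.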