Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
GPSToken: Gaussian Parameterized Spatially-adaptive Tokenization for Image Representation and Generation
Authors: Zhengqiang ZHANG, Rongyuan Wu, Lingchen Sun, Lei Zhang
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate the state-of-the-art performance of GPSToken, which achieves rFID and FID scores of 0.65 and 1.50 on image reconstruction and generation tasks using 128 tokens, respectively. Codes and models of GPSToken can be found at https://github.com/xtudbxk/GPSToken. Our method achieves significant improvements over existing methods in both image reconstruction and generation tasks. For image reconstruction, GPSToken achieves rec. FID, PSNR and SSIM scores of 0.65, 24.06 and 0.657 on the ImageNet 256×256 reconstruction task using 128 tokens. For image generation, our model achieves a state-of-the-art FID of 1.50 on the ImageNet 256 generation task, surpassing recent methods such as Titok [37], FlexTok [1], One-D-Piece [24] and MAETok [5]. |
| Researcher Affiliation | Collaboration | Zhengqiang Zhang (1,2), Rongyuan Wu (1,2), Lingchen Sun (1,2), Lei Zhang (1,2); (1) The Hong Kong Polytechnic University, (2) OPPO Research Institute |
| Pseudocode | Yes | Algorithm 1: Spatially-adaptive Token Initialization Algorithm. Input: image $I$; target token count $l$; metric hyper-parameter $\lambda$; minimal size of region $s_{\min}$. Output: regions list $L$; initialized Gaussian parameters $\{g^{\mathrm{init}}_0, \dots, g^{\mathrm{init}}_{l-1}\}$. [...] Algorithm 2: Gaussian Calibration Algorithm. Input: predicted Gaussian parameters $\{g_0, \dots, g_{l-1}\}$; minimal size of region $s_{\min}$; image size $W \times H$. Output: calibrated Gaussian parameters $\{g^{\mathrm{cal}}_0, \dots, g^{\mathrm{cal}}_{l-1}\}$. |
| Open Source Code | Yes | Codes and models of GPSToken can be found at https://github.com/xtudbxk/GPSToken. |
| Open Datasets | Yes | We train all models on the ImageNet dataset [30], which contains 1.28M training images and 50K validation images. [...] We further evaluate on additional datasets: COCO2017 [22] (natural images), FFHQ [17] (faces), STARE [15] (medical images), and WHU_RS19 [2] (remote sensing). |
| Dataset Splits | Yes | We train all models on the ImageNet dataset [30], which contains 1.28M training images and 50K validation images. [...] We conduct all evaluations on the validation set of ImageNet. |
| Hardware Specification | Yes | All experiments are conducted on eight A100 GPUs. |
| Software Dependencies | No | The paper mentions 'Adam optimizer [19]', 'SiT [23]', 'SiT-XL/2 [23]', but does not provide specific version numbers for software libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | For the image reconstruction task, we train the encoder-decoder framework for 1M steps with a batch size of 96. The model is first trained using only the reconstruction loss $L_{\mathrm{rec}}$ for the initial 600K steps. Subsequently, the perceptual loss $L_{\mathrm{perc}}$ and adversarial loss $L_{\mathrm{adv}}$ [11] are incorporated for the remaining 400K steps to enhance texture details. We use the Adam optimizer [19] with a fixed learning rate of $5 \times 10^{-5}$. Additionally, we apply an exponential moving average (EMA) with a decay rate of 0.9999 to stabilize the training process. We set $s = 5$ for Eq. 2 (in the main paper), $\lambda = 2.5$ for Eq. 4 (in the main paper) and $s_{\min} = 4$ for Algorithm 1. For the image generation task, we adopt the velocity matching loss from SiT [23] and train the layout and conditional texture generators sequentially. Specifically, the layout synthesis model is trained for 500K iterations, and the conditional layout-to-texture generation model for 4M iterations. To mitigate overfitting to the conditions, we add 0.5 Gaussian noise to the condition during training of the conditional texture generator. Both models are trained with a batch size of 256 and a learning rate of $1 \times 10^{-4}$ using the Adam optimizer. |
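
The two-phase reconstruction schedule and the EMA quoted in the Experiment Setup row can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the names `active_losses` and `ema_update` are hypothetical, the loss terms are represented only by labels, and the sketch assumes the perceptual and adversarial terms are added on top of the reconstruction loss rather than replacing it. Only the step counts (600K of 1M) and the decay rate (0.9999) come from the quoted text.

```python
# Illustrative sketch only; names are hypothetical and not taken from the GPSToken repository.
TOTAL_STEPS = 1_000_000     # reconstruction training length quoted in the table
REC_ONLY_STEPS = 600_000    # initial steps trained with the reconstruction loss alone
EMA_DECAY = 0.9999          # EMA decay rate quoted in the table


def active_losses(step):
    """Return the labels of the loss terms in use at a 0-indexed training step."""
    if not 0 <= step < TOTAL_STEPS:
        raise ValueError("step outside the training schedule")
    if step < REC_ONLY_STEPS:
        return ["rec"]
    # Assumption: perceptual and adversarial terms are added to the reconstruction loss.
    return ["rec", "perc", "adv"]


def ema_update(ema_params, params, decay=EMA_DECAY):
    """Standard EMA step over two equal-length lists of floats:
    ema <- decay * ema + (1 - decay) * param."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]
```

In a real training loop, `active_losses(step)` would decide which terms enter the total loss at each step, and `ema_update` would be applied to the model weights after every optimizer step. How the actual implementation weights the three terms is not stated in the quoted text.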