Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Scaling Up Parameter Generation: A Recurrent Diffusion Approach

Authors: Kai Wang, Dongwen Tang, Wangbo Zhao, Konstantin Schürholt, Zhangyang "Atlas" Wang, Yang You

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive evaluation on ImageNet classification, ADE20K segmentation, COCO detection, and commonsense reasoning show that RPG consistently produces weights on par with trained models. Notably: Single-GPU feasibility. Our recurrent design and parameter-token strategy enable inference on a single commodity GPU beyond 100M weights. Generalization to unseen tasks. RPG can generate high-performing neural network parameters for unseen binary classification tasks on CIFAR-10, providing evidence for its broader generalization capabilities. Architectural versatility. Our method supports diverse families such as ResNet, Vision Transformer, ConvNeXt, and LoRA-based language models, making it a unified framework for parameter generation.
Researcher Affiliation | Academia | Kai Wang (National University of Singapore), Dongwen Tang (National University of Singapore), Wangbo Zhao (National University of Singapore), Konstantin Schürholt (University of St. Gallen), Zhangyang Wang (University of Texas at Austin), Yang You (National University of Singapore)
Pseudocode | No | The paper describes the method verbally and with diagrams (Fig. 2), but no explicit pseudocode block or algorithm is presented in a structured format.
Open Source Code | Yes | Code: https://github.com/NUS-HPC-AI-Lab/Recurrent-Parameter-Generation
Open Datasets | Yes | Datasets and architectures. We evaluate our method across various tasks, including ImageNet-1K [13] for the classification, ADE20K [77] for the semantic segmentation, COCO [40] for the object detection, and BoolQ [10], PIQA [5], SIQA [54], HellaSwag [75], and ARC [11] for the commonsense reasoning tasks. To verify the scalability, we conduct experiments on various architectures with parameter counts ranging from several to hundred million.
Dataset Splits | Yes | To assess RPG’s capability for unseen tasks, we construct various binary classification tasks on CIFAR-10. As shown in Fig. 4, we encode each task as a 10-bit binary embedding, where each bit corresponds to a category. ... we obtain 1022 valid embeddings. For each one, we collect its corresponding parameters, forming embedding-parameter pairs. These pairs are then split into non-overlapping sets for training and validation, allowing us to evaluate RPG’s generality on unseen tasks. ... Of these tasks, 1002 randomly selected embedding-parameter pairs serve as the training set (seen tasks), while the remaining pairs are reserved as unseen tasks for evaluation.
Hardware Specification | Yes | All results are obtained with a single NVIDIA H100 80G GPU. Our approach shows the capability to generate models within minutes. Notably, even for ConvNeXt-L (197.7 M parameters), we can synthesize the entire parameter within 1.3 minutes. ... Meanwhile, the inference memory requirement is approximately 20GB, so RPG can be deployed on NVIDIA RTX 3090 or similar-level GPUs.
Software Dependencies | No | B.1 Training recipe ... mixed precision bfloat16 bfloat16 ... The paper mentions 'bfloat16' but does not list specific software libraries or their version numbers like Python, PyTorch, or CUDA.
Experiment Setup | Yes | Preprocessing and training details. The length of parameter tokens, permutation states, position embeddings, and prototypes is set to 8192. It is worth noting that the permutation states and position embeddings are fixed during the training. We default to using Mamba [23] as the architecture of the recurrent model. More details can be found in Appendix B. Inference details. We input permutation states (relevant experimental results are in Appendix C.1) and position embeddings into the recurrent model to generate the prototypes. Then, the diffusion model utilizes the prototypes as conditions, along with random noises, to synthesize the entire network parameters. We repeat the above process 10 times and report the best, average, and minimum. Appendix B.1 Training recipe provides: batch size, optimizer AdamW, learning rate, training steps, weight decay, mixed precision bfloat16, diffusion batch size. Appendix B.2 Detailed structure of recurrent diffusion provides specific parameters for the recurrent and diffusion models.
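
The Dataset Splits entry quotes 1022 valid 10-bit task embeddings, of which 1002 are used as seen tasks, and the Experiment Setup entry quotes reporting the best, average, and minimum over 10 repeated generations. The arithmetic behind those numbers can be checked with a minimal Python sketch. It is not the authors' code: it assumes the two excluded embeddings are the all-zeros and all-ones patterns (the excerpt does not say which are invalid, but 2^10 - 2 = 1022 matches), and the function names and the seed are hypothetical.

```python
import itertools
import random

NUM_CLASSES = 10        # CIFAR-10 categories, one bit per category
NUM_TRAIN_TASKS = 1002  # number of seen tasks quoted in the Dataset Splits entry


def valid_task_embeddings(num_classes=NUM_CLASSES):
    """Enumerate binary task embeddings, one bit per category.

    Assumption: all-zeros and all-ones are the invalid embeddings, since a
    binary task needs at least one category on each side.
    """
    return [
        bits
        for bits in itertools.product((0, 1), repeat=num_classes)
        if 0 < sum(bits) < num_classes
    ]


def split_tasks(embeddings, num_train=NUM_TRAIN_TASKS, seed=0):
    """Randomly split embeddings into non-overlapping seen/unseen sets.

    The seed is a hypothetical choice for reproducibility of this sketch.
    """
    rng = random.Random(seed)
    shuffled = list(embeddings)
    rng.shuffle(shuffled)
    return shuffled[:num_train], shuffled[num_train:]


def summarize_runs(scores):
    """Report best, average, and minimum over repeated generations."""
    return {
        "best": max(scores),
        "average": sum(scores) / len(scores),
        "minimum": min(scores),
    }


if __name__ == "__main__":
    tasks = valid_task_embeddings()
    seen, unseen = split_tasks(tasks)
    print(len(tasks), len(seen), len(unseen))  # prints: 1022 1002 20
```

Under these assumptions the split leaves 20 unseen tasks for evaluation. `summarize_runs` would be called with the 10 accuracies obtained from the 10 repeated generations.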