Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

QuadEnhancer: Leveraging Quadratic Transformations to Enhance Deep Neural Networks

Authors: Qian Chen, Linxin Yang, Akang Wang, Xiaodong Luo, Yin Zhang

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct a set of proof-of-concept experiments for the proposed method across three tasks: image classification, text classification, and fine-tuning large-language models. In all tasks, the proposed approach demonstrates clear and substantial performance gains.
Researcher Affiliation | Academia | Qian Chen (1,2), Linxin Yang (2,3), Akang Wang (2,3,*), Xiaodong Luo (2,3), and Yin Zhang (3,*). (1) School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, China; (2) Shenzhen International Center for Industrial and Applied Mathematics, Shenzhen Research Institute of Big Data, China; (3) School of Data Science, The Chinese University of Hong Kong, Shenzhen, China
Pseudocode | No | No explicit pseudocode or algorithm blocks are provided. The methodology is described using mathematical equations and a workflow diagram (Figure 2).
Open Source Code | Yes | The experimental code is publicly available at https://github.com/chitar/QuadEnhancer.
Open Datasets | Yes | Our experiments begin with ImageNet-1k for the initial pre-training stage. For downstream evaluation, we use six widely recognized benchmarks: Caltech [9], CIFAR-10, CIFAR-100 [20], Flowers [32], Food [2], and Pets [33]. For pre-training, we use the WikiText-2 dataset [30]... For downstream text classification, we utilize six standard benchmarks: IMDB (movie review sentiment analysis) [27], Yelp (restaurant review sentiment) [17], AG-News (topic classification) [49], SST-2 (Stanford Sentiment Treebank) [41], and Emotion (emotion recognition) [39]. We use several benchmark datasets including BoolQ, PIQA, SIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, and OBQA. Detailed descriptions of these datasets are provided in the appendix of [16].
Dataset Splits | Yes | Following common practice, our experiments involve initial pre-training on a large-scale dataset, subsequently fine-tuning the pre-trained models on various target datasets. Specifically, we first pre-train on ImageNet-1k [20], then fine-tune and evaluate the models across several diverse downstream datasets. Each model is pre-trained on WikiText-2 for 20 epochs... Following pre-training, models are fine-tuned on each classification dataset for 10 epochs... We use several benchmark datasets including BoolQ, PIQA, SIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, and OBQA.
Hardware Specification | Yes | All experiments were conducted using four NVIDIA A100 80GB.
Software Dependencies | No | The paper discusses FP16 precision and the LoRA algorithm, but does not provide specific software names with version numbers for libraries or frameworks used in the implementation.
Experiment Setup | Yes | The training parameters, including batch size, learning rate, number of epochs, and total training duration, are consistent with the settings outlined in [29]. Each model is pre-trained on WikiText-2 for 20 epochs, with batch size 128, learning rate 0.0001, and a maximum sequence length of 256 tokens... Following pre-training, models are fine-tuned on each classification dataset for 10 epochs, with learning rate 0.00005, batch size 16, and other optimizer settings unchanged. Our training configurations follow the established settings from prior works [16, 26].
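
For readers who want the text-classification hyperparameters quoted in the Experiment Setup row in machine-readable form, the sketch below records them as plain Python objects. It is only an illustration: the names `StageConfig`, `PRETRAIN`, `FINETUNE`, and `summarize` are hypothetical and are not taken from the paper or its repository. The fine-tuning sequence length is left unset because the quoted text does not state it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StageConfig:
    """Hyperparameters for one training stage (hypothetical container)."""
    dataset: str
    epochs: int
    batch_size: int
    learning_rate: float
    max_seq_len: Optional[int] = None  # None when the source does not state it


# Values copied from the Experiment Setup row (text-classification experiments).
PRETRAIN = StageConfig(
    dataset="WikiText-2",
    epochs=20,
    batch_size=128,
    learning_rate=1e-4,
    max_seq_len=256,
)

# The dataset name is a placeholder: the same settings are reported for
# each downstream classification dataset.
FINETUNE = StageConfig(
    dataset="downstream classification dataset",
    epochs=10,
    batch_size=16,
    learning_rate=5e-5,
)


def summarize(cfg: StageConfig) -> str:
    """Return a one-line, human-readable description of a stage."""
    parts = [
        f"{cfg.dataset}: {cfg.epochs} epochs",
        f"batch size {cfg.batch_size}",
        f"lr {cfg.learning_rate:g}",
    ]
    if cfg.max_seq_len is not None:
        parts.append(f"max length {cfg.max_seq_len}")
    return ", ".join(parts)


if __name__ == "__main__":
    print(summarize(PRETRAIN))
    print(summarize(FINETUNE))
```

Optimizer choice, software versions, and the image-classification and LLM fine-tuning settings are not given in the quoted text (the page defers to references [29], [16], and [26]), so they are deliberately omitted here.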