Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Improving Generalization of Deep Neural Networks by Optimum Shifting

Authors: Yuyan Zhou, Ye Li, Lei Feng, Sheng-Jun Huang

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments (including classification and detection) with various deep neural network architectures on benchmark datasets demonstrate the effectiveness of our method... Applying SOS to Trained Models. Test Accuracy. We first evaluate SOS by applying it to trained deep models on the CIFAR-10 and CIFAR-100 datasets.
Researcher Affiliation Academia (1) MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; (2) Information Systems Technology and Design Pillar, Singapore University of Technology and Design. EMAIL, EMAIL
Pseudocode Yes Algorithm 1: SOS algorithm during training. Input: training set S = {(x_i, y_i)}_{i=1}^n, batch sizes b_1, b_2 for SGD and SOS, step size γ > 0. For each training epoch: (1) sample a batch B = {(x_1, y_1), ..., (x_{b_2}, y_{b_2})}; (2) compute the input and output matrices A = [x_{k,1}, x_{k,2}, ..., x_{k,b_2}] and Z = [V^T x_{k,1}, V^T x_{k,2}, ..., V^T x_{k,b_2}]; (3) for each column V_i in the final linear layer, conduct Gaussian elimination to make A row-independent, [Ã, Z̃_i] = GaussianEliminate([A, Z_i]), and update the parameters Ṽ_i = Ã^T (Ã Ã^T)^{-1} Z̃_i; (4) for t = 0, 1, ..., s, update all parameters using SGD: W_t = W_{t-1} − γ (1/b_1) Σ_{i=1}^{b_1} ∇_{W_{t-1}} L.
Open Source Code No The paper does not provide concrete access to source code for the methodology described in this paper. It mentions using YOLOV5 (Jocher et al. 2020) but doesn't provide its own code.
Open Datasets Yes applying it to trained deep models on CIFAR-10 and CIFAR-100 dataset (Krizhevsky, Hinton et al. 2009)... ImageNet classification dataset... PASCAL VOC dataset (Everingham et al. 2010).
Dataset Splits Yes applying it to trained deep models on CIFAR-10 and CIFAR-100 dataset (Krizhevsky, Hinton et al. 2009), which consists of 50k training images and 10k testing images in 10 and 100 classes.
Hardware Specification No The paper mentions 'High Performance Computing Platform of Nanjing University of Aeronautics and Astronautics' in the acknowledgements, but does not specify any particular hardware details such as GPU/CPU models or memory used for the experiments.
Software Dependencies No The paper refers to using 'PyTorch official pretrained models' but does not provide specific software dependencies with version numbers for replication.
Experiment Setup Yes Following (Huang et al. 2017), the weight decay is 10^-4 and a Nesterov momentum of 0.9 without damping. The batch size is set to 64 and the models are trained for 300 epochs. The initial learning rate is set to 0.1 and is reduced by a factor of 10 at 50% and 75% of the total training epochs.
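The closed-form step in Algorithm 1 can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (no code was released); it replaces the explicit Gaussian elimination and the closed form Ṽ_i = Ã^T (Ã Ã^T)^{-1} Z̃_i with `np.linalg.lstsq`, which returns the same minimum-norm solution for an underdetermined consistent system. The function name `sos_update` and all variable names are illustrative.

```python
import numpy as np

# Sketch of the SOS idea for one output unit of a linear layer.
# X: batch features (b2 samples x d dims); v: current weight column.
# Replace v with the minimum-norm solution of X @ v_new = X @ v,
# which keeps the layer's outputs on the batch unchanged while
# (generically) shrinking the weight norm.
def sos_update(X, v):
    z = X @ v                              # current outputs on the batch
    # For an underdetermined consistent system, lstsq returns the
    # minimum-norm solution, i.e. the role played in Algorithm 1 by
    # Gaussian elimination plus  V~_i = A~^T (A~ A~^T)^{-1} Z~_i.
    v_new, *_ = np.linalg.lstsq(X, z, rcond=None)
    return v_new

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))            # b2 = 5 samples, d = 8 features
v = rng.standard_normal(8)
v_new = sos_update(X, v)
```

With b2 < d the system is underdetermined, so the batch outputs `X @ v_new` match `X @ v` exactly while `||v_new|| <= ||v||`; during training this step alternates with ordinary SGD updates as in Algorithm 1.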