Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Understanding Nonlinear Implicit Bias via Region Counts in Input Space
Authors: Jingwei Li, Jing Xu, Zifan Wang, Huishuai Zhang, Jingzhao Zhang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we found that small region counts align with geometrically simple decision boundaries and correlate well with good generalization performance. We also observe that good hyperparameter choices such as larger learning rates and smaller batch sizes can induce small region counts. We further establish the theoretical connections and explain how a larger learning rate can induce small region counts in neural networks. ... We conduct image classification experiments on the CIFAR-10 dataset, using different architectures, including ResNet18 (He et al., 2016), EfficientNet-B0 (Tan & Le, 2019), and SENet18 (Hu et al., 2018). Results on other architectures are deferred to ablation studies. We vary the hyperparameters for training, such as learning rate, batch size and weight decay coefficient, whose numbers are reported in Table 1. The region count is calculated using randomly generated 1D hyperplanes, as described in Example 2. We run each experiment 100 times and report the average number. We plot the region count and generalization gap of different setups in Figure 4, and calculate the correlation between them. |
| Researcher Affiliation | Academia | 1Institute for Interdisciplinary Information Sciences, Tsinghua University 2Shanghai Qizhi Institute 3Wangxuan Institute of Computer Technology, Peking University. Correspondence to: Jingwei Li <EMAIL>, Jingzhao Zhang <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Calculate the Number of Region |
| Open Source Code | Yes | Code is available at https://github.com/lijingwei0502/implicit_bias. |
| Open Datasets | Yes | We conduct image classification experiments on the CIFAR-10 dataset, using different architectures, including ResNet18 (He et al., 2016), EfficientNet-B0 (Tan & Le, 2019), and SENet18 (Hu et al., 2018). ... We use CIFAR-10/100 (Krizhevsky et al., 2009) and ImageNet-1k (Deng et al., 2009) as datasets. |
| Dataset Splits | No | The paper mentions using well-known datasets like CIFAR-10/100 and ImageNet, which typically have standard splits. However, it does not explicitly state the specific percentages or counts for training, validation, and test sets, nor does it refer to specific split files or methodologies beyond mentioning 'random data crop and random flip' which are augmentation techniques, not split definitions. |
| Hardware Specification | Yes | We conduct all experiments using NVIDIA RTX 6000 graphics card. |
| Software Dependencies | No | The paper mentions using the Stochastic Gradient Descent (SGD) algorithm but does not specify any software libraries or frameworks (e.g., PyTorch, TensorFlow) with version numbers. |
| Experiment Setup | Yes | We vary the hyperparameters for training, such as learning rate, batch size and weight decay coefficient, whose numbers are reported in Table 1. ... We train three networks on the CIFAR-10 dataset, varying the batch sizes and learning rates. Our findings reveal that a smaller batch size or a higher learning rate results in smaller region counts, allowing the network to learn a simpler decision boundary and generalize better. ... For the CIFAR-10 and CIFAR-100 datasets, each network was trained for 200 epochs using the Stochastic Gradient Descent (SGD) algorithm with a cosine learning rate schedule. We choose 27 combinations of hyperparameters in Table 1, and for each hyperparameter we use 3 random seeds and report the average metrics. For the ImageNet-1k dataset, each network was trained for 50 epochs with random data crop and random flip. We use the same optimizer and 27 combinations of hyperparameters as in the CIFAR-10 and CIFAR-100 experiments. |
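The "Pseudocode" row above refers to the paper's Algorithm 1 for counting regions along randomly generated 1D lines in input space. A minimal NumPy sketch of the general idea follows; this is not the authors' exact algorithm (the random ReLU network, the sampling density, and the use of activation patterns as the region criterion are illustrative assumptions), but it shows how one could estimate how many linear regions a segment between two inputs crosses:

```python
import numpy as np

def region_count_along_line(weights, biases, x0, x1, n_samples=1000):
    """Estimate how many linear regions of a ReLU network the segment
    from x0 to x1 crosses, by counting distinct consecutive activation
    patterns at sampled points. A discretized sketch: two samples in the
    same region share the same pattern of active/inactive ReLU units."""
    ts = np.linspace(0.0, 1.0, n_samples)
    prev_pattern = None
    count = 0
    for t in ts:
        h = (1 - t) * x0 + t * x1          # point on the segment
        pattern = []
        for W, b in zip(weights, biases):
            pre = W @ h + b
            pattern.append(pre > 0)        # this layer's activation pattern
            h = np.maximum(pre, 0.0)       # ReLU
        pattern = np.concatenate(pattern)
        if prev_pattern is None or not np.array_equal(pattern, prev_pattern):
            count += 1                     # entered a new linear region
            prev_pattern = pattern
    return count

# Example: a small random 2-layer ReLU net on 10-dimensional inputs.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((32, 10)), rng.standard_normal((16, 32))]
biases = [rng.standard_normal(32), rng.standard_normal(16)]
x0, x1 = rng.standard_normal(10), rng.standard_normal(10)
print(region_count_along_line(weights, biases, x0, x1))
```

Averaging this count over many random segments (the paper reports 100 runs per experiment) yields a scalar complexity measure that can then be correlated with the generalization gap across hyperparameter settings.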