Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Understanding Nonlinear Implicit Bias via Region Counts in Input Space
Authors: Jingwei Li, Jing Xu, Zifan Wang, Huishuai Zhang, Jingzhao Zhang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we found that small region counts align with geometrically simple decision boundaries and correlate well with good generalization performance. We also observe that good hyperparameter choices such as larger learning rates and smaller batch sizes can induce small region counts. We further establish the theoretical connections and explain how a larger learning rate can induce small region counts in neural networks. ... We conduct image classification experiments on the CIFAR-10 dataset, using different architectures, including ResNet18 (He et al., 2016), EfficientNet-B0 (Tan & Le, 2019), and SENet18 (Hu et al., 2018). Results on other architectures are deferred to ablation studies. We vary the hyperparameters for training, such as learning rate, batch size and weight decay coefficient, whose numbers are reported in Table 1. The region count is calculated using randomly generated 1D hyperplanes, as described in Example 2. We run each experiment 100 times and report the average number. We plot the region count and generalization gap of different setups in Figure 4, and calculate the correlation between them. |
| Researcher Affiliation | Academia | 1Institute for Interdisciplinary Information Sciences, Tsinghua University 2Shanghai Qizhi Institute 3Wangxuan Institute of Computer Technology, Peking University. Correspondence to: Jingwei Li <EMAIL>, Jingzhao Zhang <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Calculate the Number of Region |
| Open Source Code | Yes | Code is available at https://github.com/lijingwei0502/implicit_bias. |
| Open Datasets | Yes | We conduct image classification experiments on the CIFAR-10 dataset, using different architectures, including ResNet18 (He et al., 2016), EfficientNet-B0 (Tan & Le, 2019), and SENet18 (Hu et al., 2018). ... We use CIFAR-10/100 (Krizhevsky et al., 2009) and ImageNet-1k (Deng et al., 2009) as datasets. |
| Dataset Splits | No | The paper mentions using well-known datasets like CIFAR-10/100 and ImageNet, which typically have standard splits. However, it does not explicitly state the specific percentages or counts for training, validation, and test sets, nor does it refer to specific split files or methodologies beyond mentioning 'random data crop and random flip' which are augmentation techniques, not split definitions. |
| Hardware Specification | Yes | We conduct all experiments using NVIDIA RTX 6000 graphics card. |
| Software Dependencies | No | The paper mentions using the Stochastic Gradient Descent (SGD) algorithm but does not specify any software libraries or frameworks (e.g., PyTorch, TensorFlow) with version numbers. |
| Experiment Setup | Yes | We vary the hyperparameters for training, such as learning rate, batch size and weight decay coefficient, whose numbers are reported in Table 1. ... We train three networks on the CIFAR-10 dataset, varying the batch sizes and learning rates. Our findings reveal that a smaller batch size or a higher learning rate results in smaller region counts, allowing the network to learn a simpler decision boundary and generalize better. ... For the CIFAR-10 and CIFAR-100 datasets, each network was trained for 200 epochs using the Stochastic Gradient Descent (SGD) algorithm with a cosine learning rate schedule. We choose 27 combinations of hyperparameters in Table 1, and for each hyperparameter we use 3 random seeds and report the average metrics. For the ImageNet-1k dataset, each network was trained for 50 epochs with random data crop and random flip. We use the same optimizer and 27 combinations of hyperparameters as in the CIFAR-10 and CIFAR-100 experiments. |
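The "Pseudocode" row above refers to the paper's Algorithm 1 for counting regions along randomly generated 1D lines in input space. A minimal NumPy sketch of the general idea follows; this is not the authors' exact algorithm (the random ReLU network, the sampling density, and the use of activation patterns as the region criterion are illustrative assumptions), but it shows how one could estimate how many linear regions a segment between two inputs crosses:

```python
import numpy as np

def region_count_along_line(weights, biases, x0, x1, n_samples=1000):
    """Estimate how many linear regions of a ReLU network the segment
    from x0 to x1 crosses, by counting distinct consecutive activation
    patterns at sampled points. A discretized sketch: two samples in the
    same region share the same pattern of active/inactive ReLU units."""
    ts = np.linspace(0.0, 1.0, n_samples)
    prev_pattern = None
    count = 0
    for t in ts:
        h = (1 - t) * x0 + t * x1          # point on the segment
        pattern = []
        for W, b in zip(weights, biases):
            pre = W @ h + b
            pattern.append(pre > 0)        # this layer's activation pattern
            h = np.maximum(pre, 0.0)       # ReLU
        pattern = np.concatenate(pattern)
        if prev_pattern is None or not np.array_equal(pattern, prev_pattern):
            count += 1                     # entered a new linear region
            prev_pattern = pattern
    return count

# Example: a small random 2-layer ReLU net on 10-dimensional inputs.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((32, 10)), rng.standard_normal((16, 32))]
biases = [rng.standard_normal(32), rng.standard_normal(16)]
x0, x1 = rng.standard_normal(10), rng.standard_normal(10)
print(region_count_along_line(weights, biases, x0, x1))
```

Averaging this count over many random segments (the paper reports 100 runs per experiment) yields a scalar complexity measure that can then be correlated with the generalization gap across hyperparameter settings.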