Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Learning High-Degree Parities: The Crucial Role of the Initialization
Authors: Emmanuel Abbe, Elisabetta Cornacchia, Jan Hązła, Donald Kougang Yombi
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we show that the choice of initialization can be critical when learning complex functions, such as high-degree parities... We observe that the test accuracy after training decreases as σ increases. This pattern is seen in both the online setting (left plot), where fresh batches are sampled at each iteration, and the offline setting (right plot), where the network is trained on a fixed dataset until the training loss decreases to 10⁻², and evaluated on a separate test set. For input dimension d = 50, as in Figure 1, we find that some learning occurs for σ ∈ {0.1, 0.2}. However, in the Appendix, we report experiments with larger input dimensions, where learning does not occur for these values of σ (Figure 5). |
| Researcher Affiliation | Academia | Emmanuel Abbe École Polytechnique Fédérale de Lausanne (EPFL) Elisabetta Cornacchia INRIA Paris, DI ENS, PSL Massachusetts Institute of Technology (MIT) Jan Hązła & Donald Kougang-Yombi African Institute for Mathematical Sciences (AIMS), Kigali, Rwanda Email: EMAIL, EMAIL, EMAIL. |
| Pseudocode | No | The paper describes the noisy-SGD algorithm in Definition 3 using mathematical notation and text, but it does not present it as a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the described methodology, nor does it provide any links to code repositories. |
| Open Datasets | No | We focus on learning parity functions on uniform inputs (D = Unif{-1,1}^d). The inputs are sampled from the uniform distribution on {-1,1}^d. (Section 3, Section 4.1). This indicates the use of synthetically generated data rather than a publicly available dataset with a specific link, DOI, or repository. |
| Dataset Splits | No | In the online setting, we sample fresh batches of samples at each iteration. In the offline setting, we sample batches from a fixed dataset and we stop training when the training loss is less than 0.01. (Appendix E.1). While the paper mentions an offline fixed dataset and evaluation on a separate test set (Section 6), it does not provide specific percentages, sample counts, or predefined split details. |
| Hardware Specification | Yes | All experiments were performed using the PyTorch framework (Paszke et al. (2019)) and they were executed on NVIDIA Volta V100 GPUs. |
| Software Dependencies | No | All experiments were performed using the PyTorch framework (Paszke et al. (2019)). (Appendix E.1). PyTorch is mentioned, but a specific version number is not provided. |
| Experiment Setup | Yes | We train the architectures using SGD with batch size 64. In the online setting, we sample fresh batches of samples at each iteration. In the offline setting, we sample batches from a fixed dataset and we stop training when the training loss is less than 0.01. We tried different batch sizes and learning rates, and we did not observe significant qualitative differences. We chose to report the experiments obtained for a standard batch size of 64 and a learning rate of 0.01. |
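The data model quoted in the Open Datasets and Experiment Setup rows (inputs drawn uniformly from {-1,1}^d, labels given by a parity over a subset S of coordinates, batches of 64) is straightforward to generate synthetically. A minimal pure-Python sketch of such a sampler follows; the function name, argument names, and defaults are our own illustrative choices, not taken from the paper, and this is a sketch of the data distribution only, not of the paper's noisy-SGD training procedure.

```python
import random

def sample_parity_batch(d, S, batch_size=64, seed=None):
    """Sample a batch of (x, y) pairs for the parity-learning task.

    Each input x is drawn uniformly from {-1, +1}^d, and the label is
    the parity chi_S(x) = prod_{i in S} x_i over the coordinate subset S.
    Hypothetical helper illustrating the paper's data model; not the
    authors' code.
    """
    rng = random.Random(seed)
    batch = []
    for _ in range(batch_size):
        x = [rng.choice((-1, 1)) for _ in range(d)]
        y = 1
        for i in S:
            y *= x[i]  # parity = product of the selected coordinates
        batch.append((x, y))
    return batch

# Example: a degree-3 parity on the first three coordinates, d = 50
# as in Figure 1 of the paper, with the reported batch size of 64.
batch = sample_parity_batch(d=50, S=(0, 1, 2), batch_size=64, seed=0)
```

In the online setting described above, a fresh batch like this would be drawn at every SGD iteration; in the offline setting, a fixed dataset would be sampled once up front and batches drawn from it until the training loss falls below 0.01.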