Real-Valued Backpropagation is Unsuitable for Complex-Valued Neural Networks
Authors: Zhi-Hao Tan, Yi Xie, Yuan Jiang, Zhi-Hua Zhou
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, the experiments validate our theoretical findings numerically. |
| Researcher Affiliation | Academia | Zhi-Hao Tan, Yi Xie, Yuan Jiang, Zhi-Hua Zhou; National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; {tanzh, xiey, jiangy, zhouzh}@lamda.nju.edu.cn |
| Pseudocode | No | The paper provides a conceptual "Definition 1 (Complex Tensor Program)" which describes how complex tensor programs are recursively generated, but it does not present a structured pseudocode block or algorithm. |
| Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] |
| Open Datasets | Yes | The third experiment investigates the convergence of the difference between complex NTKs $\hat{\Theta}_t^{(n)}$ and real NTKs $\Theta_r$ during training as the widths go to infinity on MNIST [LeCun et al., 1998]. |
| Dataset Splits | No | The paper mentions using a "training set D = (X, Y) (|D| = 128)" from MNIST but does not specify how this dataset was split into training, validation, and test subsets, nor does it mention cross-validation details. |
| Hardware Specification | No | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [N/A] The numerical experiments only aim to verify the theoretical results. |
| Software Dependencies | No | All empirical NTKs of complex networks are calculated based on the Neural Tangents library [Novak et al., 2019]. (No specific version is given for this or any other software component). |
| Experiment Setup | Yes | In NTK initialization, the standard deviations are set as 1 for complex networks and scaled to sqrt(2) for real networks. ... The learning rate η is 0.5 for l = 1 and 0.2 for l = 2. |
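
To make the setup quoted above more concrete, the following is a minimal sketch of how an empirical NTK can be computed with the Neural Tangents library mentioned in the "Software Dependencies" row. It is not the authors' code (which is not released) and only covers a real-valued network with the $\sqrt{2}$ weight standard deviation from the "Experiment Setup" row; the width, nonlinearity, and random placeholder data standing in for the 128-sample MNIST set are assumptions for illustration.

```python
# Minimal sketch, not the authors' implementation: the paper's complex-valued
# NTK code is unavailable, so this only illustrates the real-network baseline
# settings quoted above using the Neural Tangents library.
import jax
import neural_tangents as nt
from neural_tangents import stax

# Two-layer fully connected real network; W_std = sqrt(2) follows the scaled
# initialization quoted in the "Experiment Setup" row. Width 512 and the
# ReLU nonlinearity are assumptions for illustration.
init_fn, apply_fn, _ = stax.serial(
    stax.Dense(512, W_std=2.0 ** 0.5),
    stax.Relu(),
    stax.Dense(10, W_std=2.0 ** 0.5),
)

key = jax.random.PRNGKey(0)
# Random placeholder for the 128-sample training set D = (X, Y)
# (flattened 28x28 digits); the paper uses actual MNIST images.
x = jax.random.normal(key, (128, 784))
_, params = init_fn(key, x.shape)

# Empirical (finite-width) NTK at initialization: a 128 x 128 kernel matrix.
ntk_fn = nt.empirical_ntk_fn(apply_fn, trace_axes=(-1,))
theta = ntk_fn(x, None, params)
print(theta.shape)  # (128, 128)
```

Reproducing the paper's comparison between the complex NTK $\hat{\Theta}_t^{(n)}$ and the real NTK $\Theta_r$ would additionally require the authors' complex-network extension of this computation, which, per the "Open Source Code" row, is not publicly available.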