Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Sparse Spectral Training and Inference on Euclidean and Hyperbolic Neural Networks

Authors: Jialin Zhao, Yingtao Zhang, Xinghang Li, Huaping Liu, Carlo Vittorio Cannistraci

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Through comprehensive testing on both Euclidean and hyperbolic neural networks across various tasks, SST demonstrates its ability to outperform existing memory-reduction training methods and to match full-rank training in various cases. On LLaMA-1.3B, with only 18.7% of the parameters trainable compared to full-rank training (using a rank equivalent to 6% of the embedding dimension), SST reduces the perplexity gap between other low-rank methods and full-rank training by 97.4%. This result highlights SST as an effective parameter-efficient technique for model pre-training. Our comprehensive evaluations cover different tasks, including pre-training large language models from the OPT model family, ranging from 125M to 1.3B (Zhang et al., 2022), using the Transformer (Vaswani et al., 2017) for machine translation tasks, and hyperbolic graph neural networks (Chen et al., 2022) on node classification and link prediction tasks.
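The quoted figures above lend themselves to a quick back-of-envelope check: replacing a d × d weight matrix with two rank-r factors keeps roughly 2r/d of its parameters. The sketch below illustrates that arithmetic; the embedding dimension used is an assumption for illustration, and the paper's 18.7% figure also counts embeddings and other layers that this sketch deliberately omits.

```python
def low_rank_fraction(d: int, rank_ratio: float) -> float:
    """Fraction of parameters kept when a d x d matrix is replaced
    by two factors of shape (d, r) and (r, d), with r = rank_ratio * d."""
    r = int(rank_ratio * d)
    full = d * d
    low_rank = 2 * d * r
    return low_rank / full

# With an illustrative embedding dimension of 2048 and the paper's
# rank ratio of 6%, the per-matrix fraction is about 12%:
frac = low_rank_fraction(2048, 0.06)
print(f"{frac:.3f}")  # 0.119
```

The gap between this ~12% per-matrix estimate and the reported 18.7% overall fraction is expected, since full-rank components (e.g. embeddings) also contribute trainable parameters.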
Researcher Affiliation Academia ¹Center for Complex Network Intelligence (CCNI), Tsinghua Laboratory of Brain and Intelligence (THBI), Department of Psychological and Cognitive Sciences, ²Department of Computer Science, ³Department of Biomedical Engineering, Tsinghua University, China. Correspondence to: Jialin Zhao <EMAIL>, Carlo Vittorio Cannistraci <EMAIL>.
Pseudocode Yes Algorithm 1: ReLoRA*; Algorithm 2: Sparse Spectral Training (SST)
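The paper's Algorithm 2 trains in the spectral domain: a weight matrix is kept factored as W = U diag(s) Vᵀ and only a sampled subset of singular directions is updated per step. The sketch below is a simplified illustration of that idea, not the authors' implementation; the sampling rule, learning-rate scaling, and the periodic re-SVD step of the real algorithm are omitted or simplified.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, k = 64, 8, 2  # dimension, retained rank, directions updated per step

# Initialize the spectral factors from an SVD of a random matrix,
# keeping only the top-r singular triplets.
W0 = rng.standard_normal((d, d)) / np.sqrt(d)
U, s, Vt = np.linalg.svd(W0, full_matrices=False)
U, s, Vt = U[:, :r].copy(), s[:r].copy(), Vt[:r, :].copy()

def sst_step(U, s, Vt, grad_W, lr=0.1):
    """One sparse spectral update: sample k singular directions
    (here with probability proportional to singular value) and update
    only those columns of U, rows of Vt, and entries of s."""
    p = s / s.sum()
    idx = rng.choice(len(s), size=k, replace=False, p=p)
    # Chain rule through W = U @ diag(s) @ Vt, restricted to sampled directions.
    gU = (grad_W @ Vt[idx].T) * s[idx]                       # dL/dU[:, idx]
    gV = s[idx, None] * (U[:, idx].T @ grad_W)               # dL/dVt[idx]
    gs = np.einsum('di,de,ei->i', U[:, idx], grad_W, Vt[idx].T)  # dL/ds[idx]
    U[:, idx] -= lr * gU
    Vt[idx] -= lr * gV
    s[idx] -= lr * gs
    return U, s, Vt

# One step against a toy gradient; the reconstructed weight stays rank-r.
grad = rng.standard_normal((d, d)) * 0.01
U, s, Vt = sst_step(U, s, Vt, grad)
W = U @ np.diag(s) @ Vt
```

Because only k of the r directions receive gradient and optimizer state per step, memory for optimizer moments scales with the sampled subset rather than the full factorization.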
Open Source Code Yes Our code is available at https://github.com/biomedical-cybernetics/sparse-spectral-training.
Open Datasets Yes We employ the vanilla transformer (Vaswani et al., 2017) as the Euclidean transformer and HyboNet (Chen et al., 2022) as the hyperbolic transformer. Our experiments include three widely used machine translation datasets: IWSLT 14 English-to-German (Cettolo et al., 2014), IWSLT 17 German-to-English (Cettolo et al., 2017), and Multi30K German-to-English (Elliott et al., 2016). Language modeling. We utilize the OPT (Zhang et al., 2022) and LLaMA (Touvron et al., 2023a) architectures as the baselines for our language generation experiments. For LLaMA, we follow the experiment setup from (Zhao et al., 2024). All models are pre-trained on OpenWebText (Gokaslan & Cohen, 2019), an open-source reproduction of OpenAI's WebText. We evaluated the effectiveness of SST on the HyboNet (Chen et al., 2022) version of HGNN in node classification and link prediction across four distinct datasets: Airport (Chami et al., 2019), Cora (Sen et al., 2008), Disease (Anderson & May, 1991), and PubMed (Namata et al., 2012). We conduct additional experiments on image classification tasks using MLP-based models... on three datasets: MNIST (Lecun et al., 1998), EMNIST (Cohen et al., 2017), and FashionMNIST (Xiao et al., 2017). To further evaluate the performance of SST, we conducted additional experiments using larger datasets and varied hyperparameter settings. Specifically, we pre-trained LLaMA-130M on the C4 dataset (Raffel et al., 2020), which is about 25 times larger than OpenWebText.
Dataset Splits Yes For IWSLT 14, the hyperparameters are aligned with those from HyboNet. For LLaMA, we follow the experiment setup from (Zhao et al., 2024). Table 2 displays the validation perplexity results on the OpenWebText dataset across different sizes of all LLMs. Each pretrained model performs zero-shot evaluations on all 16 NLP tasks used in the OPT article (Zhang et al., 2022). All datasets were evaluated based on test accuracy.
Hardware Specification Yes Experiments were conducted on one A100 GPU. Distributed training is facilitated using the Accelerate (Gugger et al., 2022) library across four A100 GPUs on a Linux server.
Software Dependencies Yes However, standard implementations of the Adam optimizer (Kingma & Ba, 2014) in PyTorch (Paszke et al., 2019) do not support sparse optimizer states. Distributed training is facilitated using the Accelerate (Gugger et al., 2022) library across four A100 GPUs on a Linux server. We employ the same codebase and hyperparameters as those used in HyboNet (Chen et al., 2022), which is derived from OpenNMT-py (Klein et al., 2017).
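The remark about sparse optimizer states points at a concrete issue: when only a subset of rows of a parameter is trainable at a time, Adam's moment estimates for newly activated rows should not carry stale values. The minimal sketch below illustrates one way to handle this; the class name, reset policy, and row-subset interface are illustrative assumptions, not the paper's codebase.

```python
import numpy as np

class RowwiseAdam:
    """Adam whose moments are stored per row and can be reset when a row
    re-enters the trainable set (illustrative sketch, not the paper's code)."""

    def __init__(self, shape, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        self.lr, (self.b1, self.b2), self.eps = lr, betas, eps
        self.m = np.zeros(shape)  # first-moment estimate
        self.v = np.zeros(shape)  # second-moment estimate
        self.t = 0                # step counter for bias correction

    def reset_rows(self, rows):
        """Zero the moments for rows that just became trainable,
        so stale statistics from an earlier phase are discarded."""
        self.m[rows] = 0.0
        self.v[rows] = 0.0

    def step(self, param, grad, rows):
        """Apply a bias-corrected Adam update only to the selected rows."""
        self.t += 1
        self.m[rows] = self.b1 * self.m[rows] + (1 - self.b1) * grad[rows]
        self.v[rows] = self.b2 * self.v[rows] + (1 - self.b2) * grad[rows] ** 2
        m_hat = self.m[rows] / (1 - self.b1 ** self.t)
        v_hat = self.v[rows] / (1 - self.b2 ** self.t)
        param[rows] -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
        return param

# Usage: update rows 0 and 2 only; other rows keep their values and states.
opt = RowwiseAdam((4, 3), lr=0.1)
p = opt.step(np.ones((4, 3)), np.ones((4, 3)), [0, 2])
```

Storing moments only for active rows (rather than densely, as here) is what a genuinely sparse optimizer state would add on top of this sketch.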
Experiment Setup Yes Hyperparameters and implementation details are provided in Appendix E. Table 8: Hyperparameters on IWSLT 14 for Euclidean and hyperbolic Transformer. Table 9: Hyperparameters on Multi30K and IWSLT 17 for vanilla Transformer. Table 10: Hyperparameters for OPT Models. Table 11: Hyperparameters for LLaMA Models.