Sparsity in Continuous-Depth Neural Networks

Authors: Hananeh Aliee, Till Richter, Mikhail Solonin, Ignacio Ibarra, Fabian Theis, Niki Kilbertus

NeurIPS 2022 | Conference PDF | Archive PDF

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive empirical evaluation on these challenging benchmarks suggests that weight sparsity improves generalization in the presence of noise or irregular sampling. However, it does not prevent learning spurious feature dependencies in the inferred dynamics, rendering them impractical for predictions under interventions, or for inferring the true underlying dynamics. Instead, feature sparsity can indeed help with recovering sparse ground-truth dynamics compared to unregularized NODEs.
Researcher Affiliation | Collaboration | Hananeh Aliee (Helmholtz Munich), Till Richter (Helmholtz Munich), Mikhail Solonin (Technical University of Munich), Ignacio Ibarra (Helmholtz Munich), Fabian Theis (Technical University of Munich; Helmholtz Munich), Niki Kilbertus (Technical University of Munich; Helmholtz AI, Munich). {hananeh.aliee,till.richter,ignacio.ibarra,fabian.theis,niki.kilbertus}@helmholtz-muenchen.de. Work done while at TUM. MS is currently employed by J.P. Morgan Chase & Co.; mikhail.solonin@jpmorgan.com
Pseudocode | No | The paper does not contain any explicit pseudocode blocks or sections labeled “Algorithm”.
Open Source Code | Yes | The Python implementation is available at: https://github.com/theislab/PathReg
Open Datasets | Yes | We curate large, real-world datasets consisting of human motion capture (mocap.cs.cmu.edu) as well as human hematopoiesis single-cell RNA-seq [32] data for our empirical evaluations.
Dataset Splits | No | The paper mentions training and test data but does not explicitly specify a separate validation split or describe how hyperparameters were tuned on a validation set. Section A states: “The evaluation of models are done on the test set, after models are trained,” implying the data are divided only into training and test portions, without a dedicated validation split for hyperparameter tuning.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory amounts. While the ethics statement confirms that compute resources were included, it only vaguely refers to “type of GPUs, internal cluster, or cloud provider” without any specific models or configurations.
Software Dependencies | No | The paper mentions using the “Adam optimizer [23]” and notes that “The python implementation is available at” (footnote 2). However, it does not specify version numbers for Python or for any other software libraries or dependencies (e.g., PyTorch, TensorFlow, scikit-learn), which are crucial for reproducibility.
Experiment Setup | Yes | A detailed description of our training procedures and architecture choices for each experiment is provided in Appendix A. We train all models for 500 epochs using the Adam optimizer [23] with a learning rate of 1e-2 and weight decay of 1e-5. We use a batch size of 20.
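
The reported setup amounts to a concrete optimizer configuration. Below is a minimal sketch, assuming PyTorch, of a training loop using the stated hyperparameters (Adam, learning rate 1e-2, weight decay 1e-5, batch size 20, 500 epochs). The small MLP and random tensors are hypothetical placeholders, not the paper's neural ODE model or its datasets.

```python
# Minimal sketch of the reported training configuration, assuming PyTorch.
# Only the hyperparameters (Adam, lr=1e-2, weight_decay=1e-5,
# batch_size=20, 500 epochs) come from the paper's experiment setup;
# the MLP and synthetic data below are illustrative placeholders.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model standing in for the paper's NODE vector field.
model = nn.Sequential(nn.Linear(10, 64), nn.Tanh(), nn.Linear(64, 10))

# Optimizer configured with the hyperparameters stated in the paper.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2, weight_decay=1e-5)
loss_fn = nn.MSELoss()

# Illustrative synthetic data: 200 ten-dimensional input/target pairs.
inputs, targets = torch.randn(200, 10), torch.randn(200, 10)
loader = DataLoader(TensorDataset(inputs, targets), batch_size=20, shuffle=True)

for epoch in range(500):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
```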