Sparse Flows: Pruning Continuous-depth Models
Authors: Lucas Liebenwein, Ramin Hasani, Alexander Amini, Daniela Rus
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform a diverse set of experiments demonstrating the effect of pruning on the generalization capability of continuous-depth models. |
| Researcher Affiliation | Academia | Lucas Liebenwein MIT CSAIL lucas@csail.mit.edu Ramin Hasani MIT CSAIL rhasani@mit.edu Alexander Amini MIT CSAIL amini@mit.edu Daniela Rus MIT CSAIL rus@csail.mit.edu |
| Pseudocode | Yes | Algorithm 1 SPARSEFLOW(f, Φtrain, PR, e). Input: f: neural ODE model with parameter set θ; Φtrain: hyper-parameters for training; PR: relative prune ratio; e: number of training epochs per prune-cycle. Output: f(·; θ̂): Sparse Flow; m: sparse connection pattern. 1: θ0 ← RANDOMINIT() 2: θ ← TRAIN(θ0, Φtrain, e) ▷ Initial training stage with dense neural ODE ("warm start"). 3: m ← 1_\|θ0\| ▷ Initialize binary mask indicating neural connection pattern. 4: while validation loss of Sparse Flow decreases do 5: m ← PRUNE(m ⊙ θ, PR) ▷ Prune PR% of the remaining parameters and update mask. 6: θ ← TRAIN(m ⊙ θ, Φtrain, e) ▷ Restart training with updated connection pattern. 7: end while 8: θ̂ ← m ⊙ θ, and return f(·; θ̂), m (a runnable sketch of this loop follows the table) |
| Open Source Code | Yes | Code: https://github.com/lucaslie/torchprune |
| Open Datasets | Yes | We scale our experiments to a set of five real-world tabular datasets (prepared based on the instructions given by Papamakarios et al. (2017) and Grathwohl et al. (2019)) to verify our empirical observations about the effect of pruning on the generalizability of continuous normalizing flows. |
| Dataset Splits | No | Subsequently, we proceed by iteratively pruning and retraining the network until we either obtain the desired level of sparsity (i.e., prune ratio) or the loss for a pre-specified hold-out dataset (validation loss) starts to deteriorate (early stopping). |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are provided for the experimental setup in the paper. |
| Software Dependencies | No | We used two code bases (FFJORD from Grathwohl et al. (2019) and TorchDyn (Poli et al., 2020a)) over which we implemented our pruning framework. |
| Experiment Setup | No | We use Adam with a fixed step learning decay schedule and weight decay in some instances. |
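
The Algorithm 1 pseudocode quoted in the Pseudocode row is an iterative prune–retrain loop with validation-based early stopping. The sketch below is a minimal, illustrative PyTorch rendering of that loop, not the authors' torchprune implementation: global magnitude pruning over weight matrices, the `train_one_cycle` and `validation_loss` callables, and the decision to restore the last improving checkpoint are assumptions made here for concreteness.

```python
# Minimal sketch of the Algorithm 1 (SparseFlow) loop.
# All helper names are illustrative placeholders, not the authors' torchprune API.
import copy
import torch

PRUNE_RATIO = 0.3  # fraction of the *remaining* weights removed per cycle (assumed value)


def apply_mask(model, masks):
    """Zero out pruned connections, i.e. keep only m ⊙ θ."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name])


def magnitude_prune(model, masks, ratio):
    """Drop `ratio` of the still-active weights with the smallest magnitude (global threshold)."""
    surviving = torch.cat([
        param[masks[name].bool()].abs().flatten()
        for name, param in model.named_parameters() if name in masks
    ])
    k = int(ratio * surviving.numel())
    if k == 0:
        return masks
    threshold = torch.kthvalue(surviving, k).values
    for name, param in model.named_parameters():
        if name in masks:
            masks[name] = masks[name] * (param.abs() > threshold).float()
    return masks


def sparse_flow(model, train_one_cycle, validation_loss):
    """`train_one_cycle(model, masks)` runs the e training epochs of one prune-cycle and is
    expected to re-apply the mask after every optimizer step; `validation_loss(model)`
    evaluates the hold-out loss. Both are user-supplied callables."""
    # m <- 1_{|θ|}: start from an all-ones mask (dense network); prune weight matrices only.
    masks = {name: torch.ones_like(p) for name, p in model.named_parameters() if p.dim() > 1}
    train_one_cycle(model, masks)  # warm start with the dense neural ODE
    best_val = validation_loss(model)
    best_state = copy.deepcopy(model.state_dict())
    while True:
        masks = magnitude_prune(model, masks, PRUNE_RATIO)  # prune PR% of the remaining weights
        apply_mask(model, masks)
        train_one_cycle(model, masks)  # restart training with the updated connection pattern
        val = validation_loss(model)
        if val >= best_val:  # stop once the validation loss no longer decreases
            model.load_state_dict(best_state)  # revert to the last improving checkpoint (assumption)
            break
        best_val = val
        best_state = copy.deepcopy(model.state_dict())
    # A target overall prune ratio could serve as an additional stopping criterion
    # (cf. the Dataset Splits row above).
    return model, masks
```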
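
The Experiment Setup row reports Adam with a fixed-step learning-rate decay schedule and, in some instances, weight decay. A hedged configuration sketch is given below; the learning rate, decay step, decay factor, and weight-decay coefficient are placeholder values, not the ones used in the paper.

```python
# Sketch of the reported optimizer setup: Adam + fixed-step LR decay (+ optional weight decay).
# All numeric values are assumptions for illustration only.
import torch


def make_optimizer(model, use_weight_decay=False):
    optimizer = torch.optim.Adam(
        model.parameters(),
        lr=1e-3,
        weight_decay=1e-4 if use_weight_decay else 0.0,
    )
    # Fixed-step decay: multiply the learning rate by `gamma` every `step_size` epochs.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
    return optimizer, scheduler
```

With this setup, `scheduler.step()` would be called once per epoch after that epoch's training loop completes.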