Learning Compact Representations of Neural Networks using DiscriminAtive Masking (DAM)
Authors: Jie Bu, Arka Daw, M. Maruf, Anuj Karpatne
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that our proposed DAM approach has remarkably good performance over a diverse range of applications in representation learning and structured pruning, including dimensionality reduction, recommendation system, graph representation learning, and structured pruning for image classification. We also theoretically show that the learning objective of DAM is directly related to minimizing the L0 norm of the masking layer. |
| Researcher Affiliation | Academia | Jie Bu Virginia Tech jayroxis@vt.edu Arka Daw Virginia Tech darka@vt.edu M. Maruf Virginia Tech marufm@vt.edu Anuj Karpatne Virginia Tech karpatne@vt.edu |
| Pseudocode | No | The paper describes the mathematical formulation and conceptual steps of DAM, but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | All of our codes and datasets are available https://github.com/jayroxis/dam-pytorch. |
| Open Datasets | Yes | We evaluate the effectiveness of DAM in recovering the embeddings of synthetically generated dimensionality reduction problems. ... We train our IGMC-DAM model for 100 epochs under the same training configurations. ... We further evaluate DAM on the problem of dimensionality reduction using simple auto-encoders on the MNIST dataset... We evaluate the performance of these methods on the PreResNet-164 architecture on benchmark computer vision datasets, CIFAR-10 and CIFAR-100. |
| Dataset Splits | Yes | We observe that the training cross-entropy (CE) loss and the validation CE loss are very similar to what we expect from a PreResNet model trained on benchmark vision datasets with learning rate changes at 100 and 150 epochs, respectively. ... Net-Slim also does not involve additional pruning after training. However, ChipNet involves 20 epochs of pruning which is significant and is almost comparable to its pretraining time. Finally, comparing the total running time taken by each of the structured pruning methods, we observe that DAM is significantly faster than the current SOTA counterparts owing to its single-stage pruning approach. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, or memory specifications). |
| Software Dependencies | No | The paper points to a PyTorch implementation (the dam-pytorch repository), implying the use of PyTorch, but it does not specify version numbers for PyTorch, Python, CUDA, or any other software dependencies. |
| Experiment Setup | Yes | In all of our experiments, we used αᵢ = 1 and k = 5. In our structured network pruning experiments, we used a cold-start of 20 epochs (i.e., the βᵢ's were frozen for the duration of cold-start), so as to allow the leftmost neurons to undergo some epochs of refinement before beginning the pruning process. We also set the initial value of βᵢ to 1, which can also be thought of as another form of cold-starting (since pruning of a layer only starts when βᵢ becomes less than zero). ... We replaced the dropout layer with a DAM layer in the IGMC to reduce the dimension of the learned representations. We train our IGMC-DAM model for 100 epochs under the same training configurations. ... learning rate changes at 100 and 150 epochs (a hedged code sketch of such a masking layer and its training schedule follows this table) |
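
The Research Type and Experiment Setup rows above describe DAM's masking layer (per-neuron gates with αᵢ = 1, k = 5, βᵢ initialized to 1, and pruning of a layer beginning once βᵢ drops below zero) and its connection to minimizing the L0 norm of the mask. The following PyTorch fragment is a minimal, hedged sketch of such a gated masking layer, not the authors' reference implementation: the gate form relu(tanh(α·β + k·(1 − i/n))) and the β-based sparsity surrogate are assumptions made for illustration and may differ from the code in the dam-pytorch repository.

```python
# Hypothetical sketch of a DAM-style masking layer (PyTorch). The exact gate
# function and regularizer are assumptions; see the authors' repository at
# https://github.com/jayroxis/dam-pytorch for the actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DAMLayer(nn.Module):
    """Masks the features of the preceding layer with an ordered gate.

    A single learnable parameter `beta` (cold-started at 1, as quoted above)
    controls how many of the `num_features` channels stay active. Fixed
    per-channel offsets make the rightmost channels close first as `beta`
    decreases, so pruning proceeds right to left while the leftmost neurons
    keep refining.
    """

    def __init__(self, num_features: int, alpha: float = 1.0, k: float = 5.0):
        super().__init__()
        self.alpha = alpha
        # Fixed, non-trainable offsets: largest for the leftmost channel,
        # zero for the rightmost one.
        idx = torch.arange(1, num_features + 1, dtype=torch.float32)
        self.register_buffer("offsets", k * (1.0 - idx / num_features))
        # One trainable gate parameter per layer, initialized to 1.
        self.beta = nn.Parameter(torch.tensor(1.0))

    def gate(self) -> torch.Tensor:
        # Gate is close to 1 for leftmost channels; a channel's gate becomes
        # exactly 0 (via the ReLU) once alpha * beta falls below -offset,
        # so pruning only starts once beta goes below zero.
        return F.relu(torch.tanh(self.alpha * self.beta + self.offsets))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = self.gate()
        # Broadcast over trailing spatial dims if x is a conv feature map.
        return x * g.view(1, -1, *([1] * (x.dim() - 2)))

    def sparsity_penalty(self) -> torch.Tensor:
        # Assumed surrogate for the L0 norm of the mask: penalizing beta
        # drives gates toward zero and prunes channels.
        return self.beta
```
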
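Building on the DAMLayer sketch above, the fragment below illustrates one way the quoted training schedule could look in code: βᵢ frozen for a 20-epoch cold-start, then an assumed sparsity penalty added to the task loss. The function name `train` and the penalty weight `lam` are hypothetical and not taken from the paper.

```python
# Hypothetical training-loop fragment for the quoted schedule. Assumes the
# DAMLayer class from the sketch above; lam is an illustrative penalty weight.
import torch

def train(model, loader, optimizer, epochs=200, cold_start=20, lam=1e-4):
    dam_layers = [m for m in model.modules() if isinstance(m, DAMLayer)]
    criterion = torch.nn.CrossEntropyLoss()
    for epoch in range(epochs):
        # Freeze the gate parameters during the cold-start phase.
        for dam in dam_layers:
            dam.beta.requires_grad_(epoch >= cold_start)
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            if epoch >= cold_start:
                # Add the assumed L0 surrogate once pruning is allowed.
                loss = loss + lam * sum(d.sparsity_penalty() for d in dam_layers)
            loss.backward()
            optimizer.step()
```

After training, channels whose gate is exactly zero can be dropped from the network, which is what makes this a single-stage structured pruning scheme with no separate post-training pruning phase.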