Towards Exact Gradient-based Training on Analog In-memory Computing
Authors: Zhaoxian Wu, Tayfun Gokmen, Malte Rasch, Tianyi Chen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The simulations verify the correctness of the analyses. Section 5 (Numerical Simulations): "In this section, we verify the main theoretical results by simulations on both synthetic datasets and real datasets." |
| Researcher Affiliation | Collaboration | Zhaoxian Wu, Rensselaer Polytechnic Institute, Troy, NY 12180, wuz16@rpi.edu; Tayfun Gokmen, IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, tgokmen@us.ibm.com |
| Pseudocode | No | The paper describes algorithms such as Digital SGD, Analog SGD, and Tiki-Taka through mathematical equations and textual explanations, but it does not include structured pseudocode blocks or figures explicitly labeled 'Algorithm' or 'Pseudocode'. A structural sketch of these update rules is given below the table. |
| Open Source Code | Yes | The code of our simulation implementation is available at github.com/Zhaoxian-Wu/analog-training. |
| Open Datasets | Yes | The paper uses the MNIST and CIFAR10 datasets: "We train fully-connected network (FCN) and convolutional neural network (CNN) models on the MNIST dataset" and "We also train three ResNet models with different sizes on the CIFAR10 dataset." A data-loading sketch is given below the table. |
| Dataset Splits | No | The paper does not explicitly state train/test/validation dataset splits (e.g., percentages, absolute counts, or references to predefined splits). It mentions batch sizes ("The batch size is 10 for all algorithms.", "The batch size is 8 for all algorithms.", "The batch size is 128 for all algorithms."), but these statements describe batching, not data splitting. |
| Hardware Specification | Yes | We conduct our experiments on an NVIDIA RTX 3090 GPU, which has 24GB memory and maximum power 350W. |
| Software Dependencies | No | "We use PyTorch to generate the curves for SGD in the simulation and use the open-source toolkit IBM Analog Hardware Acceleration Kit (AIHWKIT) [27] to simulate the behaviors of Analog SGD; see github.com/IBM/aihwkit." While these tools are named, specific version numbers are not provided (e.g., PyTorch 1.x, AIHWKIT 0.x). A minimal AIHWKIT usage sketch is given below the table. |
| Experiment Setup | Yes | Three settings are reported: (1) α = 0.1 for digital SGD, α = 0.05 and β = 0.01 for Analog SGD or Tiki-Taka, batch size 10; (2) α = 0.1 for digital SGD, α = 0.05 and β = 0.01 for Analog SGD or Tiki-Taka, batch size 8; (3) α = 0.15 for digital SGD, α = 0.075 and β = 0.01 for Analog SGD or Tiki-Taka, batch size 128. These settings are collected in a configuration sketch below the table. |
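
Since the paper expresses Digital SGD, Analog SGD, and Tiki-Taka only as equations, the following Python sketch outlines their generic structure. It is a minimal sketch based on the commonly used asymmetric (soft-bounds) device model and the two-matrix Tiki-Taka scheme from the analog-training literature, not the authors' exact formulation; `tau`, `lr_grad`, `lr_transfer`, and `transfer_every` are illustrative names and values.

```python
import numpy as np

def analog_update(W, grad, lr, tau=1.0):
    """One pulsed analog update under an asymmetric (soft-bounds) device model:
    the effective step shrinks as the weight approaches the bound tau, and it
    does so differently for positive and negative updates."""
    step = -lr * grad
    scale = np.where(step > 0, 1.0 - W / tau, 1.0 + W / tau)
    return W + step * np.clip(scale, 0.0, None)

def digital_sgd_step(W, grad, lr):
    """Ordinary SGD on digital (floating-point) weights."""
    return W - lr * grad

def analog_sgd_step(W, grad, lr):
    """Analog SGD writes the gradient directly onto the analog weights W."""
    return analog_update(W, grad, lr)

def tiki_taka_step(W, P, grad, t, lr_grad, lr_transfer, transfer_every=10):
    """Tiki-Taka-style step: gradients are accumulated on an auxiliary analog
    matrix P, which is periodically transferred (read and written) into W.
    The published scheme also decays/resets the transferred part of P and
    transfers column by column; both details are omitted here."""
    P = analog_update(P, grad, lr_grad)
    if t % transfer_every == 0:
        # Move W along the accumulated (negative-gradient) direction stored in P.
        W = analog_update(W, -P, lr_transfer)
    return W, P
```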
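
For completeness, here is a minimal sketch of loading the two benchmark datasets with torchvision; the transform, data directory, and batch size are illustrative assumptions, not the authors' preprocessing.

```python
import torch
from torchvision import datasets, transforms

# Illustrative preprocessing; the paper's exact transforms are not specified here.
transform = transforms.ToTensor()

mnist_train = datasets.MNIST("data", train=True, download=True, transform=transform)
cifar_train = datasets.CIFAR10("data", train=True, download=True, transform=transform)

# The report quotes batch sizes of 10, 8, and 128 depending on the experiment;
# batch_size=10 is used here only as an example.
mnist_loader = torch.utils.data.DataLoader(mnist_train, batch_size=10, shuffle=True)
```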
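
The dependencies row names PyTorch and AIHWKIT without versions. The sketch below shows the basic AIHWKIT training pattern (an analog layer plus the analog-aware optimizer), assuming a recent AIHWKIT release with its default device configuration; the authors' actual models, device settings, and Tiki-Taka configuration are not reproduced here.

```python
from torch import Tensor
from torch.nn.functional import mse_loss

from aihwkit.nn import AnalogLinear      # layer mapped onto a simulated analog tile
from aihwkit.optim import AnalogSGD      # SGD variant aware of analog tiles

# Toy data, only to make the sketch runnable.
x = Tensor([[0.1, 0.2, 0.4, 0.3], [0.2, 0.1, 0.1, 0.3]])
y = Tensor([[1.0, 0.5], [0.7, 0.3]])

model = AnalogLinear(4, 2)               # default RPU (device) configuration

# lr=0.05 matches the Analog SGD rate quoted in the experiment-setup row.
opt = AnalogSGD(model.parameters(), lr=0.05)
opt.regroup_param_groups(model)

for _ in range(10):
    opt.zero_grad()
    loss = mse_loss(model(x), y)
    loss.backward()
    opt.step()
```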
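
The three hyperparameter settings quoted in the experiment-setup row are collected here as plain Python dictionaries. Which setting belongs to which model/dataset is not stated in this excerpt, so the keys are deliberately generic and the field names are illustrative.

```python
# Hyperparameters as quoted in the report; key names are illustrative.
EXPERIMENT_SETTINGS = {
    "setting_1": {"lr_digital_sgd": 0.10, "lr_analog": 0.050, "beta": 0.01, "batch_size": 10},
    "setting_2": {"lr_digital_sgd": 0.10, "lr_analog": 0.050, "beta": 0.01, "batch_size": 8},
    "setting_3": {"lr_digital_sgd": 0.15, "lr_analog": 0.075, "beta": 0.01, "batch_size": 128},
}
```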