Towards Exact Gradient-based Training on Analog In-memory Computing

Authors: Zhaoxian Wu, Tayfun Gokmen, Malte Rasch, Tianyi Chen

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The simulations verify the correctness of the analyses." Section 5 (Numerical Simulations): "In this section, we verify the main theoretical results by simulations on both synthetic datasets and real datasets."
Researcher Affiliation | Collaboration | Zhaoxian Wu, Rensselaer Polytechnic Institute, Troy, NY 12180 (wuz16@rpi.edu); Tayfun Gokmen, IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 (tgokmen@us.ibm.com)
Pseudocode | No | The paper describes algorithms like Digital SGD, Analog SGD, and Tiki-Taka through mathematical equations and textual explanations, but does not include structured pseudocode blocks or figures explicitly labeled as 'Algorithm' or 'Pseudocode'.
Open Source Code | Yes | "The code of our simulation implementation is available at github.com/Zhaoxian-Wu/analog-training."
Open Datasets | Yes | The paper uses the MNIST and CIFAR10 datasets: "We train fully-connected network (FCN) and convolutional neural network (CNN) models on the MNIST dataset" and "We also train three ResNet models with different sizes on the CIFAR10 dataset."
Dataset Splits | No | The paper does not explicitly state train/test/validation dataset splits (e.g., percentages, absolute counts, or references to predefined splits). It mentions batch sizes of 10, 8, and 128 for the different experiments, but these concern batching, not data splitting.
Hardware Specification | Yes | "We conduct our experiments on an NVIDIA RTX 3090 GPU, which has 24GB memory and maximum power 350W."
Software Dependencies | No | "We use PyTorch to generate the curves for SGD in the simulation and use the open-source toolkit IBM Analog Hardware Acceleration Kit (AIHWKIT) [27] to simulate the behaviors of Analog SGD; see github.com/IBM/aihwkit." While these tools are named, specific version numbers are not provided (e.g., PyTorch 1.x, AIHWKIT 0.x). (An illustrative AIHWKIT sketch follows the table.)
Experiment Setup | Yes | Three configurations are reported. (1) "The learning rates are α = 0.1 for SGD, α = 0.05, β = 0.01 for Analog SGD or Tiki-Taka. The batch size is 10 for all algorithms." (2) "The learning rates are set as α = 0.1 for digital SGD, α = 0.05, β = 0.01 for Analog SGD or Tiki-Taka. The batch size is 8 for all algorithms." (3) "The learning rates are set as α = 0.15 for digital SGD, α = 0.075, β = 0.01 for Analog SGD or Tiki-Taka. The batch size is 128 for all algorithms." (A baseline training-loop sketch follows the table.)
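
The two sketches below are illustrative only and are not taken from the authors' repository. The first relates to the Software Dependencies row: a minimal sketch of how an analog layer and an Analog SGD optimizer are typically constructed with AIHWKIT. The layer shape, device model, and placeholder data are assumptions; the learning rate α = 0.05 and batch size 10 follow the first reported configuration.

    # Minimal AIHWKIT sketch (assumed setup, not the authors' code).
    import torch
    from torch import nn

    from aihwkit.nn import AnalogLinear
    from aihwkit.optim import AnalogSGD
    from aihwkit.simulator.configs import SingleRPUConfig
    from aihwkit.simulator.configs.devices import ConstantStepDevice

    # Analog tile governed by a simple incremental-update resistive device model.
    rpu_config = SingleRPUConfig(device=ConstantStepDevice())

    # Toy fully-connected layer mapped onto an analog crossbar tile (shape assumed).
    model = AnalogLinear(784, 10, bias=True, rpu_config=rpu_config)

    # Analog SGD with the learning rate reported for the first MNIST experiment.
    optimizer = AnalogSGD(model.parameters(), lr=0.05)
    optimizer.regroup_param_groups(model)

    # Placeholder batch (batch size 10, as in the first reported configuration).
    x = torch.rand(10, 784)
    y = torch.randint(0, 10, (10,))
    loss = nn.functional.cross_entropy(model(x), y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # the update is applied through the simulated analog tile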
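
The second sketch relates to the Experiment Setup row: a plain PyTorch digital SGD baseline on MNIST using the reported α = 0.1 and batch size 10. The network architecture and epoch count are assumptions, not the paper's exact FCN.

    # Digital SGD baseline sketch (architecture and epoch count assumed).
    import torch
    from torch import nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    train_set = datasets.MNIST("data", train=True, download=True,
                               transform=transforms.ToTensor())
    train_loader = DataLoader(train_set, batch_size=10, shuffle=True)  # batch size 10

    # Hypothetical fully-connected network; the paper's exact FCN is not specified here.
    model = nn.Sequential(nn.Flatten(),
                          nn.Linear(784, 256), nn.ReLU(),
                          nn.Linear(256, 10))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # alpha = 0.1 for digital SGD

    for epoch in range(1):  # single epoch only for the sketch
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = nn.functional.cross_entropy(model(x), y)
            loss.backward()
            optimizer.step()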