Towards training digitally-tied analog blocks via hybrid gradient computation

Authors: Timothy Nest, Maxence Ernoult

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We experimentally demonstrate the effectiveness of this approach on ff-EBMs using Deep Hopfield Networks (DHNs) as energy-based blocks, and show that a standard DHN can be arbitrarily split into blocks of any uniform size while maintaining or improving performance, with increases in simulation speed of up to four times. We then train ff-EBMs on ImageNet32, where we establish a new state-of-the-art performance for the EP literature (46% top-1 accuracy)."
Researcher Affiliation | Collaboration | Timothy Nest (timothy.nest@mila.quebec), Maxence Ernoult (maxence@rain.ai); Montreal Institute of Learning Algorithms (MILA) and Rain AI; equal contribution.
Pseudocode | Yes | "Algorithm 1: ff-EBM inference (Eq. (5))" (a hedged sketch of block-wise inference appears below the table)
Open Source Code | Yes | "Our code is available on https://github.com/rain-neuromorphics/hybrid_bp_ep_official"
Open Datasets | Yes | "Simulations were run on the CIFAR-10, CIFAR-100 and ImageNet32 datasets, all consisting of color images of size 32×32 pixels. CIFAR-10 [Krizhevsky, 2009] includes 60,000 color images of objects and animals. CIFAR-100 [Krizhevsky, 2009] likewise comprises 60,000 images and features a diverse set of objects and animals split into 100 distinct classes. The ImageNet32 dataset [Chrabaszcz et al., 2017] is a downsampled version of the original ImageNet dataset [Russakovsky et al., 2015], containing 1,000 classes with 1,281,167 training images, 50,000 validation images and 100,000 test images." (a hedged data-loading sketch appears below the table)
Dataset Splits | Yes | "The ImageNet32 dataset [Chrabaszcz et al., 2017] is a downsampled version of the original ImageNet dataset [Russakovsky et al., 2015], containing 1,000 classes with 1,281,167 training images, 50,000 validation images and 100,000 test images."
Hardware Specification | Yes | "Code was implemented in PyTorch 2.0 and all simulations were run on NVIDIA A100 SXM4 40GB GPUs. This research was enabled by the computational resources provided by the Summit supercomputer, awarded through the Frontier DD allocation and INCITE 2023 program for the project 'Scalable Foundation Models for Transferable Generalist AI' and the Summit Plus allocation in 2024. These resources were supplied by the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, with support from the Office of Science of the U.S. Department of Energy."
Software Dependencies | Yes | "Code was implemented in PyTorch 2.0 and all simulations were run on NVIDIA A100 SXM4 40GB GPUs."
Experiment Setup | Yes | "All convolutional layers used in experiments have kernel size 3, stride 1 and padding 1. Max-pooling was applied with a 2×2 window and a stride of 2. For the 6-layer model used in Table 1, batch normalization was applied after the first-layer convolution and pooling operation. All other models in both experiments use batch normalization on the first layer of each block, after convolution and pooling (where applied). We initialized the weights of U_k^FC and U_k^CONV using Gaussian Orthogonal Ensembles (GOE) [Agarwala and Schoenholz, 2022] to enable faster equilibrium computation. All layers are initialized as zero matrices. All experiments were run using the Adam optimizer [Kingma and Ba, 2014] and a cosine annealing scheduler [Loshchilov and Hutter, 2017], specifying a minimum learning rate and setting T_max equal to the number of epochs (i.e. no warm restarts). One noteworthy detail is that only 100 epochs were used for the larger model in Table 2, compared with 200 epochs for the smaller 12-layer model." (a hedged PyTorch sketch of this setup appears below)
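
The pseudocode row above cites Algorithm 1 (ff-EBM inference, Eq. (5)). The paper's exact algorithm and energy function are not reproduced here; the following is a minimal sketch of block-wise equilibrium inference under assumed details: each block carries a toy Hopfield-style energy whose state is relaxed by a fixed number of gradient steps, and the settled state is fed forward to the next block. The names block_energy, relax and ff_ebm_inference, the activation, and all hyperparameters are illustrative, not taken from the authors' code.

```python
import torch

def block_energy(s, x, W, U):
    # Toy Hopfield-style block energy: quadratic state cost minus lateral (W)
    # and feedforward (U) interaction terms. Assumed form, not the paper's
    # Eq. (5); W would be symmetric in a proper Hopfield energy.
    rho = torch.sigmoid(s)  # assumed activation
    return (0.5 * (s ** 2).sum()
            - 0.5 * (rho @ W * rho).sum()
            - (rho * (x @ U)).sum())

def relax(x, W, U, n_steps=30, step_size=0.1):
    # Relax the block state toward an energy minimum by plain gradient
    # descent, a stand-in for the analog settling dynamics.
    s = torch.zeros(x.shape[0], W.shape[0])
    for _ in range(n_steps):
        s = s.detach().requires_grad_(True)
        (grad,) = torch.autograd.grad(block_energy(s, x, W, U), s)
        s = s - step_size * grad
    return s.detach()

def ff_ebm_inference(x, blocks):
    # Feed the equilibrium state of each energy-based block forward into
    # the next one (the "digitally-tied" chain, sketched).
    for W, U in blocks:
        x = relax(x, W, U)
    return x

# Usage: two toy blocks on random data.
torch.manual_seed(0)
blocks = [(torch.randn(64, 64) * 0.05, torch.randn(32, 64) * 0.05),
          (torch.randn(16, 16) * 0.05, torch.randn(64, 16) * 0.05)]
print(ff_ebm_inference(torch.randn(8, 32), blocks).shape)  # torch.Size([8, 16])
```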
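
Of the datasets listed above, CIFAR-10 and CIFAR-100 ship with torchvision, while ImageNet32 (the downsampled ImageNet of Chrabaszcz et al., 2017) does not and must be obtained separately from the official downsampled-ImageNet release. A minimal loading sketch, assuming torchvision is available; the augmentation choices below are common defaults, not the paper's reported pipeline.

```python
import torch
from torchvision import datasets, transforms

# Basic 32x32 pipeline; the paper's exact augmentation/normalization is not
# restated in the quoted setup, so these are placeholder defaults.
transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

cifar10_train = datasets.CIFAR10("./data", train=True, download=True,
                                 transform=transform)
cifar100_train = datasets.CIFAR100("./data", train=True, download=True,
                                   transform=transform)

train_loader = torch.utils.data.DataLoader(cifar10_train, batch_size=128,
                                           shuffle=True, num_workers=4)

# ImageNet32 is distributed as pickled numpy batches and would need a small
# custom Dataset class; it is not bundled with torchvision.
```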
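
The experiment-setup row maps onto standard PyTorch components. Below is a minimal sketch under stated assumptions: the block layout, channel counts, learning rate and epoch budget are placeholders (the paper's tables give the actual values), and goe_init is one illustrative reading of a Gaussian Orthogonal Ensemble draw (a symmetrized Gaussian matrix), not the authors' implementation of the U_k initialization.

```python
import math
import torch
import torch.nn as nn

def goe_init(n, scale=1.0):
    # Illustrative GOE draw: symmetrize an i.i.d. Gaussian matrix so W = W^T,
    # with entries scaled by 1/sqrt(n).
    a = torch.randn(n, n) * scale / math.sqrt(n)
    return (a + a.t()) / math.sqrt(2)

def conv_block(in_ch, out_ch, batchnorm=False, pool=True):
    # Kernel size 3, stride 1, padding 1; optional 2x2 max-pooling with
    # stride 2, and batch norm after convolution and pooling, as described above.
    layers = [nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1)]
    if pool:
        layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    if batchnorm:
        layers.append(nn.BatchNorm2d(out_ch))
    return nn.Sequential(*layers)

# Placeholder model: batch norm on the first layer of the first block only.
model = nn.Sequential(
    conv_block(3, 128, batchnorm=True),
    conv_block(128, 256),
)

epochs, min_lr = 200, 1e-5  # placeholder values
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# Cosine annealing with T_max equal to the number of epochs (no warm restarts),
# decaying down to a specified minimum learning rate.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=epochs, eta_min=min_lr)

for epoch in range(epochs):
    # ... training step over batches would go here ...
    scheduler.step()
```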