Towards training digitally-tied analog blocks via hybrid gradient computation
Authors: Timothy Nest, Maxence Ernoult
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally demonstrate the effectiveness of this approach on ff-EBMs using Deep Hopfield Networks (DHNs) as energy-based blocks, and show that a standard DHN can be arbitrarily split into any uniform size while maintaining or improving performance, with increases in simulation speed of up to four times. We then train ff-EBMs on ImageNet32, where we establish a new state-of-the-art performance for the EP literature (46% top-1 accuracy). |
| Researcher Affiliation | Collaboration | Timothy Nest (timothy.nest@mila.quebec) and Maxence Ernoult (maxence@rain.ai); Montreal Institute of Learning Algorithms (MILA) and Rain AI. Equal contribution. |
| Pseudocode | Yes | Algorithm 1 ff-EBM inference (Eq. (5)) |
| Open Source Code | Yes | Our code is available on https://github.com/rain-neuromorphics/hybrid_bp_ep_official |
| Open Datasets | Yes | Simulations were run on the CIFAR-10, CIFAR-100 and ImageNet32 datasets, all consisting of color images of size 32×32 pixels. CIFAR-10 [Krizhevsky, 2009] includes 60,000 color images of objects and animals. CIFAR-100 [Krizhevsky, 2009] likewise comprises 60,000 images and features a diverse set of objects and animals split into 100 distinct classes. The ImageNet32 dataset [Chrabaszcz et al., 2017] is a downsampled version of the original ImageNet dataset [Russakovsky et al., 2015] containing 1,000 classes with 1,281,167 training images, 50,000 validation images and 100,000 test images. (A hedged dataset-loading sketch follows the table.) |
| Dataset Splits | Yes | The ImageNet32 dataset [Chrabaszcz et al., 2017] is a downsampled version of the original ImageNet dataset [Russakovsky et al., 2015] containing 1,000 classes with 1,281,167 training images, 50,000 validation images and 100,000 test images. |
| Hardware Specification | Yes | Code was implemented in PyTorch 2.0 and all simulations were run on NVIDIA A100 SXM4 40GB GPUs. This research was enabled by the computational resources provided by the Summit supercomputer, awarded through the Frontier DD allocation and INCITE 2023 program for the project 'Scalable Foundation Models for Transferable Generalist AI' and the Summit Plus allocation in 2024. These resources were supplied by the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, with support from the Office of Science of the U.S. Department of Energy. |
| Software Dependencies | Yes | Code was implemented in PyTorch 2.0 and all simulations were run on NVIDIA A100 SXM4 40GB GPUs. |
| Experiment Setup | Yes | All convolutional layers used in experiments have kernel size 3, with stride and padding 1. Max-pooling was applied with a window of 2×2 and stride of 2. For the 6-layer model used in Table 1, batchnorm was applied after the first layer's convolution and pooling operation. All other models in both experiments use batch normalization on the first layer of each block, after convolution and pooling (where applied). We initialized the weights of U^k_FC and U^k_CONV using Gaussian Orthogonal Ensembles (GOE) [Agarwala and Schoenholz, 2022] to enable faster equilibrium computation. All layers are initialized as zero matrices. All experiments were run using the Adam optimizer [Kingma and Ba, 2014] and a Cosine Annealing scheduler [Loshchilov and Hutter, 2017], specifying minimum learning rates and setting the maximum T equal to the number of epochs (i.e. no warm restarts). One noteworthy detail is that only 100 epochs were used for the larger model in Table 2, compared with 200 epochs for the smaller 12-layer model. (A hedged configuration sketch follows the table.) |
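
The Open Datasets row above maps onto standard torchvision loaders for the CIFAR datasets. Below is a minimal sketch, assuming torchvision's built-in `CIFAR10`/`CIFAR100` classes; ImageNet32 is not bundled with torchvision and would have to be obtained separately from the downsampled-ImageNet release of Chrabaszcz et al. [2017], so it is not shown. The transform and batch size are placeholders, not the authors' preprocessing.

```python
# Minimal sketch of loading the 32x32 datasets named in the table with torchvision.
# The ToTensor-only transform and the batch size are placeholders, not the
# authors' preprocessing pipeline.
import torch
from torchvision import datasets, transforms

transform = transforms.ToTensor()  # placeholder; augmentation/normalization omitted

cifar10_train = datasets.CIFAR10("./data", train=True, download=True, transform=transform)
cifar10_test = datasets.CIFAR10("./data", train=False, download=True, transform=transform)
cifar100_train = datasets.CIFAR100("./data", train=True, download=True, transform=transform)

train_loader = torch.utils.data.DataLoader(cifar10_train, batch_size=128, shuffle=True)
```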
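
The Experiment Setup row pins down the layer hyperparameters and the optimizer/scheduler choices. The sketch below is one plausible PyTorch 2.0 rendering of those settings: kernel size 3, stride/padding 1, 2×2 max-pooling, batchnorm after the first layer's convolution and pooling, Adam, and cosine annealing with `T_max` equal to the number of epochs come directly from the row above, while the `ConvBlock` class, channel counts, learning rates, and the `goe_init_` helper (including its scaling) are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of the quoted configuration: conv layers with kernel 3,
# stride 1, padding 1; 2x2 max-pooling with stride 2; batch normalization after
# convolution and pooling; Adam with a cosine-annealed learning rate.
# Class/helper names, channel counts and learning rates are assumptions.
import math
import torch
import torch.nn as nn


def goe_init_(weight: torch.Tensor) -> None:
    """One plausible GOE-style init for a square 2D weight (e.g. the energy-based
    block's lateral weights, not modeled below): a symmetrized Gaussian matrix.
    The authors' exact scaling is not given in the quoted text."""
    n = weight.shape[0]
    a = torch.randn(n, n) / math.sqrt(n)
    with torch.no_grad():
        weight.copy_((a + a.T) / 2.0)


class ConvBlock(nn.Module):  # hypothetical name, not the authors' module
    def __init__(self, in_ch: int, out_ch: int, batchnorm: bool = True):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.bn = nn.BatchNorm2d(out_ch) if batchnorm else nn.Identity()

    def forward(self, x):
        # batch normalization applied after convolution and pooling, as quoted
        return self.bn(self.pool(self.conv(x)))


model = nn.Sequential(ConvBlock(3, 128), ConvBlock(128, 256, batchnorm=False))
epochs, lr, min_lr = 200, 1e-4, 1e-6  # placeholder values
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=epochs, eta_min=min_lr  # T_max = epochs, i.e. no warm restarts
)
```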