Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].
Neural Entropy
Authors: Akhil Premkumar
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Measurements of neural entropy on a few simple image diffusion models reveal that they are extremely efficient at compressing large ensembles of structured data. ... Section 6, Experiments: Neural entropy, as defined in Eq. (11), quantifies the information presented to the network in an idealized setting. In practice, the finiteness of the data, imperfections in training, and strong inductive biases of the network all affect the amount of information stored in the neural network. To address these points we will perform two broad classes of experiments, first to probe the transport properties of diffusion discussed in Eq. (22), and second, to study the storage efficiency of diffusion models. Transport experiments: We work with synthetic datasets sampled from simple multivariate distributions for which we have closed-form expressions for both $p_d$ and $\log p$ (e.g. Gaussian mixtures). This allows us to produce as many samples as we require with high fidelity, compute their exact log densities, and work in arbitrary dimensions. Storage experiments: We carry out similar experiments on a simple image diffusion model with a U-net core, trained on the MNIST dataset without class conditioning [36]. In this instance, the training dataset is small relative to the dimensionality of pixel space. |
| Researcher Affiliation | Academia | Akhil Premkumar, Department of Applied Physics, Yale University, New Haven, CT 06511, USA; EMAIL; work done while at the University of Chicago |
| Pseudocode | No | The paper includes mathematical equations and descriptions of processes but does not feature any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/akhilprem1/Neural Entropy |
| Open Datasets | Yes | We carry out similar experiments on a simple image diffusion model with a U-net core, trained on the MNIST dataset without class conditioning [36]. ... We obtain the same behavior from a diffusion model trained on the CIFAR-10 dataset [37]. |
| Dataset Splits | No | The paper mentions using $N$ samples for training (e.g., "model trained on $N = 8192$ samples from a mixture of five Gaussians", "train the image model on the first $n_c$ samples from each class, $N = 10 n_c$ samples in total"). However, beyond the training-set size, it does not specify explicit training/validation/test splits, split percentages, or a methodology for partitioning the data. |
| Hardware Specification | Yes | All computations were done on A100 GPUs with 80 GB of memory. The CIFAR-10 models were trained on 4 GPUs in parallel while the Gaussian mixture and MLP experiments were trained on just one. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | All models in this paper, except for the ones in Fig. 9, were trained with 4 random seeds, varying both weight initialization and the order of training data, and the results were averaged. ... We used $\kappa = \sigma_0 = 0.1$, so both processes transform $p_d$ to $p_0 \approx \mathcal{N}(0, 10^{-2}\mathbb{1}_D)$ at $T = 1$ (cf. Eq. (40)). ... In the experiments shown in Figs. 2 and 10, each $y_d$ is evolved to 10 random values of $s$ from this interval, which improved KL estimates whilst also reducing loss fluctuations [77, 78]. But in the image models we sampled just one random $s$ per $y_d$ per epoch. ... For the low-D models in the transport experiments we used Fourier features on the $x$ variable to help the MLPs learn better [74]. These were inserted before the input stage of an MLP with architecture (512, 256, D). We use $T = 1$ in all experiments. ... All CIFAR-10 experiments were trained to 200 epochs. |
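The Research Type row notes that the transport experiments rely on synthetic data from distributions with closed-form densities, such as Gaussian mixtures, so that exact log densities can be computed in any dimension. The following is a minimal sketch of that idea, not the author's code: it assumes an equal-weight mixture of isotropic Gaussians with a shared standard deviation, and the helper names (`make_mixture`, `sample`, `log_density`) and the parameters `spread` and `std` are hypothetical. The five components and $N = 8192$ in the usage line come from the Dataset Splits row; the component means are random placeholders.

```python
import math
import random

def make_mixture(n_components, dim, spread=3.0, std=0.5, seed=0):
    """Hypothetical isotropic mixture: random means, shared std, equal weights."""
    rng = random.Random(seed)
    means = [[rng.uniform(-spread, spread) for _ in range(dim)]
             for _ in range(n_components)]
    return means, std

def sample(means, std, n, seed=1):
    """Draw n points: pick a component uniformly, then add isotropic noise."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        mu = rng.choice(means)  # equal mixture weights
        out.append([m + std * rng.gauss(0.0, 1.0) for m in mu])
    return out

def log_density(x, means, std):
    """Exact log p(x), combining components with a stable log-sum-exp."""
    dim, k = len(x), len(means)
    log_norm = -0.5 * dim * math.log(2 * math.pi * std ** 2)
    terms = []
    for mu in means:
        sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
        terms.append(log_norm - 0.5 * sq / std ** 2 - math.log(k))
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

# Usage: five components in 2-D, N = 8192 training samples.
means, std = make_mixture(5, 2)
data = sample(means, std, 8192)
first_logp = log_density(data[0], means, std)
```

Because every density is available in closed form, quantities such as KL divergences between the data and a model can be checked against exact values rather than estimates.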
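The Dataset Splits row quotes the paper as training the image model on "the first $n_c$ samples from each class, $N = 10 n_c$ samples in total". A minimal sketch of such a class-balanced subset, assuming "first" means first in the dataset's stored order and that labels are available as a plain list, could look like this; `first_nc_per_class` is a hypothetical helper, not a function from the released code.

```python
from collections import defaultdict

def first_nc_per_class(labels, n_c):
    """Return indices of the first n_c examples of each class, in dataset order.

    Hypothetical helper: assumes `labels` lists the class of each example
    in the order the dataset stores them.
    """
    counts = defaultdict(int)
    chosen = []
    for idx, y in enumerate(labels):
        if counts[y] < n_c:
            counts[y] += 1
            chosen.append(idx)
    return chosen

# Usage with a toy 10-class label list: N = 10 * n_c indices are selected.
toy_labels = [i % 10 for i in range(100)]
subset = first_nc_per_class(toy_labels, 3)
```

Selecting by class rather than taking the first $N$ examples overall keeps the subset balanced even when the stored order is not shuffled by class.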
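The Experiment Setup row says Fourier features on $x$ were inserted before an MLP with architecture (512, 256, D) in the low-D transport experiments. The sketch below assumes Gaussian random Fourier features, $[\cos(2\pi Bx), \sin(2\pi Bx)]$ with a fixed random matrix $B$, and reads (512, 256, D) as two ReLU hidden layers of width 512 and 256 followed by a linear output of width $D$. The number of frequencies `m`, the scale `freq_scale`, the initialization, and all function names are hypothetical, and the sketch omits how the diffusion time $s$ enters the network, which this summary does not state.

```python
import math
import random

def fourier_features(x, B):
    """Map x (length D) to [cos(2*pi*Bx), sin(2*pi*Bx)] for a fixed m x D matrix B."""
    proj = [2 * math.pi * sum(b_i * x_i for b_i, x_i in zip(row, x)) for row in B]
    return [math.cos(p) for p in proj] + [math.sin(p) for p in proj]

def init_layer(n_in, n_out, rng):
    """Hypothetical init: Gaussian weights scaled by 1/sqrt(n_in), zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    W = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

def dense(h, W, b, relu):
    """Affine layer, optionally followed by ReLU."""
    out = [sum(w * v for w, v in zip(row, h)) + bias for row, bias in zip(W, b)]
    return [max(0.0, o) for o in out] if relu else out

def make_model(D, m=16, freq_scale=1.0, seed=0):
    """Random frequencies B plus an MLP with widths 2m -> 512 -> 256 -> D."""
    rng = random.Random(seed)
    B = [[rng.gauss(0.0, freq_scale) for _ in range(D)] for _ in range(m)]
    widths = [2 * m, 512, 256, D]
    layers = [init_layer(a, c, rng) for a, c in zip(widths[:-1], widths[1:])]
    return B, layers

def forward(x, B, layers):
    """Embed x with Fourier features, then apply the MLP (no ReLU on the output)."""
    h = fourier_features(x, B)
    for i, (W, b) in enumerate(layers):
        h = dense(h, W, b, relu=(i < len(layers) - 1))
    return h

# Usage: a 3-D input produces a 3-D output.
B, layers = make_model(3)
y = forward([0.1, -0.2, 0.3], B, layers)
```

The embedding lifts low-dimensional inputs into a set of sinusoids of different frequencies, which is a common way to help plain MLPs represent sharper functions of their inputs.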