Hamiltonian Monte Carlo on ReLU Neural Networks is Inefficient

Authors: Vu C. Dinh, Lam S. Ho, Cuong V. Nguyen

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through experiments with both synthetic and real datasets, we validate our theoretical analyses and highlight the inefficiency of ReLU neural networks compared to analytical networks."
Researcher Affiliation | Academia | Vu C. Dinh, Department of Mathematical Sciences, University of Delaware (vucdinh@udel.edu); Lam Si Tung Ho, Department of Mathematics and Statistics, Dalhousie University (lam.ho@dal.ca); Cuong V. Nguyen, Department of Mathematical Sciences, Durham University (viet.c.nguyen@durham.ac.uk).
Pseudocode | No | The paper describes the leapfrog updates in Equations (1)-(3) but does not present them in a clearly labeled 'Pseudocode' or 'Algorithm' block. (A hedged sketch of a standard leapfrog update is given below the table.)
Open Source Code | No | Question: "Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material?" Answer: [No]. Justification: "We would like to keep the code confidential."
Open Datasets | Yes | "In addition to the synthetic dataset above, we also conduct experiments to validate our theoretical findings on a subset of the real-world UTKFace dataset (Zhang et al., 2017)."
Dataset Splits | No | "The subset is then randomly split into a training set (167 images) and a test set (100 images)." The paper does not explicitly mention a validation dataset split.
Hardware Specification | No | The experiments were run on a single CPU machine. The paper does not provide specific details such as CPU model, memory, or other hardware specifications.
Software Dependencies | No | "Our simulations are implemented using the Autograd package (Maclaurin et al., 2015)." The paper names Autograd but does not specify a version number. (A hypothetical gradient-computation sketch follows the table.)
Experiment Setup | Yes | "We choose a standard normal prior π(q) = N(0, I) and sample 2,000 parameter vectors from the posterior after a burn-in period of 100 samples. We vary the number of steps L ∈ {200, 400, 600, 800, 1000} together with the step size ϵ ∈ {0.0005, 0.0010, 0.0015, 0.0020, 0.0025}. We fix the travel time T = ϵL = 0.1 and vary ϵ ∈ {0.0005, 0.0010, 0.0015, ..., 0.0040}. For each network, we run HMC with L = 200 and ϵ = 0.001 while keeping other hyper-parameters the same as in previous simulations. In this experiment, we fix T = 0.01 and vary ϵ ∈ {0.00005, 0.00010, 0.00015, ..., 0.00040}. For each run of the HMC, we sample 300 parameter vectors from the posterior after a burn-in period of 50 samples." (A hedged configuration sketch using one of these hyper-parameter settings also follows the table.)
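Leapfrog sketch (referenced in the Pseudocode row). This is not the authors' code, which is not released; it is a minimal sketch of the standard leapfrog integrator that Equations (1)-(3) of the paper describe, assuming a function `grad_U` that returns the gradient of the potential energy U(q) = -log posterior(q), a step size `eps`, and a number of steps `L`.

```python
# Minimal leapfrog sketch (not the authors' code); `grad_U`, `eps`, `L` are assumptions.
import numpy as np

def leapfrog(q, p, grad_U, eps, L):
    """Standard leapfrog: half-step momentum, L full position steps, half-step momentum."""
    q, p = q.copy(), p.copy()
    p -= 0.5 * eps * grad_U(q)        # initial half-step for momentum
    for _ in range(L - 1):
        q += eps * p                  # full position step
        p -= eps * grad_U(q)          # full momentum step
    q += eps * p                      # final position step
    p -= 0.5 * eps * grad_U(q)        # final half-step for momentum
    return q, -p                      # negate momentum so the proposal is reversible
```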
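Autograd sketch (referenced in the Software Dependencies row). Since the paper only names the Autograd package, the following hypothetical sketch shows the kind of gradient computation its simulations would rely on: `autograd.grad` applied to the negative log-posterior of a small one-hidden-layer ReLU regression network under the standard normal prior quoted above. The network sizes, noise scale, and parameter layout are illustrative assumptions, not values from the paper.

```python
# Hypothetical Autograd usage sketch; network size, layout and noise scale are assumptions.
import autograd.numpy as np
from autograd import grad

def unpack(q, d_in=3, d_hidden=10):
    """Split a flat parameter vector into a hidden-layer matrix and an output vector."""
    n1 = d_in * d_hidden
    return q[:n1].reshape(d_in, d_hidden), q[n1:n1 + d_hidden]

def neg_log_posterior(q, X, y, sigma=0.1):
    W1, w2 = unpack(q)
    h = np.maximum(0.0, np.dot(X, W1))             # ReLU hidden layer
    pred = np.dot(h, w2)
    log_lik = -0.5 * np.sum((y - pred) ** 2) / sigma ** 2
    log_prior = -0.5 * np.sum(q ** 2)              # standard normal prior N(0, I)
    return -(log_lik + log_prior)

grad_U = grad(neg_log_posterior)                   # gradient of the potential energy in q
```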
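Configuration sketch (referenced in the Experiment Setup row). The sketch below wires the two pieces above into a plain HMC loop using one of the quoted settings (L = 200, ϵ = 0.001, 2,000 posterior samples after a 100-sample burn-in). It is a hedged reconstruction, not the authors' implementation; the toy data, parameter dimension, and random seed are assumptions chosen only to make the example self-contained.

```python
# Hedged HMC sketch for one quoted setting (L = 200, eps = 0.001, 2,000 samples,
# burn-in 100).  Reuses `leapfrog`, `neg_log_posterior`, `grad_U` from the sketches
# above; the toy data and dimension below are assumptions, not from the paper.
import autograd.numpy as np
from numpy.random import default_rng

rng = default_rng(0)
X = rng.normal(size=(50, 3))                       # illustrative inputs (d_in = 3)
y = rng.normal(size=50)                            # illustrative targets

U = lambda q: neg_log_posterior(q, X, y)           # potential energy U(q)
dU = lambda q: grad_U(q, X, y)                     # its gradient via Autograd

def hmc(n_samples=2000, burn_in=100, L=200, eps=0.001, dim=40):
    q = rng.normal(size=dim)                       # init from the N(0, I) prior
    samples = []
    for i in range(n_samples + burn_in):
        p = rng.normal(size=dim)                   # resample momentum
        q_new, p_new = leapfrog(q, p, dU, eps, L)
        # Metropolis correction on H(q, p) = U(q) + |p|^2 / 2
        dH = (U(q_new) + 0.5 * np.dot(p_new, p_new)) - (U(q) + 0.5 * np.dot(p, p))
        if np.log(rng.uniform()) < -dH:
            q = q_new
        if i >= burn_in:
            samples.append(q.copy())
    return np.array(samples)
```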