Searching for Higgs Boson Decay Modes with Deep Learning

Authors: Peter J Sadowski, Daniel Whiteson, Pierre Baldi

NeurIPS 2014

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we train artificial neural networks to detect the decay of the Higgs boson to tau leptons on a dataset of 82 million simulated collision events. We demonstrate that deep neural network architectures are particularly well-suited for this task, with the ability to automatically discover high-level features from the data and increase discovery significance. |
| Researcher Affiliation | Academia | Peter Sadowski, Department of Computer Science, University of California, Irvine, Irvine, CA 92617, peter.j.sadowski@uci.edu; Pierre Baldi, Department of Computer Science, University of California, Irvine, Irvine, CA 92617, pfbaldi@ics.uci.edu; Daniel Whiteson, Department of Physics and Astronomy, University of California, Irvine, Irvine, CA 92617, daniel@uci.edu |
| Pseudocode | No | The paper describes methods in text but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | No | The paper states it uses 'a dataset of 82 million simulated collision events' and references 'simulated collisions from sophisticated Monte Carlo programs [4, 5, 6]', but it does not provide concrete access information (link, DOI, repository, or formal citation for a specific public dataset instance) for this simulated data. |
| Dataset Splits | Yes | A validation set of 1 million examples was randomly set aside for tuning the hyperparameters. |
| Hardware Specification | Yes | Computations were performed using machines with 16 Intel Xeon cores, an NVIDIA Tesla C2070 graphics processor, and 64 GB memory. |
| Software Dependencies | No | Training was performed using the Theano and Pylearn2 software libraries [9, 10]; however, specific version numbers for these libraries are not provided. |
| Experiment Setup | Yes | The tanh activation function was used for all hidden units, while the logistic function was used for the output. Weights were initialized from a normal distribution with zero mean and standard deviation 0.1 in the first layer, 0.001 in the output layer, and 1/sqrt(k) for all other hidden layers, where k was the number of units in the previous layer. Gradient computations were made on mini-batches of size 100. A momentum term increased linearly over the first 25 epochs from 0.5 to 0.99, then remained constant. The learning rate decayed by a factor of 1.0000002 every batch update until it reached a minimum of 10^-6. All networks were trained for 50 epochs. |
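The experiment-setup details reported above (layer-dependent weight initialization, the linear momentum ramp, and the per-batch learning-rate decay) can be sketched as plain Python. This is a minimal illustration of the stated schedules, not the authors' code: the layer sizes and the initial learning rate `lr0` are assumptions, since the paper's quoted setup does not fix them here.

```python
import numpy as np

def init_weights(layer_sizes, seed=0):
    """Zero-mean normal init as described: std 0.1 for the first layer,
    0.001 for the output layer, and 1/sqrt(k) for all other hidden
    layers, where k is the number of units in the previous layer."""
    rng = np.random.default_rng(seed)
    n_layers = len(layer_sizes) - 1
    weights = []
    for i in range(n_layers):
        fan_in, fan_out = layer_sizes[i], layer_sizes[i + 1]
        if i == 0:
            std = 0.1
        elif i == n_layers - 1:
            std = 0.001
        else:
            std = 1.0 / np.sqrt(fan_in)
        weights.append(rng.normal(0.0, std, size=(fan_in, fan_out)))
    return weights

def momentum_at_epoch(epoch):
    """Momentum rises linearly from 0.5 to 0.99 over the first 25
    epochs, then stays constant."""
    return 0.5 + (0.99 - 0.5) * min(epoch, 25) / 25.0

def learning_rate_at_batch(batch_index, lr0=0.05):
    """Learning rate decays by a factor of 1.0000002 per batch update,
    floored at 1e-6. The initial rate lr0 is an assumption."""
    return max(lr0 / (1.0000002 ** batch_index), 1e-6)
```

For example, `momentum_at_epoch(0)` gives 0.5 and any epoch past 25 gives 0.99, while `learning_rate_at_batch` eventually bottoms out at the stated 10^-6 floor.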