Improving Particle Classification in WIMP Dark Matter Detection Experiments Using Neural Networks


In all experiments for detection of Weakly Interacting Massive Particle (WIMP) dark matter, it is essential to develop a function that can distinguish events caused by WIMP candidates from those caused by background radiation. Manually developing such a classifier is challenging, time-consuming, and necessitates detailed physical modeling.

Machine learning has the potential to automate this task and accelerate experimentation, in addition to detecting patterns that humans cannot. However, impure calibration data adversely affects training of models, and unusual detector topologies make data challenging to process.

I have developed novel machine learning algorithms that perform significantly better at this task than previous methods, in the PICO-60 and DEAP-3600 experiments. These results should allow accelerated iteration for teams working on these experiments, while improving accuracy and retaining reliability. Additionally, they promise to generalize to future WIMP experiments.

I approached the 10% calibration data impurity present in the PICO-60 bubble chamber experiment by developing semi-supervised learning algorithms that synthesize new labels for training data, improving accuracy from 80.7% to 99.2%. Additionally, I investigated previously unexplored input data formats and neural network architectures.

DEAP-3600 is a spherical detector with light sensors on the surface. I present new algorithms that can process this unusual topology: a cylindrical projection system, and a new type of CNN that processes arbitrary geometric data. This reduced the rate of false positives from 91.0% to 75.7%.

I have lead-authored an academic paper on my PICO-60 research, which the PICO collaboration has reviewed and approved. It will shortly undergo peer review.

Question / Proposal

Can I develop novel machine learning techniques that are more accurate than existing methods at classifying background radiation in the PICO-60 and DEAP-3600 experiments for WIMP dark matter detection?

One of the most significant fields of research in physics today is the search for dark matter. A conclusive result, whether a detection or an exclusion, would have the potential to revolutionize our understanding of the universe at a fundamental level.

However, developing a detector for dark matter, in the form of Weakly Interacting Massive Particles (WIMPs), is difficult. Even if one develops a highly sensitive apparatus such as a bubble chamber or photon detector, there is still background radiation, such as alpha particles, that has properties similar to those of expected dark matter particles.

Developing a conventional classifier to separate dark matter from background radiation is possible. However, it usually relies on detailed physical modeling of the detector, and manual optimization, both of which are time-consuming and must be reworked whenever the experiment changes.

Machine learning is a potential solution to this; calibration data can be used to train a model to separate particle types. However, calibration data is frequently impure, which often leads to overfitting and poor accuracy. Additionally, in many experiments, unusual detector formats make it challenging to find an appropriate machine learning model. I hope to resolve these challenges.

Based on my research, I expect to be able to develop a machine learning model that classifies more accurately than current methods, by applying original semi-supervised learning techniques and new systems based on convolutional neural networks.


There have been many high-profile, well-documented dark matter experiments in the past several years; PICO-60 and DEAP-3600 are two experiments for which I was able to get access to data, by reaching out to research physicists.

The PICO-60 experiment (Amole et al.) is a bubble chamber containing superheated C3F8. When a particle strikes an atom in the liquid, a disturbance is created and a bubble forms. The primary form of background radiation is alpha particles, emitted by nuclear decays inside the detector. The PICO-60 collaboration developed a conventional classifier for alpha particles, known as the Acoustic Parameter (AP), using physical modeling and empirical tuning. Because AP has been verified to distinguish accurately between alpha particles and nuclear recoils, it can be used to verify machine learning algorithms.

Because of the benefits of automation, preliminary experimentation has previously been done by Amole et al. with machine learning for discrimination of WIMP-like particles from alpha particles, yielding 80.7% accuracy. However, 10% of calibration data consists of impurities, likely causing overfitting and thus reducing accuracy. The input to AP, as well as the neural network, is an 8-band Fourier transform of audio captured by piezoelectric microphones in the detector.
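
As a rough illustration of the 8-band input format, the sketch below reduces a waveform to eight band energies with NumPy. The logarithmic band spacing, the `f_min` cutoff, the sample rate, and the function name `band_energies` are all assumptions made for illustration; the actual PICO-60 band definitions are not reproduced here.

```python
import numpy as np


def band_energies(waveform, sample_rate, n_bands=8, f_min=1_000.0):
    """Sum the power spectrum of a waveform into n_bands frequency bands.

    The logarithmic band spacing and f_min are illustrative assumptions,
    not the PICO-60 band definitions.
    """
    power = np.abs(np.fft.rfft(waveform)) ** 2                 # power per frequency bin
    freqs = np.fft.rfftfreq(len(waveform), d=1.0 / sample_rate)
    edges = np.geomspace(f_min, sample_rate / 2, n_bands + 1)  # band boundaries in Hz
    # Assign each bin to a band; bins below f_min get index -1 and are dropped.
    band_index = np.searchsorted(edges, freqs, side="right") - 1
    band_index[freqs >= edges[-1]] = n_bands - 1               # include the Nyquist bin
    return np.array([power[band_index == i].sum() for i in range(n_bands)])


# Hypothetical input: a pure 5 kHz tone sampled at 100 kHz.
t = np.arange(1000) / 100_000.0
tone = np.sin(2 * np.pi * 5_000.0 * t)
bands = band_energies(tone, sample_rate=100_000.0)
```

A vector like `bands` is small enough to feed directly into a dense network, which is one reason such a compact format is convenient.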

The DEAP-3600 detector (Amaudruz et al.) contains 3.3 tonnes of liquid argon, which emits photons when struck by a particle. The photons are detected by 255 extremely sensitive light detectors (photomultipliers or PMTs) placed around the acrylic vessel containing the liquid argon. Based on the counts and timings of photons that reach each of the PMTs, it is possible to determine the energy and location of any event that occurs in the body of the detector. However, alpha events that occur in the neck are very difficult to isolate, because they overlap with the apparent characteristics of expected WIMP candidates.

It is impractical to create significant amounts of clean calibration data in the DEAP-3600 detector. Thus, data from a Monte Carlo simulation (which was benchmarked using real-world calibration data) is used instead.

A conventional classifier was previously developed by DEAP-3600 physicists; it was able to remove 99.6% of neck events, at the cost of 91.0% of hypothetical (simulated) WIMP events. Machine learning has not previously been applied to this problem. Based on its applications in PICO-60, I am hopeful I can make a similar difference for DEAP-3600.

I expect I can make a difference to these two experiments, in part, because of the precedent set by previous work using machine learning in experimental physics; for example, it has been instrumental in efficiently identifying Higgs boson events at the Large Hadron Collider, an effort summarized well by Guest et al.

I believe this work is beneficial to society because it helps enable experiments that explore the fundamental nature of the universe. Not only does it help humanity understand who we are and what we are made of, but previously obscure physics research has proven instrumental to life-saving inventions many times in the past: for instance, according to Minervini et al., superconductors are essential to magnetic resonance imaging.

Method / Testing and Redesign


For PICO-60, I developed and compared two sets of classification algorithms:

  1. I experimented further with conventional supervised learning (where a model is trained on a set of inputs with expected outputs). I explored the following data formats to learn which was the best-performing solution:
    • An 8-band Fourier transform. I applied a dense neural network with dropout and L2 regularization (which allowed for less overfitting to impure data).
    • A higher-resolution Fourier transform of the audio (all 50,001 data points). I once again applied a dense neural network.
    • A raw audio waveform with a very deep 1D convolutional neural network (CNN) inspired by Dai et al. I hoped to learn whether data preprocessing was required for good performance.
    • Images captured by cameras in the detector. I trained a 2D CNN on these, to learn whether they contain any relevant information.
  2. The impurity in the training data was alleviated with regularization but not entirely solved. For a better solution, I developed two entirely new semi-supervised learning algorithms. The particle labels (WIMP-like nuclear recoils, or alpha events) were removed from a large portion of the data, and my algorithms learned more accurate labels for this data.
    • Iterative Cluster Nucleation is inspired by unsupervised "clustering" algorithms. First, a neural network is trained on a labeled set. After some time, it runs predictions on the unlabeled set. Those unlabeled examples with the most confident predictions (close to 0 or 1) are added to their corresponding cluster in the labeled set; these synthesized labels are more accurate than the original labels.
    • Gravitational Differentiation is a more analog redesign. Using an original piecewise exponential function for calculating final-layer derivatives, called GravDiff, unlabeled examples are caused to "gravitate" toward more accurate predictions based on the confidence of the neural network's predictions.
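
The loop behind Iterative Cluster Nucleation can be sketched in a few lines. This is a minimal illustration, not the project's implementation: a one-feature logistic model stands in for the neural network, the model is retrained from scratch each round, and the names `train`, `iterative_cluster_nucleation`, `rounds`, and `margin` are hypothetical. The GravDiff function is not reproduced, because its exact form is not given here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(labeled, epochs=200, lr=0.5):
    """Fit a one-feature logistic model by gradient descent.

    This is a stand-in for the neural network, chosen to keep the sketch short.
    """
    w = b = 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in labeled:
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / len(labeled)
        b -= lr * grad_b / len(labeled)
    return w, b


def iterative_cluster_nucleation(labeled, unlabeled, rounds=3, margin=0.1):
    """Move confidently predicted unlabeled examples into the labeled set."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        w, b = train(labeled)
        still_unlabeled = []
        for x in unlabeled:
            p = sigmoid(w * x + b)
            if p <= margin:
                labeled.append((x, 0))       # confident: joins the class-0 cluster
            elif p >= 1.0 - margin:
                labeled.append((x, 1))       # confident: joins the class-1 cluster
            else:
                still_unlabeled.append(x)    # ambiguous: wait for a later round
        unlabeled = still_unlabeled
    return train(labeled), labeled, unlabeled


# Hypothetical toy data: four labeled seeds and seven unlabeled examples.
seeds = [(-2.0, 0), (-1.5, 0), (1.5, 1), (2.0, 1)]
pool = [-3.0, -2.5, -1.0, 0.05, 1.0, 2.5, 3.0]
model, grown, remaining = iterative_cluster_nucleation(seeds, pool)
```

In this toy run, the example at 0.05 sits near the decision boundary and is never assigned a label, which mirrors the idea that only confident predictions should nucleate a cluster.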



In DEAP-3600, the key challenge is the aforementioned unusual detector format: a sphere tiled with a hexagonal lattice of PMTs. While it is fundamentally an image, conventional CNNs are intended only for flat rectangular images. I attempted to solve the problem in three different ways:

  1. I tried simply inputting photon counts from each PMT into a multi-layer perceptron, which has no awareness of spatial arrangement.
  2. The spherical image challenge is much like that cartographers face when making a map of the spherical Earth. I developed and applied a Mercator-inspired cylindrical projection to the spherical data, and used a 2D CNN.
  3. I developed an entirely new type of CNN, called a topological CNN. Rather than convolving over a 2D image of square pixels, kernels convolve over an arbitrary topology in any number of dimensions (in this case, a hexagonal lattice on the surface of a sphere).
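
The cylindrical projection idea can be illustrated with the standard Mercator formula. The sketch below is an assumption-laden example rather than the mapping used in this project: the grid size, the 80-degree latitude clamp, the (polar angle, azimuth, count) input format, and the function names are all hypothetical.

```python
import math


def mercator_pixel(theta, phi, width, height, max_lat=math.radians(80.0)):
    """Map a point on a sphere to (row, col) on a width x height grid.

    theta: polar angle from the +z axis, in radians (0..pi)
    phi: azimuth in radians (-pi..pi)
    Latitudes beyond max_lat are clamped, because Mercator diverges at the poles.
    """
    lat = math.pi / 2 - theta
    lat = max(-max_lat, min(max_lat, lat))
    y = math.log(math.tan(math.pi / 4 + lat / 2))             # Mercator y coordinate
    y_max = math.log(math.tan(math.pi / 4 + max_lat / 2))
    col = int((phi + math.pi) / (2 * math.pi) * width) % width
    row = int((y_max - y) / (2 * y_max) * (height - 1) + 0.5)  # nearest row
    return row, col


def project_counts(pmts, width=32, height=16):
    """Accumulate per-PMT photon counts into a 2D image for a conventional CNN."""
    image = [[0.0] * width for _ in range(height)]
    for theta, phi, count in pmts:
        row, col = mercator_pixel(theta, phi, width, height)
        image[row][col] += count
    return image


# Hypothetical PMTs: one near each pole and one on the equator.
image = project_counts([(0.0, 0.0, 3.0), (math.pi, 0.0, 2.0), (math.pi / 2, -math.pi, 1.0)])
```

Any such projection distorts distances near the poles and cuts the sphere along one meridian, so neighboring PMTs on the sphere are not always neighbors in the image.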



I ran grid searches (exhaustive searches of network hyperparameters) to optimize each algorithm. These took up to 10 days of compute time.
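
An exhaustive grid search can be written compactly with `itertools.product`. The sketch below is illustrative only: the hyperparameter names, their candidate values, and the toy scoring function are assumptions, and in practice `evaluate` would train a network and return its mean validation accuracy.

```python
import itertools


def grid_search(param_grid, evaluate):
    """Evaluate every combination of hyperparameters and return the best one.

    param_grid: dict mapping a hyperparameter name to a list of candidate values
    evaluate: callable taking a dict of hyperparameters and returning a score
              (higher is better)
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid and a toy score that peaks at dropout=0.25, l2=0.001.
grid = {"dropout": [0.0, 0.25, 0.5], "l2": [0.0, 0.001, 0.01]}


def toy_score(params):
    return -abs(params["dropout"] - 0.25) - abs(params["l2"] - 0.001)


best_params, best_score = grid_search(grid, toy_score)
```

The number of combinations is the product of the list lengths, which is why adding even one more hyperparameter can multiply the compute time.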



I implemented all of my algorithms in Python, using Keras, TensorFlow, and NumPy. I used Matplotlib for data visualization.

My code is public at

I worked independently, with minimal support, at SNOLAB and at home.


Statistical Practices

All performance statistics cited refer to performance on a randomly selected validation set composed of examples not used for training. Each set of network hyperparameters tested in a grid search was trained and tested multiple times (see below), with a differently randomized split between training and validation data. Performance statistics are averaged over the multiple training runs.

  • For PICO-60, 128 of 624 examples were randomly selected for validation, and each model was trained and tested (with a different validation set) 3 times.
  • For DEAP-3600, 500 of 3304 examples were selected, and models were tested 6 times.
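
The repeated random-split procedure can be sketched as follows, using the PICO-60 split sizes quoted above. The function name `repeated_holdout`, the fixed seed, and the placeholder scoring function are assumptions for illustration; a real `train_and_score` would train a model on the training portion and return its validation accuracy.

```python
import random
import statistics


def repeated_holdout(examples, n_validation, n_runs, train_and_score, seed=0):
    """Average a score over several random training/validation splits.

    train_and_score: callable (training, validation) -> float
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_runs):
        shuffled = list(examples)
        rng.shuffle(shuffled)                    # a fresh random split per run
        validation = shuffled[:n_validation]
        training = shuffled[n_validation:]
        scores.append(train_and_score(training, validation))
    return statistics.mean(scores), scores


# Placeholder scorer that only records the split sizes it was given.
split_sizes = []


def placeholder_score(training, validation):
    split_sizes.append((len(training), len(validation)))
    return len(validation) / (len(training) + len(validation))


mean_score, run_scores = repeated_holdout(list(range(624)), 128, 3, placeholder_score)
```

Averaging over several splits reduces the chance that a reported accuracy reflects one unusually easy or hard validation set.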



  • Supervised learning well exceeded performance of previous research, with the best configuration reaching a mean of 97.0% accuracy (up from 80.7%).
  • The 8-band Fourier transform was the most effective input format. The high-resolution Fourier transform (91.4%), the raw audio waveform (87.5%), and the image input (63.0%) all produced lower accuracy. There is strong evidence that the image input in particular provides no useful information.
    The plots below show the predictions of each neural network configuration, compared to those of the Acoustic Parameter (the conventional classifier).
  • Semi-supervised learning produced higher accuracy than supervised learning, with no additional training examples.
    • Iterative Cluster Nucleation reached a mean of 98.2% accuracy, using only 128 labeled training examples.
    • Gravitational Differentiation reached a mean of 99.2% accuracy (an error rate of 0.8%, well over an order of magnitude lower than the 19.3% error rate of previous research).
  • Note that these statistics indicate error rates well below the proportion of impurities in the training set. This means that semi-supervised learning is not reinforcing the biases in the data; the learned particle labels are in fact more accurate than the original ones.
  • In addition, network predictions (particularly for Gravitational Differentiation) demonstrated much higher confidence than previous methods (including the Acoustic Parameter). This information is embodied in the class-wise standard deviation statistic (the mean of the standard deviations for each predicted particle class, after normalization). This reaches a mean of 0.11 for Gravitational Differentiation, down from 0.26 with supervised learning and 0.43 for the Acoustic Parameter.
    These properties are demonstrated in the prediction plots below. Note the much higher accuracy and confidence compared to supervised learning.
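
One possible reading of the class-wise standard deviation statistic is sketched below. The exact normalization is not specified in this write-up, so the min-max scaling to the range 0 to 1, the 0.5 split between predicted classes, and the function name `classwise_std` are assumptions for illustration only.

```python
import statistics


def classwise_std(predictions, threshold=0.5):
    """Mean of the per-class standard deviations of normalized predictions.

    Predictions are min-max scaled to [0, 1] (an assumed normalization), then
    grouped by predicted class using the threshold.
    """
    lowest, highest = min(predictions), max(predictions)
    if highest == lowest:
        raise ValueError("predictions must not all be identical")
    scaled = [(p - lowest) / (highest - lowest) for p in predictions]
    below = [p for p in scaled if p < threshold]
    above = [p for p in scaled if p >= threshold]
    # Lower values mean predictions cluster tightly around each class.
    return statistics.mean(statistics.pstdev(group) for group in (below, above) if group)


confident = classwise_std([0.0, 0.0, 1.0, 1.0])   # tightly clustered predictions
spread = classwise_std([0.0, 0.4, 0.6, 1.0])      # more hesitant predictions
```

Under this reading, a classifier whose outputs sit close to 0 or 1 scores near zero, while hesitant outputs spread across the range score higher.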

The spreadsheet below quantifies the performance of every configuration tested.

Summary of the PICO analysis:



The three architectures I tested for neck alpha identification in DEAP-3600 were evaluated based on their ability to reduce the rate of false positives (potential WIMP candidates misidentified as neck alphas) compared to previous results from conventional methods. Only models with the same (or lower) 0.4% false negative rate were considered.

  • The dense neural network always produced 100% false positives (all WIMPs rejected as alphas).
  • The topological CNN produced a false positive rate of 92.6% (higher than the previous 91.0%).
  • The cylindrical projection with 2D CNN improved the false positive rate from 91.0% to 75.7% (a reduction of 15.3 percentage points, raising the fraction of simulated WIMP events retained from 9.0% to 24.3%)!
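
Comparing models at a fixed false negative rate amounts to choosing a score threshold from the alpha events and then counting the WIMP events it rejects. The sketch below assumes each model outputs a score for "neck alpha" and that events at or above the threshold are rejected; the function name and the toy scores are hypothetical.

```python
def false_positive_rate_at(alpha_scores, wimp_scores, max_false_negative_rate=0.004):
    """Return (threshold, false positive rate) for the strictest allowed cut.

    Scores are the predicted probability that an event is a neck alpha; an
    event is rejected when its score is >= threshold.
    """
    allowed_misses = int(max_false_negative_rate * len(alpha_scores))
    # Alphas scoring below the threshold are missed (false negatives), so the
    # threshold sits at the alpha score with exactly allowed_misses below it.
    threshold = sorted(alpha_scores)[allowed_misses]
    rejected_wimps = sum(score >= threshold for score in wimp_scores)
    return threshold, rejected_wimps / len(wimp_scores)


# Toy scores with a loose 25% false negative budget, to keep the numbers small.
alpha_scores = [0.1, 0.3, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99]
wimp_scores = [0.05, 0.2, 0.5, 0.7]
threshold, fpr = false_positive_rate_at(alpha_scores, wimp_scores, max_false_negative_rate=0.25)
```

Fixing the false negative rate first makes the false positive rates of different models directly comparable.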

Below are the predictions of the cylindrical projection CNN.

The table below quantifies every configuration tested.

Summary of the DEAP analysis:


Summary of Results

Novel machine learning algorithms were developed for particle classification in the PICO-60 and DEAP-3600 dark matter experiments, and found to exceed the performance of previous research.

In PICO-60, a new semi-supervised learning algorithm called gravitational differentiation was found to improve classification accuracy from the 80.7% reached in previous machine learning studies, to 99.2%. Other supervised and semi-supervised learning algorithms were also explored.

In DEAP-3600, application of a cylindrical projection with a 2D CNN to process the detector's spherical topology was found to reduce the proportion of false positives from the 91.0% previously reached with a conventional classifier, to 75.7%, while keeping the rate of false negatives at 0.4%.


Confirmation of Hypothesis and Applications

These results confirm my hypothesis. Indeed, semi-supervised learning for PICO-60 performed better than both the previous best result with machine learning and my own supervised learning work. This significantly improves the accuracy that can be obtained quickly and without any manual optimization, allowing the team working on the experiment to iterate more quickly without the overhead of developing a conventional classifier.

The reduction in the false positive rate for DEAP-3600 provides evidence that machine learning is capable of improving the efficiency of this experiment. This may reduce the operation time required to collect sufficient data pointing towards the existence or non-existence of WIMP dark matter.

In general, the classifiers developed during this study demonstrate great promise for machine learning in dark matter detection. Fundamentally, the problems I have solved with respect to PICO-60 and DEAP-3600 are not specific to these two experiments; they are common. Thus, my algorithms should be applicable in the broader field of dark matter detection. I hope to explore more experiments in the future!


Limitations and Future Work


My semi-supervised and supervised classifiers for PICO-60 are immediately applicable, because they were optimized using real-world data collected from calibration sources and background radiation. In future iterations of PICO, application should be straightforward once calibration data has been collected.

One current limitation relates to a so-called "position correction", in which the amplitude of audio data is normalized based on the position of the bubble. It is not currently possible to apply this correction to any audio format other than the 8-band Fourier transform. This is a possible reason for the weaker performance of models trained on the high-resolution Fourier transform and the raw waveform. Future work for PICO-60 should thus focus on generalizing this correction to other audio formats, to confirm or refute this conjecture.


At the moment, data from a simulated DEAP-3600 detector is used for validation of all machine learning and conventional particle classifiers. A Monte Carlo simulation is always an approximation of real-world behavior, and not an absolute proof. I have not yet used real-world calibration data because it is currently too limited in quantity (approximately 30 usable events). Long-term future work should thus focus on evaluating how well machine learning classifiers generalize to real-world calibration data.

About me

I am a 15-year-old high school student in Sudbury, Ontario. I grew up reading about astronomy and physics, and I have always dreamed of contributing to the quest to understand the nature of the universe we live in!

My interests in machine learning have been sparked by two of my greatest inspirations: Andrej Karpathy and Geoffrey Hinton. I greatly respect not only their tremendous contributions to the field, but also their dedication to ethical practices and improving the lives of others.

I have been working on machine learning projects since 2016. Before exploring dark matter research, I developed a real-world autonomous vehicle based on an electric go-kart. With this project, I was incredibly honored to win Best Project at the Canada-Wide Science Fair, and most recently First Prize at the European Union Contest for Young Scientists 2018.

After high school, I hope to go to university for software engineering. My long-term aspiration is to work in the artificial intelligence/machine learning industry of Silicon Valley. I am inspired by the cutting-edge AI research, and I would love to be a part of that someday.

Winning a prize at Google Science Fair would be an incredible inspiration to work hard and dig deeper into the fascinating field of dark matter research! I would absolutely love the opportunity to meet a diverse group of researchers in physics and artificial intelligence; it would be tremendously valuable in allowing me to explore places where I can make a difference in other experiments and laboratories.

Health & Safety

During the summer, I worked at the SNOLAB neutrino/dark matter observatory in Sudbury, Ontario. I did not enter the underground laboratory; I stayed in the surface building.

As this is a software project, at no point did I put myself or anyone else at any risk. All contact with radioactive sources was done by professionals years prior to this project.

My supervisor was Dr. Nigel Smith:

Bibliography, references, and acknowledgements


During the summer of 2018, I worked at SNOLAB, a dark matter and neutrino observatory in Sudbury, Ontario. Thanks very much to Dr. Nigel Smith for generously providing me this opportunity! Also, thanks to Ken Clark, Carsten Krauss, Scott Fallows, and Pierre Gorel for introducing me to the PICO-60 and DEAP-3600 experiments.

While I was at SNOLAB, I received minor guidance from the above research physicists on the following subjects:

  • Acquiring and loading real-world and simulated training data
  • Understanding the physical principles of the PICO-60 and DEAP-3600 detectors

I did not receive any guidance on the following subjects, completing them entirely independently:

  • Conception and development of machine learning algorithms (new supervised and semi-supervised learning architectures, the topological CNN, etc.)
  • Conception and development of data preprocessing methods (the full-resolution Fourier transform, the map projection system, etc.)
  • Development of supporting data processing software
  • Visualization and analysis of resulting diagnostic data
  • Iteration and improvement based on failures

Of course, thanks to my family and friends for supporting me throughout this challenging, at times frustrating, but certainly worthwhile endeavor.



While at SNOLAB, I did not interact with any physics equipment. I was granted access to the Graham compute cluster located at the University of Waterloo. This helped to make my computationally expensive grid searches practical to run within a reasonable period of time, especially for complex architectures such as the topological CNN.



J. Kiefer and J. Wolfowitz. “Stochastic Estimation of the Maximum of a Regression Function”. In: Ann. Math. Statist. 23.3 (Sept. 1952), pp. 462–466. doi: 10.1214/aoms/1177729392. url:

J. M. Hollander, I. Perlman, and G. T. Seaborg. “Table of Isotopes”. In: Rev. Mod. Phys. 25 (2 1953), pp. 469–651. doi: 10.1103/RevModPhys.25.469. url: 10.1103/RevModPhys.25.469.

Rene Brun and Fons Rademakers. “ROOT - An Object Oriented Data Analysis Framework”. In: AIHENP’96 Workshop, Lausanne. Vol. 389. 1996, pp. 81–86.

Gerard Jungman, Marc Kamionkowski, and Kim Griest. “Supersymmetric dark matter”. In: Phys. Rept. 267 (1996), pp. 195–373. doi: 10.1016/0370-1573(95)00058-5. arXiv: hep-ph/9506380 [hep-ph].

Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for Python. 2001–. url:

Travis Oliphant. A guide to NumPy. 2006.

J. D. Hunter. “Matplotlib: A 2D graphics environment”. In: Computing In Science & Engineering 9.3 (2007), pp. 90–95. doi: 10.1109/MCSE.2007.55.

F. Pedregosa et al. “Scikit-learn: Machine Learning in Python”. In: Journal of Machine Learning Research 12 (2011), pp. 2825–2830.

Diederik P. Kingma and Jimmy Ba. “Adam: A Method for Stochastic Optimization”. In: CoRR abs/1412.6980 (2014). arXiv: 1412.6980. url:

Stefan van der Walt et al. “scikit-image: image processing in Python”. In: PeerJ 2 (June 2014), e453. issn: 2167-8359. doi: 10.7717/peerj.453. url:

Martin Abadi et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Software available from 2015. url:

Francois Chollet et al. Keras. 2015.

Wei Dai et al. “Very Deep Convolutional Neural Networks for Raw Waveforms”. In: CoRR abs/1610.00087 (2016). arXiv: 1610.00087. url:

C. Amole. PhD thesis. Queen’s University, 2017.

C. Amole et al. “Dark Matter Search Results from the PICO-60 C3F8 Bubble Chamber”. In: Phys. Rev. Lett. 118 (25 2017), p. 251301. doi: 10.1103/PhysRevLett.118.251301. url:

J. Minervini et al. “Recent advances in superconducting magnets for MRI and hadron radiotherapy: an introduction to ‘Focus on superconducting magnets for hadron therapy and MRI’”. In: Superconductor Science and Technology. url:

D. Guest et al. “Deep Learning and its Application to LHC Physics”. arXiv: 1806.11484. url:

P.-A. Amaudruz et al. “First results from the DEAP-3600 dark matter search with argon at SNOLAB”. In: Phys. Rev. Lett. 121 (2018), p. 071801. doi: 10.1103/PhysRevLett.121.071801. arXiv: 1707.08042. url: