# Demystifying Complexity in Dynamical Systems using Machine Learning and Data Science

SIAM Mathematical and Data Science 2020: Using fluid and plasma physics and nonlinear dynamics as primary examples, we explore the use of machine learning and modern tools of data science for a deeper understanding of self-organization, complex multiscale dynamics and reduced models in the physical sciences. From Hamiltonian dynamics to dissipative models and examples from inertial confinement fusion applications, we showcase how advanced data science techniques and statistical inference can help guide crucial discovery in these fields of research. From runaway electrons in tokamaks to fluid and MHD turbulence in Z pinches, to laser-plasma instabilities and chaotic transport, we show myriad examples of machine learning deepening insight into the workings of classical many-body systems with long-range forces. New numerical methods inspired by machine learning advances will also be discussed.

*Bedros Afeyan, Polymath Research Inc., U.S. (link to pdf)*

This work accelerates and improves the functionality of kinetic plasma simulations by using machine learning and variational optimization to select an accurate tiling of phase space with a parsimonious use of particles. The version that will be shown in detail is called BARS (Bidirectional Adaptive Refinement Scheme), along with its simplified version, mini-BARS, which serves as a proof of principle. Examples from nonlinear kinetic electron plasma waves and KEEN waves will show a reduction on the order of 100x in the number of particles required to achieve good accuracy compared with traditional PIC methods. We will describe how to partition phase space into segregated regions with different resolution requirements. What is learned in one simulation can easily be applied to nearby simulations, accelerating the ensemble.

*Michael E. Glinsky, Sandia National Laboratories, U.S. (link to pdf)*

The Mallat Scattering Transformation (MST) is a hierarchical, multiscale transformation that has been shown to be a form of deep learning related to convolutional neural networks. This talk will explore its meaning, its relationship to causal physics, and its significance in the analysis of complexity. We have developed theory that connects the transformation to the causal dynamics of physical systems. This has been done from the classical kinetic perspective (using a coordinate-free exterior calculus formalism) and from the field theory perspective, where the MST is the generalized Green's function, or S-matrix, of the field theory in the scale basis. From both perspectives, the first-order MST is the current state of the system, and the second-order MST gives the transition rates from one state to another. If one includes the evolution coordinate, that is, time, in the transformation, the second-order MST gives the transition kernel of the dynamics directly, with no further transformation. This kernel is independent of the current state, that is, of the first-order MST. Given an ensemble of example states that sufficiently sample the transition kernel, one has fully characterized the physical system and should be able to evolve any state, as given by its initial first-order MST, forward in time. That is, the MST is the perfect coordinate system in which to learn, identify, and propagate the dynamics.
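
As a rough illustration of what first- and second-order scattering coefficients are, the sketch below computes them for a 1D signal with NumPy. It is not the speaker's implementation and does not touch the physical interpretation above. The Gaussian band-pass filters, their dyadic center frequencies, and the helper names `filter_bank` and `scattering` are assumptions made for this example, and averaging is taken over the whole signal.

```python
import numpy as np

def filter_bank(n, n_scales):
    """Gaussian band-pass filters in the Fourier domain (an assumed stand-in for wavelets).

    Center frequencies halve at each scale; the width is a quarter of the center
    frequency, so the response at zero frequency is small.
    """
    freqs = np.fft.fftfreq(n)                              # cycles per sample
    centers = 0.25 / 2.0 ** np.arange(n_scales)
    widths = centers / 4.0
    return np.exp(-0.5 * ((freqs[None, :] - centers[:, None]) / widths[:, None]) ** 2)

def scattering(x, n_scales):
    """Return globally averaged first- and second-order scattering coefficients."""
    psi = filter_bank(len(x), n_scales)
    # First layer: band-pass filter, then modulus (the nonlinearity).
    U1 = np.abs(np.fft.ifft(np.fft.fft(x) * psi, axis=1))
    S1 = U1.mean(axis=1)
    # Second layer: filter each first-layer envelope again at coarser scales only.
    S2 = np.zeros((n_scales, n_scales))
    for j1 in range(n_scales):
        U2 = np.abs(np.fft.ifft(np.fft.fft(U1[j1]) * psi, axis=1))
        S2[j1, j1 + 1:] = U2[j1 + 1:].mean(axis=1)
    return S1, S2

# A pure tone at 0.125 cycles per sample, and a circularly shifted copy of it.
n = 256
x = np.cos(2 * np.pi * 0.125 * np.arange(n))
S1, S2 = scattering(x, n_scales=5)
S1_shift, S2_shift = scattering(np.roll(x, 17), n_scales=5)
```

Because the filtering is circular and the averaging is global, the coefficients of the shifted signal match those of the original, which is the translation invariance the scattering transform is built to provide.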

*Varchas Gopalaswamy, Riccardo Betti, James Knauer, Aarne Lees, and Dhrumir Patel, University of Rochester, U.S. (link to pdf)*

**Koopman Approximations for Multiscale Nonlinear Physics using Dynamic Mode Decomposition**

*Daniel Dylewsky, University of Washington, U.S. (link to pdf)*

Many physical systems of practical interest exhibit simultaneous dynamics on highly disparate time scales. Deconstructing such systems according to this property can offer valuable insight: behavior on two very different time scales can often be modeled separately, with only a limited coupling between them. In this work I will present a data-driven approach to decomposing time series data with multiscale properties which could serve as a valuable precursor to a variety of scale-separated analysis tasks. This method makes use of Dynamic Mode Decomposition (DMD), which identifies spatial and temporal coherencies to approximate sample data with a linear superposition of complex exponentials. DMD is applied on a sliding window over the input data, and the resultant frequency spectra are then clustered to identify dominant time scale components. Scale-separated reconstructions of the original signal are produced simply by summing over each cluster of modes separately. Results are presented for a simple polynomial toy model and for a system of three-body planetary motion. To conclude I will briefly discuss how this method might be used in model discovery and forecasting for multiscale systems.
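
The pipeline described above (windowed DMD, clustering of frequencies, per-cluster reconstruction) can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the speaker's code: the synthetic two-frequency signal, the rank, the window sizes, and the helper names are all hypothetical, and the clustering step is replaced by a simple split at the largest gap in log-frequency.

```python
import numpy as np

def dmd(X, rank):
    """Exact DMD of a snapshot matrix X (features x time): eigenvalues and modes."""
    X1, X2 = X[:, :-1], X[:, 1:]
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    # Project the one-step linear map onto the leading singular subspace.
    A_tilde = U.conj().T @ X2 @ Vh.conj().T / s
    eigvals, W = np.linalg.eig(A_tilde)
    modes = X2 @ Vh.conj().T / s @ W
    return eigvals.astype(complex), modes

def window_frequencies(X, dt, window, step, rank):
    """Angular frequencies |Im(log(lambda))| / dt found by DMD on each sliding window."""
    rows = []
    for start in range(0, X.shape[1] - window + 1, step):
        eigvals, _ = dmd(X[:, start:start + window], rank)
        rows.append(np.abs(np.log(eigvals).imag) / dt)
    return np.array(rows)

def gap_threshold(freqs):
    """Stand-in for clustering: split two time scales at the largest log-frequency gap."""
    flat = np.sort(freqs.ravel())
    k = np.argmax(np.diff(np.log(flat)))
    return np.sqrt(flat[k] * flat[k + 1])

def reconstruct(X, eigvals, modes, keep):
    """Sum only the modes selected by the boolean mask `keep`."""
    b = np.linalg.lstsq(modes, X[:, 0].astype(complex), rcond=None)[0]
    steps = np.arange(X.shape[1])
    dynamics = b[keep][:, None] * eigvals[keep][:, None] ** steps
    return (modes[:, keep] @ dynamics).real

# Hypothetical test signal: a slow and a fast oscillation mixed into six channels.
rng = np.random.default_rng(0)
dt, w_slow, w_fast = 0.01, 1.0, 25.0
t = np.arange(2000) * dt
slow_state = np.vstack([np.cos(w_slow * t), np.sin(w_slow * t)])
fast_state = np.vstack([np.cos(w_fast * t), np.sin(w_fast * t)])
mix = rng.normal(size=(6, 4))
X = mix[:, :2] @ slow_state + mix[:, 2:] @ fast_state

freqs = window_frequencies(X, dt, window=200, step=100, rank=4)
threshold = gap_threshold(freqs)

# Scale-separated reconstruction of the first window.
first = X[:, :200]
eigvals, modes = dmd(first, rank=4)
omega = np.abs(np.log(eigvals).imag) / dt
slow_part = reconstruct(first, eigvals, modes, omega < threshold)
fast_part = reconstruct(first, eigvals, modes, omega >= threshold)
```

The test signal here is exactly linear, so DMD separates it cleanly. For real multiscale data the rank, window length, and clustering method all need tuning, and a proper clustering algorithm would replace the gap heuristic.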

**Galaxy Image Deconvolution: Sparsity vs Machine Learning**

*Fadi Nammour, Florent Sureau, and Jean-Luc Starck, CEA Saclay, France (link to pdf)*

In the upcoming decade, the Square Kilometre Array telescope will deliver several petabytes per second of data. Raw galaxy images in this data will need to be processed with both high speed and accuracy. Accuracy here includes preserving the galaxy shape information that is important in astrophysics. However, raw images are corrupted by distortions and noise. They can be corrected using restoration algorithms [S. Farrens et al., Space Variant Deconvolution of Galaxy Survey Images, 2017]. Yet standard restoration algorithms offer no guarantee of conserving shape information. Therefore, astrophysicists estimate galaxy shapes directly on raw data, which is not optimal. In our previous work, we developed a shape constraint and showed that adding it to a restoration algorithm can decrease the shape estimation error by at least 20% and increase its robustness [F. Nammour et al., Galaxy Images Restoration with Shape Constraint, in prep.]. We also developed a deep learning algorithm that performs fast and precise shape estimations [F. Sureau et al., Deep Learning for Space Variant Deconvolution in Galaxy Surveys, in prep.]. In this work, we combine the shape constraint with deep learning to offer a new approach that simultaneously restores galaxy images and shape information. To do so, we develop the Machine learning Algorithm for Deconvolution and Shape Information Retrieval (MADSIR) and compare its performance with a sparse restoration algorithm on simulated data.
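
To make "shape information" concrete, the sketch below computes a common shape measure, the two-component ellipticity derived from the second-order moments of an image. It is an illustration only: the abstract does not say which shape measure MADSIR or the shape constraint uses. The function name `ellipticity` and the synthetic Gaussian galaxy are assumptions for this example.

```python
import numpy as np

def ellipticity(image):
    """Ellipticity (e1, e2) from unweighted second-order moments of a 2D image."""
    y, x = np.indices(image.shape)
    flux = image.sum()
    # Centroid of the light distribution.
    xc = (image * x).sum() / flux
    yc = (image * y).sum() / flux
    # Central second-order (quadrupole) moments.
    qxx = (image * (x - xc) ** 2).sum() / flux
    qyy = (image * (y - yc) ** 2).sum() / flux
    qxy = (image * (x - xc) * (y - yc)).sum() / flux
    return (qxx - qyy) / (qxx + qyy), 2.0 * qxy / (qxx + qyy)

# Hypothetical noiseless galaxy: an axis-aligned Gaussian, twice as wide in x as in y.
y, x = np.indices((65, 65))
galaxy = np.exp(-0.5 * (((x - 32) / 4.0) ** 2 + ((y - 32) / 2.0) ** 2))
e1, e2 = ellipticity(galaxy)
```

Unweighted moments are very sensitive to noise, so practical estimators usually apply a weight function first. A restoration step that alters these moments changes the measured shape even when the image looks visually cleaner.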

**Revealing the Deeper Interplay between Crucial Parameters in ICF using Data Science Analytics**

*John L. Kline, M. J. Grosskopf, J. P. Lestone, G. Srinivasin, B. M. Haines, S. M. Finnegan, and J. A. Pruett, Los Alamos National Laboratory, U.S.; B. Afeyan, Polymath Research Inc., U.S.; O. Landen, Lawrence Livermore National Laboratory, U.S. (link to pdf)*

The complexity of inertial confinement fusion (ICF) experiments at the National Ignition Facility (NIF) using full cryogenic deuterium-tritium fuel leads to a low data return rate, especially given other facility demands. However, over the past ten years, data from over 140 such implosions have been collected, enough to apply statistical data science. The analyses facilitate the discovery of significant correlations across multiple target designs, each aimed at a different level of performance. The correlated parameters inform machine learning optimization techniques, decreasing the time needed to find the best target design candidates and laser conditions. We expect future experiments with higher repetition rates to train deep neural networks directly, maintaining optimal fusion performance with artificial intelligence controlling experimental parameters in real time. This is a first step in that direction.

*S. M. Finnegan, J. L. Kline, Los Alamos National Laboratory, Los Alamos, NM 87544, and B. Afeyan, Polymath Research Inc., Pleasanton, CA 94566 (link to pdf)*

This presentation discusses the susceptibility to cognitive bias of design efforts in multi-physics, multi-scale complex dynamics, especially when constrained by sparsely sampled data sets. This includes the potential efficacy of machine learning when applied to such systems. We show instead that statistics and modern data science best practices can help protect the credibility of such scientific endeavors, strengthen their conclusions, and shield them from expectation bias, among other ills. The pursuit of laboratory inertial fusion at the National Ignition Facility, at Lawrence Livermore National Laboratory, with MJ-class laser systems, is a prime candidate to illustrate how cognitive bias may influence experimental data acquisition and processing, and to trace the origin of baked-in anomalies in the experimental database. Furthermore, we discuss the inherent vulnerability and amplified uncertainties that accrue when sparsely sampled experimental data sets, such as those in inertial fusion research, are expected to be augmented with integrated numerical simulations. These simulations rely on rather primitive reduced-order models that introduce their own irreversible cognitive biases into the uncertainty pool that is deemed to be the “data-base.”

**Characterizing Dynamic Mathematical Structure in Data**

*Cory Brown, William Redman, Connor Levenson, Bingjie Huang (UCSB), Zac Fernandez (MSU) (link to pdf)*

Classical techniques for analyzing network structure (such as spectral clustering) primarily address weighted time-invariant graphs. Like the Fourier transform, these methods have been used too often, resulting in analyses with overly constrained assumptions (e.g., Fourier decomposition of signals in transient settings). In the field of neuroscience, this has resulted in copious literature on static resting state networks: time-invariant subnetworks of a global brain network that describe large-scale brain connectivity. For complex systems like the mind, adding the mathematical complexity of time-dependent edges provides additional information for diagnosis. Motivated by both neuroscience and chemistry, we consider means to analyze time-varying structure in data. We do this through the lens of Koopman operator theory. The main contribution here is in building observables by composing network-valued observables with properties of networks. One example is the normalized volume of time-varying activation of resting state networks. Random-variable-valued and manifold-valued observables can be utilized in a similar fashion. By considering such exotic observables, observable spaces can be constructed such that Koopman spectral quantities characterize the time-varying mathematical structure of data.

**Machine Learning for Directed Energy Device Design: Getting the Human out of the Loop**

*John Luginsland, Confluent Sciences, LLC; A. Spirkin and P. H. Stoltz, Tech X Corp (link to pdf)*

**Learning to Represent Heterogeneous Scientific Data for HEDP**

*Rushil Anirudh, CASC/COMP/LLNL (link to pdf)*

As simulation campaigns have become successful in advancing our understanding of high energy physics, the problem of comparing simulation and experimental data has become critical in optimizing diagnostics and experimental design. Due to the challenges in modeling high-dimensional, multi-modal datasets, comparisons have often resorted to a handful of scalar diagnostics across simulation and experiment, without making use of the other rich modalities and information that may be available. In this talk we explore the use of self-supervised learning strategies to learn effective representations of such datasets in order to ultimately enable stronger coupling between simulation and experiment. We show how, with an appropriately designed self-supervised encoding objective, representations can be learned that are robust, capture multi-modal relationships, and are physically meaningful and interpretable.