Systems and Integrative Biology Training Program October 2023 Workshops

Applications of Network Science in Systems and Integrative Biology Research

Joshua Pickard, trainee

Indika Rajapakse, mentor

Other Participants: Victoria Sturgess, Katherine McDonald, Nick Rossiter, Costas Lyssiotis, Daniel Beard, Yatrik Shah, Kelly Arnold

Overview: The members of the Systems and Integrative Biology (SIB) Training Program met twice in October 2023 to explore how network structures and dynamics appear across a broad array of research applications, both as networked data and as network-based tools of analysis. Trainees and mentors explored metabolic and gene regulatory networks and discussed network-based methods of system identification.

1. Dynamic Mode Decomposition for Learning Network Dynamics

Trainees were introduced to Dynamic Mode Decomposition (DMD), a data-driven approach for learning network structure and dynamics from time series data. Given a set of measurements taken on a biological system over time, such as RNAseq throughout the cell cycle or during cell differentiation, DMD arranges the measurements into two matrices: X, whose columns are the snapshots at times 1 through m-1, and X', which holds the same snapshots shifted forward by one time step. It then learns the linear operator A that best satisfies X' ≈ AX in the least-squares sense, and this operator approximates the network structure and rate constants that generated the time series. The operator learned through DMD is often interpreted as the adjacency matrix of the network underlying the system that generated the data.

Figure 1: Overview of the DMD process and output. In this figure, the data represent turbulent fluid flow, a highly nonlinear process, yet DMD's future-state predictions perform well despite the nonlinearity.
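The two-matrix construction described in this section can be sketched in a few lines of NumPy. This is a minimal sketch of the basic least-squares form of DMD. It assumes that the snapshots are stored as columns of an array with one row per measured variable (for example, one row per gene). The function name `dmd_operator` and the three-variable system in the usage example are hypothetical and chosen only for illustration.

```python
import numpy as np


def dmd_operator(data: np.ndarray) -> np.ndarray:
    """Best-fit linear operator A such that data[:, k + 1] ≈ A @ data[:, k].

    data: array of shape (n_variables, n_timepoints); column k is the
    snapshot (e.g. expression of every gene) at time point k.
    """
    X = data[:, :-1]        # snapshots at times 0 .. m-2
    X_prime = data[:, 1:]   # the same snapshots shifted one step forward
    # Minimum-norm least-squares solution of X_prime = A @ X
    return X_prime @ np.linalg.pinv(X)


# Usage on a hypothetical 3-variable linear system x_{k+1} = A_true @ x_k
A_true = np.array([[0.9, 0.1, 0.0],
                   [0.0, 0.8, 0.2],
                   [0.1, 0.0, 0.7]])
snapshots = [np.array([1.0, -0.5, 0.3])]
for _ in range(9):
    snapshots.append(A_true @ snapshots[-1])
data = np.column_stack(snapshots)   # shape (3, 10)

A_est = dmd_operator(data)          # should match A_true up to floating-point error
next_state = A_est @ data[:, -1]    # one-step-ahead prediction of the system
print(np.round(A_est, 3))
```

On noise-free data from a truly linear system the learned operator recovers the generating matrix. On real RNAseq data it is only a best linear fit to the observed transitions.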

For large data sets, DMD uses the Singular Value Decomposition (SVD) of the data matrix both to denoise the data, by truncating small singular values, and to learn the network structure efficiently in a reduced, low-rank coordinate system. We used this opportunity as an introduction to network dynamics and as a refresher on classic linear algebraic methods in data science, including the equivalence between the SVD and Principal Component Analysis (PCA). Several variants of DMD have been developed, each designed for different applications. We discussed the biological data modalities and experiments that can be modeled with this process, as well as several limitations of DMD. In particular, the linear models learned by DMD likely do not capture the full complexity of nonlinear biological systems; however, DMD has been used successfully to describe nonlinear systems in many engineering, social, and industrial applications, such as the fluid dynamics example in Figure 1.
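The SVD-based version of DMD can be sketched as follows. This is a minimal sketch of the commonly used rank-r "exact DMD". It assumes snapshots stored as columns of an array with one row per measured variable. The rank `r` is a user choice: keeping only the r largest singular values is what discards low-energy, typically noise-dominated directions. The retained left singular vectors are the dominant spatial patterns of the data, and they coincide with the principal components when each row is mean-centered, which is the SVD/PCA connection discussed above. The function name `dmd_rank_r` and the test system are hypothetical.

```python
import numpy as np


def dmd_rank_r(data: np.ndarray, r: int):
    """Rank-r exact DMD of snapshots stored column-wise in data.

    Returns (eigenvalues, modes): the DMD eigenvalues and the exact DMD
    modes, one mode per column.
    """
    X, X_prime = data[:, :-1], data[:, 1:]
    # Economy SVD of X; keep only the r largest singular values
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U_r, s_r, V_r = U[:, :r], s[:r], Vh[:r, :].conj().T
    # r x r operator U_r^* X' V_r Sigma_r^{-1}; dividing by s_r scales each column
    A_tilde = U_r.conj().T @ X_prime @ V_r / s_r
    eigvals, W = np.linalg.eig(A_tilde)
    # Exact DMD modes: X' V_r Sigma_r^{-1} W
    modes = X_prime @ V_r / s_r @ W
    return eigvals, modes


# Usage on a hypothetical linear system; with r equal to the number of
# variables and noise-free data, the DMD eigenvalues are those of A_true.
A_true = np.array([[0.9, 0.1, 0.0],
                   [0.0, 0.8, 0.2],
                   [0.1, 0.0, 0.7]])
snapshots = [np.array([1.0, -0.5, 0.3])]
for _ in range(9):
    snapshots.append(A_true @ snapshots[-1])
data = np.column_stack(snapshots)

eigvals, modes = dmd_rank_r(data, r=3)
print(np.sort_complex(eigvals))
print(np.sort_complex(np.linalg.eigvals(A_true)))
```

Working with the small r x r matrix instead of the full n x n operator is what makes this variant practical when n (for example, the number of genes) is large.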

2. Exploring Flux Analysis with Virtual Metabolic Human and RECON3

A trainee, Victoria Sturgess, presented their ongoing work to understand and model metabolic processes as a network flow problem. Using the RECON3 database, which contains a large portion of the known genome-scale metabolic interactions in humans, stoichiometric matrices were constructed to represent the network incidence and reaction structure of metabolic processes (see Figure 2). The goal of this construction is to identify the steady-state reaction dynamics of each metabolic process. If S is the stoichiometric matrix and v is the vector of reaction fluxes, the metabolite concentrations c change as dc/dt = Sv, so the null space of S (all v with Sv = 0) corresponds to the network steady states: reactions proceed, but the concentrations of the species modeled in the network remain constant. We discussed several toy examples of this model and then explored how it has been applied to the Citrate Cycle (hsa00020) and Glycolysis (hsa00010) by incorporating RNAseq data into the RECON3 network models. We discussed the current state of this research project as well as future directions for how the RECON3 network can be used.
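The null-space view of steady state can be checked on a small example. This sketch uses a hypothetical three-metabolite network, not the network in Figure 2. It has an uptake reaction v1 that produces A, reactions v2: A → B and v3: A → C, and export reactions v4 and v5 that remove B and C. The code builds the stoichiometric matrix S with metabolites as rows and reactions as columns. It then computes an orthonormal basis of its null space with `scipy.linalg.null_space`. Every steady-state flux vector is a linear combination of these basis vectors.

```python
import numpy as np
from scipy.linalg import null_space

# Hypothetical toy network (rows: metabolites A, B, C; columns: reactions v1..v5)
#   v1: -> A   v2: A -> B   v3: A -> C   v4: B ->   v5: C ->
# Entries are -1 when a reaction consumes a metabolite, +1 when it produces it.
S = np.array([
    [1, -1, -1,  0,  0],   # A
    [0,  1,  0, -1,  0],   # B
    [0,  0,  1,  0, -1],   # C
], dtype=float)

# Orthonormal basis of {v : S @ v = 0}; one column per independent steady-state pattern
N = null_space(S)
print(N.shape)  # (5, 2): five reactions, rank 3, so two independent steady-state modes

# Any combination of the basis columns keeps all concentrations constant
v = N @ np.array([1.0, 2.0])
print(np.allclose(S @ v, 0))  # True
```

The two basis directions correspond to routing flux through the A → B branch and through the A → C branch. Any steady state of this toy network is a mixture of the two.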

Figure 2: (Top) An example network and its corresponding stoichiometric matrix. Each vertex A, B, …, E denotes a metabolite, and the reactions are shown as black numbered arrows. In the stoichiometric matrix S, rows correspond to metabolites and columns to reactions, so each column records which metabolites a reaction consumes and which it produces. (Bottom) The full RECON3 map of metabolic pathways, with the top 200 upregulated reactions identified by the discussed method highlighted.
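The steady-state null space usually contains infinitely many flux vectors. A standard way to pick one is flux balance analysis (FBA), which maximizes an objective flux subject to Sv = 0 and bounds on each reaction. FBA is named here as a general technique; this summary does not say it was the method used in the presented work. The sketch uses a hypothetical toy network with uptake v1 → A, v2: A → B, v3: A → C, and exports v4 (of B) and v5 (of C). The capacity of 10 flux units per reaction is an assumption chosen for illustration. The linear program is solved with `scipy.optimize.linprog`.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (rows: metabolites A, B, C; columns: reactions v1..v5)
#   v1: -> A   v2: A -> B   v3: A -> C   v4: B ->   v5: C ->
S = np.array([
    [1, -1, -1,  0,  0],
    [0,  1,  0, -1,  0],
    [0,  0,  1,  0, -1],
], dtype=float)

# Irreversible reactions with an assumed capacity of 10 flux units each
bounds = [(0.0, 10.0)] * S.shape[1]

# Objective: maximize export of B (v4). linprog minimizes, so negate it.
c = np.zeros(S.shape[1])
c[3] = -1.0

# Steady-state constraint S @ v = 0 enforced as equality constraints
res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
print(res.status)   # 0 means an optimal solution was found
print(-res.fun)     # maximal export flux of B
print(res.x)        # optimal flux vector v1..v5
```

Here the optimum routes all uptake through v2 to v4, with no flux through the A → C branch. Changing the bounds, for example to reflect expression data, changes which steady state is selected.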

3. Coessentiality Networks Identify Novel Functional Relationships in Human Cells

A trainee, Nick Rossiter, presented their ongoing work on how coessentiality networks can be used to identify new functional relationships between genes. Coessentiality networks are built from genome-wide CRISPR screens across many cell lines. Genes whose knockouts have correlated fitness effects across those cell lines are linked, which helps suggest functions for uncharacterized genes (see Figure 3). We discussed the role that CRISPR plays in forming these networks, along with methods for denoising network data such as Cholesky whitening. Nick also presented results from his current work investigating these networks in the context of mitochondrial function. The conversation left the group with several questions to pursue at our next meeting, such as where else and how we can apply Cholesky whitening.

Figure 3: Genome-wide CRISPR screens are used to form gene-by-gene coessentiality networks.
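One way to read Cholesky whitening in this setting starts from the fact that CRISPR fitness profiles of different cell lines are correlated with one another. That shared structure can inflate correlations between genes. Whitening with the Cholesky factor of the cell-line covariance is intended to remove those cell-line correlations before gene-gene similarities are computed. The sketch that follows is a simplified illustration under that assumption, not necessarily the approach used in the presented work. Published coessentiality analyses may embed the whitening in a larger statistical model, such as generalized least squares. The function names `cholesky_whiten` and `coessentiality`, the `ridge` parameter, and the random stand-in data are all hypothetical.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular


def cholesky_whiten(profiles: np.ndarray, ridge: float = 0.0) -> np.ndarray:
    """Decorrelate the columns (cell lines) of a genes x cell-lines matrix.

    The cell-line covariance is estimated from the gene profiles, factored as
    cov = L @ L.T, and each gene profile x is mapped to L^{-1} x, so the
    whitened columns have identity sample covariance (when ridge == 0).
    """
    cov = np.cov(profiles, rowvar=False)        # (n_cell_lines, n_cell_lines)
    cov += ridge * np.eye(cov.shape[0])          # hypothetical optional shrinkage
    L = cholesky(cov, lower=True)                # lower-triangular factor
    # Solve L @ Y = profiles.T for all genes at once, i.e. Y = L^{-1} profiles.T
    return solve_triangular(L, profiles.T, lower=True).T


def coessentiality(profiles: np.ndarray, ridge: float = 0.0) -> np.ndarray:
    """Gene-by-gene Pearson correlations of the whitened fitness profiles."""
    return np.corrcoef(cholesky_whiten(profiles, ridge))


# Usage with random stand-in data: 200 genes screened in 5 correlated cell lines
rng = np.random.default_rng(0)
mixing = rng.normal(size=(5, 5))                 # induces cell-line correlations
profiles = rng.normal(size=(200, 5)) @ mixing
whitened = cholesky_whiten(profiles)
print(np.round(np.cov(whitened, rowvar=False), 3))  # approximately the identity
network = coessentiality(profiles)                   # gene-by-gene matrix, (200, 200)
```

A triangular solve is used instead of explicitly inverting L because it is cheaper and numerically more stable. Thresholding `network` would then give an adjacency matrix for a gene-by-gene network.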

In addition to discussing network-based methods in our research, we met with a team from Instructor to learn about new technologies and frameworks that we may use in the future to streamline and organize our meetings and track our progress.

List of Tools and Resources:

1. RECON3 Database (https://vmh.life/): “The VMH database captures information on human and gut microbial metabolism and links this information to hundreds of diseases and nutritional data. At its core, there are hundreds of manually curated genome-scale metabolic reconstructions, which have been assembled based on genomic, biochemical, and physiological data. The VMH facilitates rapid analyses and interpretations of complex data arising from large-scale biomedical studies by enabling complex queries of its content, by providing a detailed graphical representation of human metabolism, and by distributing computational models for simulating human and microbial metabolism.”

2. Dynamic Mode Decomposition: Data Driven Modeling of Complex Systems (https://epubs.siam.org/doi/book/10.1137/1.9781611974508)

“Data-driven dynamical systems is a burgeoning field—it connects how measurements of nonlinear dynamical systems and/or complex systems can be used with well-established methods in dynamical systems theory. This is a critically important new direction because the governing equations of many problems under consideration by practitioners in various scientific fields are not typically known. Thus, using data alone to help derive, in an optimal sense, the best dynamical system representation of a given application allows for important new insights. The recently developed dynamic mode decomposition (DMD) is an innovative tool for integrating data with dynamical systems theory. The DMD has deep connections with traditional dynamical systems theory and many recent innovations in compressed sensing and machine learning.”