Modular Modeling of Biological Systems
The modular approach describes a system as a set of interconnected subsystems. A model of a system is represented as a combination of subsystem models (modules). A direct benefit is that modules can be created, validated, and improved independently by different authors. Modules may utilize different mathematical formalisms, time and spatial scales, and levels of detail. Modularity, by its nature, facilitates reusability; moreover, it separates a module’s interface from its internal implementation. Modules may therefore be used as replaceable parts: they can be modified and improved independently of other modules. Another advantage is the more explicit structure of complex models, which facilitates their understanding and maintenance.
The modular approach is widely used in modern engineering, and in the past decade, its importance in modeling biological systems has become evident as well. One reason is that this approach matches the structure of biological systems well on multiple levels: from cells [Hartwell et al, 1999] to organs and whole organisms [Cooling et al, 2010]. Snoep et al. [Snoep et al, 2006] describe their vision of the construction of a comprehensive model describing a complete cellular system at the reaction level (i.e. a silicon cell). Such a model should be constructed on the basis of many modules developed by the community and stored in one repository. They also provide an example of a model created manually by combining three other models.
Extensive research has also been devoted to the creation of a comprehensive global model of the human organism (virtual physiological human). It is supported by the International Union of Physiological Sciences Physiome Project [Hunter et al, 2002] and the EuroPhysiome initiative [Fenner et al, 2008], which coordinate the efforts of many research groups around the world. One of its major challenges is model integration and coupling [Fenner et al, 2008].
Such ambitious goals (creation of virtual cell and virtual human models) require widely accepted standards for model description, which are crucial for model exchange and reuse, as well as approaches to composing models and specialized tools for the creation, simulation, and analysis of modular models. As models become more complicated, they also benefit greatly from convenient visual representation and editing.
Formalism-transformation. A modular model is transformed into a flat model, which can be simulated using standard methods (ODE, stochastic, etc.). This process needs a formal definition of the transformation of different formalisms into each other, which is not a trivial task and may place strict limitations on modules. The degenerate case is when all modules use the same formalisms and are even described in the same formal language (for example SBML).
Co-simulation. This approach implies that each module should be simulated separately using its own simulation engine. This process should be controlled by some meta-engine which should handle interactions between modules during the simulation.
Both approaches have advantages and disadvantages. In the first case, we have strict mathematical foundations but also strict limitations on the module formalisms. In the latter case, we have great flexibility in the inner implementation of models and no need for formalism-transformation algorithms. However, the co-simulation approach lacks mathematical foundations: even the questions of solution existence and uniqueness are open.
The key question for both approaches is how interactions between modules are described within the modular model. Usually, the modular model contains additional elements which are used to integrate modules.
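The co-simulation idea can be illustrated with a minimal sketch. It assumes a fixed communication step and an explicit Euler engine inside each module; the names `Module`, `Link` and `co_simulate` are hypothetical and are not taken from any of the tools discussed in this paper.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]


class Module:
    """Hypothetical module with its own engine (here: explicit Euler)."""

    def __init__(self, state: State, rhs: Callable[[State, State], State]):
        self.state = dict(state)   # internal variables
        self.inputs: State = {}    # values received from other modules
        self.rhs = rhs             # returns d(state)/dt given state and inputs

    def step(self, dt: float) -> None:
        deriv = self.rhs(self.state, self.inputs)
        for name, d in deriv.items():
            self.state[name] += dt * d


# (source module, source variable, target module, target input)
Link = Tuple[str, str, str, str]


def co_simulate(modules: Dict[str, Module], links: List[Link],
                dt: float, n_steps: int) -> None:
    """Meta-engine: exchange values along links, then advance every module."""
    for _ in range(n_steps):
        for src, var, dst, inp in links:
            modules[dst].inputs[inp] = modules[src].state[var]
        for module in modules.values():
            module.step(dt)


# Two modules: a decaying source and a pool filled at a rate set by the source.
source = Module({"x": 1.0}, lambda s, u: {"x": -s["x"]})
pool = Module({"y": 0.0}, lambda s, u: {"y": u["x_in"]})
modules = {"source": source, "pool": pool}
co_simulate(modules, [("source", "x", "pool", "x_in")], dt=0.01, n_steps=100)
```

In this sketch the meta-engine only copies values between modules at fixed intervals. A real meta-engine would also have to handle events, differing step sizes and differing formalisms.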
How the model addresses its modules is another important question. It may address the module elements (equations, chemical reactions, variables, etc.) directly, or the module elements may be accessible only through predefined ports. Module ports define the interface through which other modules may be connected to it. The first approach is more flexible, while the latter provides more controllable and well-defined modular models.
If models of subsystems are initially designed to be modules, they may predefine their interfaces. To create a modular model, the modeler then only needs to pick the required modules and establish appropriate connections between their interfaces.
Fusion is simply the manual creation of a larger model from the elements of submodels.
Composition means that a complex model includes submodels in an implicit manner. Connections are established between their inner elements.
Aggregation differs from composition in which submodel elements are available to the creator of the modular model. Submodels predefine which elements are accessible from other submodels. This implies that submodels are initially created to be parts of larger models.
Model flattening is the process of automatic generation of a fused model on the basis of the complex model created using a composition or aggregation approach. This procedure is a degenerate variant of the formalism-transformation approach described previously.
The most widely used standards for model description are markup languages: SBML (Systems Biology Markup Language) [Hucka et al, 2003] and CellML [Garny et al, 2008]. Both of them are evolving toward the modularity concept. Discussion about including instruments for describing modular models has continued in the SBML community since 2000, and a number of proposals were made by different authors. Finally, in 2012 a specification for a composite hierarchic extension to SBML and a set of tests were released. The extension allows SBML models to refer to other SBML models. This is provided by an instance object which contains a reference to a submodel. Submodels can be defined either in the same file as the submodel object or somewhere else (they can be referred to by URI). The “glue” elements which bind modules are replacements, which work in two directions: an element from a submodel can be replaced by an element of the modular model or vice versa. Two elements from different modules can be connected by replacing both of them with the same element. Elements may be addressed either directly or through predefined ports, which correspond to the composite approach.
CellML was initially designed to directly support modularity. Models are composed of interconnected components. However, only version 1.1, which was finalized in 2006 [Garny et al, 2008], supports the import of external models and their reuse as parts of other models. Unlike SBML, in CellML modules may be connected only by replacing a variable of one module with a variable of another module. Modules should predefine ports for variables, which may be of two types: in and out. A connection between two modules defines mappings between an “out” variable of one module and an “in” variable of another.
There are many modeling and simulation tools for systems biology. For example, the most complete list of tools supporting SBML is available at www.sbml.org and includes more than 250 tools. However, not many of them support modular design. The reason may be that the SBML composite package was released only recently.
We will give a brief overview of tools for systems biology which support the modularity concept. Summary tables are presented at the end of the current paper in the conclusion section.
Probably, the first modeling tool aiming to support modular design was ProMoT [Ginkel et al, 2003]. It presents an object-oriented language able to describe modular models with DAE formalism along with a visual representation. Unfortunately, at the moment, it supports only SBML l2v1 without events, which is quite an old version of SBML.
JigCell [Vass et al, 2004] is a set of tools which supports SBML l2v1 and allows aggregation of SBML models [Randhawa et al, 2009] into modular models. Modules are edited using tables of elements. It also provides simulation and analysis tools.
iBioSim [Myers et al, 2009] allows visual editing of SBML models and modular models; in addition it supports the SBML composite hierarchic package.
TinkerCell [Chandran et al, 2009] like iBioSim supports both visual and modular concepts. It is plugin-based and provides simulation and analysis using COPASI [Hoops et al, 2006]. However, to our knowledge, it supports only export to SBML format.
M2SL [Hernandez et al, 2009] bridges the gap in supporting multiformalism modeling utilizing the co-simulation approach, but, at the moment, it lacks standards support and visual representation of models.
As regards CellML, the number of tools supporting it is significantly smaller than that for SBML (a list is available at www.cellml.org). Moreover, although the language is modular by its nature, many tools supporting it do not support modularity and only provide import into flat models (JSim, http://www.physiome.org/jsim/, VirtualCell, http://vcell.org/). Tools that support modular design (OpenCell, http://www.cellml.org/tools/opencell/, COR, http://cor.physiol.ox.ac.uk/) lack a visual representation of models. The only tools presented at the official site which utilize a visual approach are GUICellML (see www.cellml.org), which is quite simple and allows editing of modules only as CellML text, and CellModelViewer [Wimalaratne et al, 2009], which does not allow editing or creation of models.
SBML and CellML are not human readable, especially when dealing with complex modular models, and need tools that present them through a user-friendly interface. There are several research efforts aimed at describing modular models of biological systems using human readable languages, e.g., Antimony [Myers et al, 2009] (supports SBML composite and CellML), PySB [Lopez et al, 2013] (uses the Python programming language to describe models; submodels are treated as callable subroutines) and little-B [Mallavarapu et al, 2009] (based on Lisp). However, we believe that the creation and support of complicated modular models strongly requires visual representation and editing (though a human readable language may be a good add-on to it). A common standard for graphical notation in systems biology is SBGN [Novère et al, 2009]. However, it does not provide a notation for modular models. Therefore, our aim is not only to implement a modular approach, but also to develop a graphical notation and a convenient tool supporting both common standards (SBML, CellML, SBGN) and multiple formalisms.
Modular model definition
In our approach (which is more similar to CellML than to SBML), modules are black boxes which receive and send signals. They may be combined using connections, special elements representing signal transfer from one module to another. Here a signal is the value of a specified variable of the module. Hence, the overall model is described in terms of signal transmission between functional modules. This approach is well suited to co-simulation: modules may be simulated independently, and connections determine the exchange of variable values between them. However, it may also be used to transform the modular model into a flat model. We have implemented an algorithm for such a transformation for the case where all modules are restricted to the following mathematical formalism: differential algebraic equations (DAE) with discrete events, and chemical reactions which can be interpreted as ODEs or stochastic processes. This provides flexibility in modular model simulation: if all modules satisfy these conditions, the modular model is simulated using the transformation approach; if some modules do not, a generalized co-simulation approach is applied. The visual representation of the model can be the same in both cases, and only the inner implementation of the modules changes.
Note that in the present paper, by DAE we mean first-order explicit differential equations supplemented with purely algebraic equations, which usually model conservation laws. Equations of this type are the most common in systems biology and can be described in SBML and CellML:
Essentially, a module is a black box with a defined interface through which it can be connected with other modules (fig. 1). Formally, we define it as a quadruple:
– set of module variables
– set of module input variables, whose values should be transferred from the outside of the module.
– set of module output variables, whose values are calculated inside the module and may be transferred to other modules.
– set of module contact variables, whose values can be modified both inside the module and outside.
The sets of input, output, and contact variables define the module interface through which the module can be connected with other modules. Graphically, they are denoted using port elements. It should be noted that, in general, some variables may be both input and output, or neither:
A module can be represented as a functional block which calculates its output on the basis of its input:
Suppose we have two modules:
A connection between two modules is defined as a correspondence between their variables. If two module variables are interconnected, they should be considered as one variable within the modular model. The properties of this new variable are determined by the connection type and the interconnected variables.
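The module quadruple defined earlier can be sketched as a small data structure. The class and field names below are illustrative, and the check that every interface variable belongs to the module's variable set is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Module:
    """Module as a quadruple of variable sets (names are illustrative)."""
    variables: FrozenSet[str]
    inputs: FrozenSet[str] = frozenset()
    outputs: FrozenSet[str] = frozenset()
    contacts: FrozenSet[str] = frozenset()

    def __post_init__(self) -> None:
        # Assumption: interface variables must be module variables.
        interface = self.inputs | self.outputs | self.contacts
        unknown = interface - self.variables
        if unknown:
            raise ValueError(f"interface variables not in module: {sorted(unknown)}")

    def hidden(self) -> FrozenSet[str]:
        """Variables that are neither input, output, nor contact."""
        return self.variables - (self.inputs | self.outputs | self.contacts)


heart = Module(
    variables=frozenset({"pressure", "flow", "rate", "k"}),
    inputs=frozenset({"rate"}),
    outputs=frozenset({"flow", "rate"}),  # a variable may be both input and output
    contacts=frozenset({"pressure"}),
)
```

Here `heart.hidden()` contains only `k`, the one variable that is not exposed through any port.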
Directed connection means that a new variable value is calculated in one module and then used in another. In that case, one module affects another but not vice versa. Formally, it is a triple:
Here p is an arbitrary optional function. It can be used, for example, to convert variable units.
The dynamics of the new variable is fully defined by the dynamics of the source variable (and it inherits the name and properties of the source variable). All references to the target variable are replaced by the function p applied to the source variable.
Notation: or in short: .
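The effect of a directed connection on the equations can be sketched as follows. Equations are represented here as plain strings keyed by variable name, and the optional function p is represented by a format template. Both representations, and the name `apply_directed`, are assumptions of this sketch rather than the data model of any particular tool.

```python
import re
from typing import Dict


def apply_directed(equations: Dict[str, str], source: str, target: str,
                   p: str = "{}") -> Dict[str, str]:
    """Replace references to `target` by p(source) and drop target's own equation.

    `p` is a format template standing in for the optional function p,
    e.g. "1000 * {}" for a unit conversion.
    """
    replacement = "(" + p.format(source) + ")"
    pattern = re.compile(r"\b" + re.escape(target) + r"\b")
    result = {}
    for var, rhs in equations.items():
        if var == target:
            continue  # the target no longer has dynamics of its own
        result[var] = pattern.sub(lambda m: replacement, rhs)
    return result


# c_out is calculated in one module; c_in is used in another.
equations = {"c_out": "-0.1 * c_out", "c_in": "0", "y": "k * c_in - y"}
flat = apply_directed(equations, "c_out", "c_in", p="1000 * {}")
```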
Undirected connection means that a variable value may be changed by both modules (the modules affect each other). Such variables are called shared. Formally, undirected connection is a pair:
The dynamics of the new variable is a composition of the dynamics of the two connected variables. An undirected connection may define properties of the new variable, such as its name and initial value, either directly or by selecting one of the interconnected variables. If these properties are not set, one of the interconnected variables is selected randomly.
Notation: . Further in the text, a connection chain , will be denoted as and similarly for undirected connections.
Semantic control. The following restrictions should be considered to create a semantically correct model:
Multiple ingoing directed connections for one variable are not allowed (the source of the signal would be ambiguous):
Cyclic directed connections are not allowed (the source of the signal would be uncertain):
An undirected and an ingoing directed connection for the same variable are not allowed (conflicting signals):
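These three restrictions can be checked mechanically. The sketch below assumes that variables are identified by unique strings and that connections are given as plain pairs; the function name `check_connections` and the error messages are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Directed = Tuple[str, str]     # (source variable, target variable)
Undirected = Tuple[str, str]   # unordered pair of variables


def check_connections(directed: List[Directed],
                      undirected: List[Undirected]) -> List[str]:
    """Return the list of violated restrictions (empty if none)."""
    errors: List[str] = []
    sources: Dict[str, List[str]] = defaultdict(list)
    graph: Dict[str, List[str]] = defaultdict(list)
    for src, dst in directed:
        sources[dst].append(src)
        graph[src].append(dst)

    # 1. Multiple ingoing directed connections for one variable.
    for dst, srcs in sources.items():
        if len(srcs) > 1:
            errors.append(f"multiple ingoing connections for {dst}")

    # 2. Cyclic directed connections (depth-first search).
    done: Set[str] = set()

    def has_cycle(node: str, path: Set[str]) -> bool:
        if node in path:
            return True
        if node in done:
            return False
        path.add(node)
        found = any(has_cycle(nxt, path) for nxt in graph.get(node, []))
        path.discard(node)
        done.add(node)
        return found

    if any(has_cycle(node, set()) for node in list(graph)):
        errors.append("cyclic directed connections")

    # 3. Undirected and ingoing directed connection for the same variable.
    shared = {v for pair in undirected for v in pair}
    for dst in sources:
        if dst in shared:
            errors.append(f"conflicting signals for {dst}")
    return errors
```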
We can now define a modular model:
– set of modules.
– set of directed connections between modules,
– set of undirected connections between modules.
– external environment for modules which defines conditions for modules interaction.
The environment defines two types of interface (similar to CellML interface types). The private interface is used to alter the behavior of encapsulated modules. The public interface is needed when the modular model is encapsulated as a module into another modular model, forming a hierarchical structure. A metaphor for private and public interfaces and for using a modular model as a module is presented in fig. 2.
Both interfaces define three sets of variables: input, output and contact. Private interface variables may be used to establish connections with encapsulated modules. In that sense, the environment may simply be considered as one of the interacting modules.
The public interface is used when the modular model is encapsulated into a parent modular model as a module. Its variables may then be accessed by other modules. There are two ways to create a public interface variable. It may be a variable of the external environment whose value is calculated according to established rules, possibly under the influence of encapsulated modules. Another way is to extend an interface variable of an encapsulated module to the interface of the modular model by establishing a connection between the interface variable of the module and an interface variable (of the same type) of the environment. This means that a signal from the encapsulated module is propagated to the outside of the modular model (for an output variable), or that a signal incoming to the modular model is propagated directly to one or more of its modules (for input variables). Possible connections between the environment and its modules are depicted in fig. 3.
Visual modular modeling
As the base for implementation of the modular approach, we have used the BioUML platform (www.biouml.org) written in Java. Important features of BioUML relevant to the modular model approach are:
Visual modeling support.
Support for different mathematical formalisms: ODE/DAE with discrete events, stochastic, PDE.
SBML, CellML and SBGN support.
Tools for model analysis (steady state, sensitivity, flux balance, etc.) and parameter estimation.
Test suite facility for step by step validation of models during development.
Modular design – BioUML can be easily extended by adding software plugins.
BioUML describes a model as a graph whose nodes and edges denote model elements and interactions between them. Each diagram element may be associated with a database entry and simultaneously with an abstract mathematical entity. Diagrams may be visually created and edited by the user. The visual representation is defined by a formal graphical notation which depends on the diagram type. For example, when importing an SBML model, the user may choose to represent it using the SBGN or the BioUML notation. Notation-specific data along with graph layout information are saved as an annotation element. The graph is used to generate Java code for numerical simulations. The generated code depends only on the mathematical properties of the model; therefore, mathematically equivalent SBML (with or without SBGN) and BioUML models will produce the same code.
In order to implement the modular approach in the BioUML platform, we have developed the graphical notation presented in Table 1. The user interface for modular modeling is presented in Fig. 4. It includes graphical notation for different module types, interface ports and connections.
The modular model definition described above permits integration of modules only by connecting their variables. However, the user will probably want to tune modules before integrating them into a modular model. This process is facilitated by the state concept in BioUML (fig. 5). A state comprises a list of changes which are applied to the model. This can be almost any change that can be made to the model: element deletion, addition, or editing. The model can even be fully rewritten, though this is not the best practice and is usually not expected. After adding a module to the modular model, the user may specify which state will be used by default within the modular model or create a new state for the module.
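The state concept can be illustrated with a minimal sketch in which a state is a list of change functions applied to a copy of a model. The dictionary-based model, the name `apply_state`, and the choice to leave the original model untouched are assumptions of this sketch and do not describe the BioUML implementation.

```python
import copy
from typing import Any, Callable, Dict, List

Model = Dict[str, Any]
Change = Callable[[Model], None]


def apply_state(model: Model, changes: List[Change]) -> Model:
    """Return a modified copy of the model; the original stays untouched."""
    tuned = copy.deepcopy(model)
    for change in changes:
        change(tuned)
    return tuned


base = {"parameters": {"k": 1.0}, "equations": {"x": "-k * x", "y": "k * x"}}

# A hypothetical state: edit one parameter and delete one equation.
no_product = [
    lambda m: m["parameters"].update(k=0.5),
    lambda m: m["equations"].pop("y"),
]
tuned = apply_state(base, no_product)
```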
Model flattening algorithm
In this section, we will describe an algorithm which transforms the modular diagram defined in previous sections into a “flat” one. The input of the algorithm is a modular diagram with submodels that can be:
1. SBML model (in SBGN or BioUML notation)
2. ODE with discrete events model created in BioUML
3. Model describing biological pathways created in BioUML
4. Modular model whose modules also satisfy conditions 1-4.
Technical modules (Switchers, Constants, Plots and buses) are also allowed. Only the Averager module is not allowed, as it cannot be efficiently transformed into an algebraic or differential equation. Another natural restriction is that a modular model cannot contain itself as a module.
The result of the algorithm is a diagram described in the same formalism. It can be interpreted either as a set of DAE with events simulated by standard methods (Euler, Dormand-Prince, a Java port of CVODE [Hindmarsh et al, 2005]) or alternatively as a stochastic model simulated by stochastic algorithms [Gillespie, 2007].
The algorithm can be applied in two cases:
Automatically, when the user decides to simulate a modular model. The transformed model is passed to the solver, which generates the simulation result. The flat model is intermediate and is not stored in the repository or shown to the user.
On request, when the user wants to convert a modular model into a flat one and save it for further work. In this case the result is an ODE model which is stored in the BioUML repository.
The algorithm goes as follows:
Step 1. If some modules are modular models themselves, they are transformed into flat ODE modules; that is, the algorithm is applied recursively.
Step 2. Each variable of each module is given a new name that is unique within the modular model. The set of unique identifiers will be denoted as . Created mapping: .
Step 3. Replacement rules are generated by processing the connections established in the modular model. Each variable is associated with some mathematical expression . Set of replacements will be denoted as .
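Step 2 can be sketched as follows. The prefixing scheme (module name, underscore, variable name) is an assumption of this sketch; the only properties it is meant to show are that the resulting names are unique and that names which do not coincide across modules are preserved.

```python
from typing import Dict, List, Tuple


def unique_names(modules: Dict[str, List[str]]) -> Dict[Tuple[str, str], str]:
    """Map (module, variable) to a name unique within the modular model.

    Assumes variable names are unique inside each module.
    """
    counts: Dict[str, int] = {}
    for variables in modules.values():
        for var in variables:
            counts[var] = counts.get(var, 0) + 1

    # Names used by exactly one module are kept as they are.
    used = {var for var, n in counts.items() if n == 1}
    mapping: Dict[Tuple[str, str], str] = {}
    for module, variables in modules.items():
        for var in variables:
            if counts[var] == 1:
                mapping[(module, var)] = var
                continue
            candidate = f"{module}_{var}"
            while candidate in used:   # avoid clashing with an existing name
                candidate += "_"
            used.add(candidate)
            mapping[(module, var)] = candidate
    return mapping


names = unique_names({"m1": ["x", "k"], "m2": ["x", "y"]})
```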
If there are no ingoing directed or undirected connection for , then we set . Thus, if there are no connections in the model, then is equal to and each variable will be replaced by its new name only. Moreover, if variable names in different modules do not coincide, then all names will be preserved and no replacements will be done at all.
. Let us construct a chain (possibly cyclic) of undirected connection:
In this chain we choose one variable which will be called main, we will denote it as . Then for each element of chain we set:
Choice of main variable:
If the connection chain includes a bus, then the bus variable is taken as main. Only one bus may be included in a connection chain.
Otherwise, if connections define their own variables, one of them will be chosen. The modeler may define which connection should have priority. If priorities are not set, then the choice between connection variables is random.
Finally, if connections do not define new variables, then the main variable is chosen from among the connected variables.
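The rules for choosing the main variable can be sketched as a simple cascade. The class `Undirected`, its fields, and the convention that a higher `priority` value wins are assumptions of this sketch. Where the text allows a random choice, the sketch takes the first candidate so that the result is reproducible.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Undirected:
    a: str
    b: str
    own_variable: Optional[str] = None  # variable defined by the connection itself
    priority: int = 0                   # assumed convention: higher value wins


def choose_main(chain: List[Undirected],
                bus_variable: Optional[str] = None) -> str:
    """Pick the main variable for a chain of undirected connections."""
    # Rule 1: a bus in the chain always provides the main variable.
    if bus_variable is not None:
        return bus_variable
    # Rule 2: otherwise prefer a variable defined by a connection.
    defining = [c for c in chain if c.own_variable is not None]
    if defining:
        return max(defining, key=lambda c: c.priority).own_variable
    # Rule 3: otherwise take one of the connected variables.
    return chain[0].a


chain = [
    Undirected("m1_x", "m2_x"),
    Undirected("m2_x", "m3_x", own_variable="glucose", priority=1),
    Undirected("m3_x", "m4_x", own_variable="glc", priority=2),
]
```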
. We set . Let us show that this definition is correct: if has no other connections or has undirected connection, then – is already defined. If has ingoing directed connection , then applying the same rule to :
This process is guaranteed to terminate because the number of variables in the model is finite and the semantic rules prohibit cyclic and multiple ingoing directed connections. Finally we obtain the mapping:
Here and has no ingoing directed connections.
Thus, we may define a mapping which associates each variable from with an expression:
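The resolution of chains of directed connections can be sketched as follows. Each target variable maps to its source variable and to a format template standing in for the optional function p. The representation and the name `resolve` are assumptions of this sketch, and undirected connections are left out for brevity.

```python
from typing import Dict, Tuple

# target variable -> (source variable, template for the optional function p)
Ingoing = Dict[str, Tuple[str, str]]


def resolve(var: str, ingoing: Ingoing) -> str:
    """Follow ingoing directed connections back to a variable that has none."""
    seen = set()
    expr = "{}"  # expression for the original variable, one free slot
    while var in ingoing:
        if var in seen:
            raise ValueError("cyclic directed connections")
        seen.add(var)
        var, p = ingoing[var]
        if p != "{}":
            # Compose: substitute p into the slot, keeping one free slot.
            expr = expr.format("(" + p + ")")
    return expr.format(var)


ingoing = {"v3": ("v2", "2 * {}"), "v2": ("v1", "{} + 1")}
```

With these connections, `resolve("v3", ingoing)` composes both functions and yields an expression in `v1` only, while `resolve("v1", ingoing)` returns `v1` unchanged.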
Step 4. The procedure is applied to the modular model. Each element is copied to the plain model. If the element is a compartment, then this procedure is applied to all its inner elements recursively. This process takes into account the replacement rules generated on the previous step. Namely, all references to variable are replaced by references to expression .
Additionally, a number of special rules are applied during this process:
Equations. Assignments (including the initial assignments and event assignments) and differential equations which define the variables such that are ignored. Those variables are either replaced via a directed connection, so that their dynamics is defined by another variable, or interconnected via undirected connections and were not chosen as main.
Differential equations. If several variables are defined by differential equations and there is an undirected connection between them,
then a new differential equation is generated:
Thereby, the chain of connected variables is transformed into one variable whose dynamics is the sum of the dynamics of the chain variables. It should be noted that this agrees with stitching reactions for interconnected species (see fig. 8).
Compartments. If two compartments are connected by an undirected connection, they merge into one compartment containing elements from both of the initial compartments (see fig. 6). If two compartments are connected by a directed connection, then one of them is replaced by the other and all its elements are removed from the model (see fig. 7).
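The summation rule can be sketched on string-valued right-hand sides. The function `merge_dynamics` and the string representation are assumptions of this sketch: every chain variable is renamed to the main variable, and the right-hand sides of the chain variables are added together.

```python
import re
from typing import Dict, List


def merge_dynamics(rhs: Dict[str, str], chain: List[str],
                   main: str) -> Dict[str, str]:
    """Replace the equations of all chain variables by one summed equation."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, chain)) + r")\b")

    def rename(expr: str) -> str:
        # Every reference to a chain variable now refers to the main variable.
        return pattern.sub(lambda m: main, expr)

    merged = {v: rename(e) for v, e in rhs.items() if v not in chain}
    terms = ["(" + rename(rhs[v]) + ")" for v in chain if v in rhs]
    merged[main] = " + ".join(terms)
    return merged


# S1 is consumed in one module; S2 is produced and degraded in another.
rhs = {"S1": "-k1 * S1", "S2": "v_in - k2 * S2", "P": "k1 * S1"}
merged = merge_dynamics(rhs, ["S1", "S2"], "S")
```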
Parameters and species. From a mathematical point of view, species and parameters are almost equal. Both are associated with model variables whose dynamics may be defined by assignments, events, differential and algebraic equations, and both may be exposed through module interface ports, so we may establish connections between a parameter and a species. However, they are represented differently in the diagram; therefore, we should process such connections in a special way. Suppose we have parameter p in one module and species S in another. We have three possible connections between them:
. In this case, species S will be selected as the main species and therefore will substitute p in all equations and assignments.
. Species S will not be entirely substituted by the parameter. Instead, a new equation will be generated to ensure signal transmission:
. Just as in the simple case with two connected parameters: the parameter p will be substituted by S and all equations which affect its dynamics will be eliminated (so that its dynamics will be entirely defined by the dynamics of species S).
Species graphical elements. If two species are connected by an undirected connection, they should now be considered as one entity. A simple way is to create new species which will replace both original species whenever they are referenced in the model. In visual representation, all reactions in which the original species participate will now be applied to the new species. However, if models contain a large number of reactions, they are hard to trace. Another option is to apply a clone attribute (see [Novère et al, 2009]) indicating that two objects in the Diagram are associated with the same entity and hence with the same variable. Examples are depicted on fig. 8.
Let us consider a situation where we have two species connected by a directed connection: . The directed connection implies that B will not be changed by its own module, so all equations that could change its amount should be eliminated. However, we still want it to participate in reactions. We cannot simply replace it by A, as in the case of an undirected connection, because then A would be affected by these reactions. Therefore, we add both species A and B to the plain model and add a new assignment equation “B = A” to provide the effect of the directed connection. In addition, we specify a boundary condition for species B so that reactions do not change its amount.
Bus. A bus is a special element in the modular diagram representing a variable. Buses serve as private interface ports of the modular model environment. The type of a bus is defined by the connections established with it: if an undirected connection is established with a bus, then no directed connections may be established with it, and vice versa. A bus also cannot have multiple incoming directed connections. Several buses may be associated with the same variable, which helps to avoid intersections between connection edges. A bus connected by undirected connections also allows the user to set the properties of the new variable corresponding to the chain of undirected connections. An example of the use of buses is presented in fig. 9.
Results and discussion
This paper describes a modular approach implemented as a software plugin for BioUML. It is similar to CellML in the sense that it integrates modules using connections between variables. However, we introduce two types of connections with different properties and use the state concept, which allows more extensive alteration of modules within the modular model and makes it possible to express semantics similar to those of SBML composite models. The approach also provides a graphical notation which represents a model in terms of signal transmission between modules. This representation is used for both the model transformation and the general co-simulation approaches. It separates the visual creation of the model from the actual inner implementation of modules and the type of simulation.
The developed plugin includes:
Graphical notation for the visual creation of the modular model.
Algorithm for generating a flat model on the basis of a modular model in the case of certain limitations on the module formalisms (DAE with events).
Algorithm for an agent-based approach to co-simulation in the case of arbitrary module formalisms.
The plugin was already used for the development of several modular models:
Integrated apoptosis model [Kutumova et al, 2012] comprising 13 submodels which were derived from different sources.
Reconstruction in BioUML of the classic overall circulation model by professor Guyton [Guyton et al, 1972] comprising 18 modules according to the original model scheme.
Model of the human cardiovascular system [Kiselev et al, 2012] incorporating modules from three different models utilizing different formalisms: a model of the pulsating heart [Solodyannikov, 1994] (ODE), a model with long-term human regulation (ODE) [Karaaslan et al, 2005], and a model of blood flow across the 55 largest human arteries (PDE) [Biberdorf et al, 2012].
Software is in active development. Currently, we continue our research in the following directions:
Full support of SBML composite package. We believe that SBML composite models may be effectively described in our terms of connections and states.
Support for CellML 1.1.
Extending the library of atomic modules.
Improvements in visual model creation and editing: more extensive support for the visual representation of multilevel hierarchical models and improved automatic layout algorithms.
The developed plugin (as well as the source code) is available both in web and standalone BioUML versions.