The previous section introduced the plain Graph Oriented Programming model in its simplest form. This section will discuss various aspects of graph based languages and how Graph Oriented Programming can be used or extended to meet these requirements.
Process variables maintain the contextual data of a process execution. In an insurance claim process, the 'claimed amount', 'approved amount' and 'isPaid' could be good examples of process variables. In many ways, they are similar to the member fields of a class.
Graph Oriented Programming can be easily extended with support for process variables by associating a set of key-value pairs with an execution. Concurrent execution paths and process composition complicate things a bit: scoping rules define the visibility of process variables in case of concurrent paths of execution or subprocesses.
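As an illustration, a bare-bones execution that carries its own variable map could look like the following sketch. The class and method names are hypothetical, chosen only for the example; scoping across concurrent paths and subprocesses is deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch: an execution holding its contextual data as key-value pairs.
// Illustrative names, not an actual API.
class Execution {
  private final Map<String, Object> variables = new HashMap<>();

  public void setVariable(String key, Object value) {
    variables.put(key, value);
  }

  public Object getVariable(String key) {
    return variables.get(key);
  }
}
```

In the insurance claim example, 'claimed amount', 'approved amount' and 'isPaid' would each be entries in this map.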
'Workflow Data Patterns' is an extensive research report on the types of scoping that can be applied to process variables in the context of subprocessing and concurrent executions.
Suppose that you're developing a 'sale' process with a graph based process language for workflow. After the client has submitted the order, there is a sequence of activities for billing the client and there's also a sequence of activities for shipping the items to the client. As you can imagine, the billing activities and shipping activities can be done in parallel.
In that case, one execution will not be sufficient to keep track of the whole process state. Let's go through the steps to extend the Graph Oriented Programming model and add support for concurrent executions.
First, let's rename the execution to an execution path. Then we can introduce a new concept called a process execution. A process execution represents one complete execution of a process, and it can contain many execution paths.
The execution paths can be ordered hierarchically: one root execution path is created when a new process execution is instantiated. When the root execution path is forked into multiple concurrent execution paths, the root is the parent and the newly created execution paths are all children of the root. This way, the implementation of a join becomes straightforward: the join just has to verify whether all sibling execution paths are already positioned in the join node. If that is the case, the parent execution path can resume execution, leaving the join node.
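The hierarchy and the sibling-based join check can be sketched as follows. `ExecutionPath` and `Join` are hypothetical names, and the sketch ignores variables, scoping and transitions; it only shows the parent/child bookkeeping and the join condition.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of hierarchical execution paths: forking creates children of the
// current path. Illustrative names, not an actual API.
class ExecutionPath {
  ExecutionPath parent;
  final List<ExecutionPath> children = new ArrayList<>();
  Object node; // the node this execution path is currently positioned in

  ExecutionPath fork() {
    ExecutionPath child = new ExecutionPath();
    child.parent = this;
    children.add(child);
    return child;
  }
}

class Join {
  // the join can fire only when all sibling execution paths of the arriving
  // path are already positioned in the join node
  boolean allSiblingsArrived(ExecutionPath arriving, Object joinNode) {
    for (ExecutionPath sibling : arriving.parent.children) {
      if (sibling.node != joinNode) {
        return false;
      }
    }
    return true;
  }
}
```

When `allSiblingsArrived` returns true, the parent execution path can resume and leave the join node.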
While hierarchical execution paths and the join implementation based on sibling execution paths cover a large part of the use cases, other concurrency behaviour might be desirable in specific circumstances, for example when multiple merges relate to one split. In such situations, other combinations of runtime data and merge implementations are required.
Multiple concurrent paths of execution are often mixed up with multithreaded programming. Especially in the context of workflow and BPM, these are quite different. A process specifies a state machine. Consider for a moment a state machine as being always in a stable state, with state transitions being instantaneous. Then you can interpret concurrent paths of execution by looking at the events that cause the state transitions. Concurrent execution then means that the events that can be handled are unrelated between the concurrent paths of execution. Now let's assume that a state transition in the process execution relates to a database transaction (as explained in the section called “Persistence and Transactions”); then you see that multithreaded programming is actually not even required to support concurrent paths of execution.
Process composition is the ability to include a sub process as part of a super process. This advanced feature makes it possible to add abstraction to process modelling. For the business analyst, this feature is important because it allows large models to be broken down into smaller blocks.
The main idea is that the super process has a node in the graph that represents a complete execution of the sub process. When an execution enters the sub-process-node in the super process, several things are to be considered:
After the sub process has entered a wait state, the super process execution will be pointing to the sub-process-node and the sub process execution will be pointing to some wait state.
When the sub process execution finishes, the super process execution can continue. The following aspects need to be considered at that time:
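A minimal sketch of a sub-process node could look like this. All names are illustrative, and the sketch only models the hand-off between super and sub execution; variable passing and scoping between the two executions are left out.

```java
// Hypothetical sketch of a sub-process node in the super process graph.
// Entering the node starts the sub process; when the sub process execution
// ends, the super process execution can continue.
class SubProcessNode {

  void execute(Execution superExecution) {
    // the super execution stays positioned in this node while the
    // sub process execution runs
    superExecution.waitingInSubProcessNode = true;
  }

  // called when the sub process execution finishes
  void subProcessEnded(Execution superExecution) {
    superExecution.waitingInSubProcessNode = false; // super process resumes
  }
}

class Execution {
  boolean waitingInSubProcessNode;
}
```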
WS-BPEL has an implicit notion of subprocessing, rather than an explicit one. An invoke will start a new sub process. Then the super process will have a receive activity that will wait till the sub process ends. So the usual invoke and receive are used instead of a special activity.
Above, we saw that the default behaviour is to execute processes synchronously until a wait state is reached, and typically this overall state change is packaged in one transaction. In this section, you'll see how you can demarcate transaction boundaries in the process language. Asynchronous continuation means that a process can continue asynchronously: the first transaction sends a message that represents a continuation command, and the message receiver then executes that command in a second transaction. The process has still continued its automatic execution, but the work was split over two transactions.
To add asynchronous continuations to Graph Oriented Programming, a messaging system is required: a system that integrates with your programming logic and allows for transactional sending and receiving of messages. Messaging systems are also known as message oriented middleware (MOM), and Java Message Service (JMS) is the standard API to use such systems.
There are 3 places where execution can be continued asynchronously:
Let's consider the first situation in detail, as illustrated in the following figure. Suppose some event caused an execution to start propagating over the graph and a transition is now about to invoke the execute method on the 'generatePdf' node. Instead of invoking the execute method on the 'generatePdf' node directly, a new command message is created with a pointer to the execution. The command message should be interpreted as "continue this execution by executing the node". This message is sent over the message queue to the command executor. The command executor takes the message from the queue and invokes the node's execute method with the execution as a parameter.
Note that there are two separate transactions involved now. The first transaction originated from the original event: it moves the execution into the 'generatePdf' node and sends the command message. In the second transaction, the command message is consumed and the node's execute method is invoked with the execution as a parameter. In between the two transactions, the execution should be blocked for incoming events.
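The two-transaction hand-off can be sketched with an in-memory queue standing in for the transactional message queue. In a real system this would be a MOM such as JMS; all class and field names below are hypothetical, and transaction demarcation is only indicated in comments.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// A continuation command: "continue this execution by executing the node".
// Illustrative names, not an actual API.
class ContinueExecutionCommand {
  final String executionId;
  final String nodeName;

  ContinueExecutionCommand(String executionId, String nodeName) {
    this.executionId = executionId;
    this.nodeName = nodeName;
  }
}

class CommandExecutor {
  // stands in for a transactional message queue (e.g. a JMS queue)
  final Queue<ContinueExecutionCommand> queue = new ArrayDeque<>();

  // transaction 1: the original event sends the command message
  void send(ContinueExecutionCommand command) {
    queue.add(command);
  }

  // transaction 2: the command executor consumes the message and
  // invokes the node's execute method with the execution as a parameter
  String processNext() {
    ContinueExecutionCommand command = queue.poll();
    return "executed node '" + command.nodeName
        + "' for execution " + command.executionId;
  }
}
```

Between `send` and `processNext`, the execution identified by `executionId` would be blocked for incoming events.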
Both process definition information (like Node, Transition and Action) and execution information (like Execution) can be stored in a relational database. An ORM solution (e.g. Hibernate/EJB3) can be used to perform the mapping between the database records and the OOP objects.
All process definition information is static. Hence it can be cached in memory. This gives a serious performance boost. Only the runtime execution data will have to be loaded from the DB in each transaction.
A transaction typically corresponds to the event method on the Execution. A transaction starts when an event is being processed. The event method will trigger the execution to continue until a new wait state is reached. When that happens, the Execution's event method returns and the transaction can be ended.
The overall effect of the event method invocation is that the Execution has moved its node pointer from one node to another. The ORM solution can calculate the difference between the original database state and the updated Java objects. Those changes are then flushed to the database at the end of the Execution's event method. In our example, this results in an SQL update statement on the execution that sets the node pointer to the new (wait-state) node.
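A simplified sketch of such an event method, assuming a minimal Node/Transition model with illustrative names and no persistence: the execution propagates over automatic nodes until a wait state is reached, then returns so the surrounding transaction can be committed.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: one event method invocation corresponds to one transaction.
// Illustrative names, not an actual API.
class Node {
  final String name;
  final boolean waitState;
  final Map<String, Transition> transitions = new HashMap<>();

  Node(String name, boolean waitState) {
    this.name = name;
    this.waitState = waitState;
  }
}

class Transition {
  final Node destination;

  Transition(Node destination) {
    this.destination = destination;
  }
}

class Execution {
  Node node;

  Execution(Node start) {
    this.node = start;
  }

  // processes one event: propagate until a wait state, then return so the
  // caller can end the transaction and flush the new node pointer
  void event(String eventName) {
    Transition transition = node.transitions.get(eventName);
    while (transition != null) {
      node = transition.destination;
      // an automatic node propagates further over its default transition;
      // a wait state stops the propagation
      transition = node.waitState ? null : node.transitions.get("default");
    }
  }
}
```

The only state that changed between the start and the end of `event` is the node pointer, which is exactly what the ORM flush would write back to the database.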
ORM solutions like Hibernate/EJB3 work with a different set of objects in each session. This implies that all access to Node implementations is serialized, which removes the need to write thread-safe code as long as the node uses only the execution data (and not static variables, for instance).
Nodes might want to make use of pluggable services, or new node implementations might want to use new services, unknown at design time. To accommodate this, a services framework can be added to Graph Oriented Programming so that nodes can access arbitrary services and configurations.
Basically, there are 2 options:
The execution context contains access to services that are made available by 'the environment'. The environment is the client code (the code that invokes Execution.event(String)) plus an optional container in which this client code runs.
Examples of services are a timer service, an asynchronous messaging service, a database service (java.sql.Connection), and so on.
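A trivial sketch of such a service lookup, assuming a map-based environment; the class and method names are hypothetical, and a real implementation would also handle configuration and scoping of services.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of an environment that exposes pluggable services to nodes by name.
// Illustrative names, not an actual API.
class Environment {
  private final Map<String, Object> services = new HashMap<>();

  // the client code or its container registers services before
  // invoking the execution
  void register(String name, Object service) {
    services.put(name, service);
  }

  // node implementations look services up by name at runtime,
  // without knowing at design time which services exist
  Object lookup(String name) {
    return services.get(name);
  }
}
```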