Draft: Thoughts on Workflow, BPMN, BPM, Batch jobs, Integration Patterns, Spring Batch and Spring Integration
Raw thoughts on BPMN, Java Workflow, etc.
Use a Business Process Management Engine, abbreviated BPME (Activiti), to get visibility into the push process
Use an Enterprise Integration Patterns approach, abbreviated EIPM (Spring Integration), for messaging/eventing between processes
Use it to message to the BPME
Use it to kick off other processes
Use it to integrate via adapters XMPP, JMS, AMQP, FTP, etc.
Use it to publish events that other processes want to listen to
Use recoverable batch processing management to do our large batch processes (Spring Batch)
Use EIPM to communicate state from batch Jobs/Steps to BPME
Use retry/recover features
Use transaction batching
Use re-entrant features
Use auditing features
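To make the "communicate state from batch Jobs/Steps to BPME" idea a bit more concrete, here is a rough sketch in plain Java. It is not Spring Integration or Activiti code: PushEvents, StepEvent, Channel and ProcessTracker are all names I made up, with an in-memory publish-subscribe channel standing in for the EIPM and a simple status map standing in for the BPME.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Illustrative only: every name here is hypothetical, not a Spring or Activiti API.
final class PushEvents {

    // A batch step reporting its state, e.g. STARTED, COMPLETED or FAILED.
    static final class StepEvent {
        final String job;
        final String step;
        final String status;

        StepEvent(String job, String step, String status) {
            this.job = job;
            this.step = step;
            this.status = status;
        }
    }

    // In-memory publish-subscribe channel standing in for the EIPM.
    static final class Channel {
        private final List<Consumer<StepEvent>> subscribers = new ArrayList<>();

        void subscribe(Consumer<StepEvent> subscriber) {
            subscribers.add(subscriber);
        }

        // Delivers the event to every subscriber, in subscription order.
        void publish(StepEvent event) {
            for (Consumer<StepEvent> subscriber : subscribers) {
                subscriber.accept(event);
            }
        }
    }

    // Stands in for the BPME: remembers the latest status of each step.
    static final class ProcessTracker implements Consumer<StepEvent> {
        private final Map<String, String> latest = new LinkedHashMap<>();

        @Override
        public void accept(StepEvent event) {
            latest.put(event.job + "/" + event.step, event.status);
        }

        String statusOf(String job, String step) {
            return latest.getOrDefault(job + "/" + step, "UNKNOWN");
        }
    }

    public static void main(String[] args) {
        Channel channel = new Channel();
        ProcessTracker tracker = new ProcessTracker();
        channel.subscribe(tracker);
        // A second subscriber shows that other processes can listen too.
        channel.subscribe(e -> System.out.println(e.job + "/" + e.step + " -> " + e.status));

        channel.publish(new StepEvent("push", "transform", "STARTED"));
        channel.publish(new StepEvent("push", "transform", "COMPLETED"));
        System.out.println(tracker.statusOf("push", "transform"));
    }
}
```

The point of the indirection is that the batch step only knows about the channel. Whether the listener is the process engine, an audit log, or another process kicking off is decided by whoever subscribes.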
This is deliberately vague for two reasons: 1) I want to use it to solicit feedback from an external audience, and 2) I don't possess a clear understanding of all of the details of the current process.
Let's say you have the following problem. You have a bunch of producers (photographers, ERP specialists, artists, lawyers, marketing folks, project managers, content providers, etc.) who produce raw data into a large raw data database.
The data in this raw data database is managed by a series of admin tools, and content tools. There are 30 some tools.
This raw data database consists of a highly normalized version of the actual data. It also consists of tabular business rules and metadata.
Later this raw data is pushed to the live working system.
The push process goes something like this.
The raw data is converted into domain objects by a decision support system.
The rich domain objects may or may not end up in one of 20 or so data marts (fully relational operational data).
Later this richer domain data is further filtered by a variety of things for a variety of reasons (vague enough).
Some of this richer data is then copied into Service level caches (think memcache).
Then there are content caches (think file systems).
Then there are edge content caches for images, PDFs, and videos (think Akamai).
To simplify, the process is something like this:
Transform raw data into rich domain objects using DSS
Filter rich domain objects into many different operational data marts
Populate Service Caches
Populate Content Caches
Populate Edge Content Caches
To keep it really simple it would be:
Transform raw data into domain objects
Push domain objects to various data marts
Again, realize that most of the above is somewhat contrived and, in reality, way oversimplified.
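Even the really simple version can be sketched in code. The following is a toy illustration in plain Java, not the real system: PushFlow, transform, pushToMarts and the mart names are all invented, strings stand in for raw records and domain objects, and the idea that each mart has its own filter (so an object can land in several marts or in none) is my assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Illustrative only: the types and mart names are hypothetical.
final class PushFlow {

    // Step 1: turn every raw record into a domain object.
    static <R, D> List<D> transform(List<R> rawRecords, Function<R, D> toDomain) {
        List<D> domainObjects = new ArrayList<>();
        for (R raw : rawRecords) {
            domainObjects.add(toDomain.apply(raw));
        }
        return domainObjects;
    }

    // Step 2: offer every domain object to every mart. An object may land in
    // several marts, or in none if no filter accepts it (an assumption).
    static <D> Map<String, List<D>> pushToMarts(List<D> domainObjects,
                                                Map<String, Predicate<D>> martFilters) {
        Map<String, List<D>> marts = new LinkedHashMap<>();
        for (Map.Entry<String, Predicate<D>> mart : martFilters.entrySet()) {
            List<D> accepted = new ArrayList<>();
            for (D domainObject : domainObjects) {
                if (mart.getValue().test(domainObject)) {
                    accepted.add(domainObject);
                }
            }
            marts.put(mart.getKey(), accepted);
        }
        return marts;
    }

    public static void main(String[] args) {
        List<String> raw = new ArrayList<>();
        raw.add(" photo:sunset ");
        raw.add("pdf:contract");
        raw.add("note:draft");

        // Stand-in for the DSS: here "conversion" is just trimming whitespace.
        List<String> domain = transform(raw, String::trim);

        Map<String, Predicate<String>> filters = new LinkedHashMap<>();
        filters.put("imageMart", d -> d.startsWith("photo:"));
        filters.put("legalMart", d -> d.startsWith("pdf:"));

        System.out.println(pushToMarts(domain, filters));
    }
}
```

In this toy run the "note:draft" object matches no filter and so ends up in no mart, which mirrors the "may or may not end up in a data mart" part of the description.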
The current system consists of
Korn shell scripts that call into a homegrown but not fully baked workflow engine (written in Java, using custom JSON files)
Perl/Python/Groovy scripts that do batch processing and then send messages to the workflow engine
Expensive, proprietary, poorly documented cluster messaging solution from an evil vendor (that gets used by the workflow engine)
Java services that receive tasks from the aforementioned cluster messaging solution
SOAP Service? No
REST Service? No
Homegrown service framework with custom marshaling and management (also poorly documented)? Yes :(
Blood of innocent babies
I bet you can guess what is wrong already. This is a complex system with many moving parts. Each one of those steps above consists of one to many processes.
Chasing down a problem is, in a word, HARD!
Only a few elite software engineers can find out what the problems are and fix them. They tail log files, hit admin tools, run custom Groovy scripts and collect unicorn tears.
Also, when something goes wrong, even after it is fixed the whole process needs to run again from the top.
There is no retry. It all works or it all fails.
One bump in the road means no push.
Thus what they really want is a system that is recoverable.
They really need the ability to fix a problem and then continue where the process left off, rather than having to restart the thing from the beginning.
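The "continue where the process left off" idea boils down to checkpointing. Here is a minimal hand-rolled sketch in plain Java, to show the idea only; it is not how Spring Batch implements restart. ResumablePush and the step names are made up, and the set of completed steps lives in memory where a real system would persist it.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Illustrative only: a hand-rolled sketch, not how Spring Batch implements restart.
final class ResumablePush {

    // Names of steps that already finished. A real system would persist this
    // (database table, file) so it survives a crash; here it is in memory.
    private final Set<String> completed = new LinkedHashSet<>();
    private final Map<String, Runnable> steps = new LinkedHashMap<>();

    void addStep(String name, Runnable work) {
        steps.put(name, work);
    }

    // Runs the steps in order, skipping any that completed on an earlier run.
    // Returns the name of the step that failed, or empty if the push finished.
    Optional<String> run() {
        for (Map.Entry<String, Runnable> step : steps.entrySet()) {
            if (completed.contains(step.getKey())) {
                continue; // already done, do not redo the work
            }
            try {
                step.getValue().run();
            } catch (RuntimeException e) {
                return Optional.of(step.getKey()); // stop; earlier checkpoints are kept
            }
            completed.add(step.getKey());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        boolean[] cacheIsUp = {false};
        ResumablePush push = new ResumablePush();
        push.addStep("transform", () -> System.out.println("transforming"));
        push.addStep("populateServiceCaches", () -> {
            if (!cacheIsUp[0]) {
                throw new IllegalStateException("cache cluster is down");
            }
            System.out.println("populating service caches");
        });
        push.addStep("populateContentCaches", () -> System.out.println("populating content caches"));

        System.out.println("first run failed at: " + push.run());
        cacheIsUp[0] = true; // someone fixes the problem
        System.out.println("second run failed at: " + push.run());
    }
}
```

On the second run the transform step is skipped, because it was recorded as completed before the failure. That is the whole difference from "it all works or it all fails".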
Also they want to track how long each step is taking and need to know where in the process they are.
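Tracking "how long" and "where are we" is not much code either. The sketch below is plain Java with made-up names (StepTimer, time, currentStep); a BPM engine would normally record this for you, so treat it as an illustration of what gets recorded, not as a proposal to build it by hand. The clock is passed in so that it can be faked in a test.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Illustrative only: names are hypothetical.
final class StepTimer {

    private final LongSupplier nanoClock;
    private final Map<String, Long> durationsMillis = new LinkedHashMap<>();
    private String currentStep; // null when no step is running

    StepTimer(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    // Runs one step, exposing its name while it runs and recording its
    // duration afterwards, even when the step throws.
    void time(String name, Runnable work) {
        currentStep = name;
        long start = nanoClock.getAsLong();
        try {
            work.run();
        } finally {
            durationsMillis.put(name, (nanoClock.getAsLong() - start) / 1_000_000L);
            currentStep = null;
        }
    }

    String currentStep() {
        return currentStep;
    }

    Map<String, Long> durationsMillis() {
        return durationsMillis;
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        StepTimer timer = new StepTimer(System::nanoTime);
        timer.time("transform", () -> pause(20));
        timer.time("populateServiceCaches", () -> pause(10));
        System.out.println(timer.durationsMillis());
    }
}
```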
There is also a future need. Remember all those tools that were used to populate the raw data? If you put the raw data in wrong, it can cause a push to fail.
The admin tools rely on undocumented, ad hoc business processes. It would be nice to document, enforce and control these business processes.
Also some of these business processes have a human interaction to worry about (think approval process and content management).
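To show what "document, enforce and control" could mean for something like an approval process, here is a tiny state machine in plain Java. The states and allowed transitions are entirely my assumption (I do not know the real process), and ApprovalProcess is a made-up name; in a BPMN engine this would be modeled as a process with a human task rather than coded by hand.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Illustrative only: the states and transitions are assumptions, not the real process.
final class ApprovalProcess {

    enum State { DRAFT, PENDING_APPROVAL, APPROVED, REJECTED, PUBLISHED }

    // The documented process: which states may follow which.
    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);
    static {
        ALLOWED.put(State.DRAFT, EnumSet.of(State.PENDING_APPROVAL));
        ALLOWED.put(State.PENDING_APPROVAL, EnumSet.of(State.APPROVED, State.REJECTED));
        ALLOWED.put(State.APPROVED, EnumSet.of(State.PUBLISHED));
        ALLOWED.put(State.REJECTED, EnumSet.of(State.DRAFT));
        ALLOWED.put(State.PUBLISHED, EnumSet.noneOf(State.class));
    }

    private State state = State.DRAFT;

    State state() {
        return state;
    }

    // Enforces the process: any transition not listed above is refused.
    void moveTo(State next) {
        if (!ALLOWED.get(state).contains(next)) {
            throw new IllegalStateException(state + " -> " + next + " is not allowed");
        }
        state = next;
    }

    public static void main(String[] args) {
        ApprovalProcess content = new ApprovalProcess();
        content.moveTo(State.PENDING_APPROVAL);
        content.moveTo(State.APPROVED); // the human decision
        content.moveTo(State.PUBLISHED);
        System.out.println(content.state());
    }
}
```

The transition table doubles as documentation: content cannot be published without passing through approval, because no such transition exists.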