Thursday, January 27, 2011

Draft: Thoughts on Workflow, BPMN, BPM, Batch jobs, Integration Patterns, Spring Batch and Spring Integration

 Raw thoughts on BPMN, Java Workflow, etc.

  1. Use a Business Process Management Engine (abbreviated BPME) to get visibility into the push process (Activiti)
  2. Use an Enterprise Integration Patterns approach for messaging/eventing between processes (Spring Integration), abbreviated EIPM
    1. Use it to message to the BPME
    2. Use it to kick off other processes
    3. Use it to integrate via adapters: XMPP, JMS, AMQP, FTP, etc.
    4. Use it to publish events that other processes want to listen to
  3. Use recoverable batch processing management for our large batch processes (Spring Batch)
    1. Use EIPM to communicate state from batch jobs/steps to the BPME
    2. Use retry/recover features
    3. Use transaction batching
    4. Use re-entrant features
    5. Use auditing features
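The glue between items 2 and 3 above can be sketched roughly as follows. This is a toy stand-in, not the real thing: `StepEvent`, `channel`, and `ProcessMonitor` are invented names, and a plain queue stands in for a Spring Integration message channel feeding a process engine like Activiti.

```python
# Sketch: batch steps publish state events onto a message channel; a
# process-engine listener drains them for visibility/auditing.
# All names here (StepEvent, channel, ProcessMonitor) are hypothetical;
# in the proposal this role is played by Spring Integration + Activiti.
from dataclasses import dataclass
from queue import Queue


@dataclass
class StepEvent:
    job: str
    step: str
    status: str  # e.g. "STARTED", "COMPLETED", "FAILED"


channel = Queue()  # stand-in for a Spring Integration message channel


def run_step(job, step, work):
    """Run one batch step, reporting its state transitions as events."""
    channel.put(StepEvent(job, step, "STARTED"))
    try:
        work()
        channel.put(StepEvent(job, step, "COMPLETED"))
    except Exception:
        channel.put(StepEvent(job, step, "FAILED"))
        raise


class ProcessMonitor:
    """Stand-in for the BPME side: drains events into an audit trail."""

    def __init__(self):
        self.trail = []

    def drain(self, ch):
        while not ch.empty():
            e = ch.get()
            self.trail.append((e.job, e.step, e.status))


run_step("push", "transform", lambda: None)
monitor = ProcessMonitor()
monitor.drain(channel)
```

The point is only the shape: steps never talk to the engine directly, they just emit events, so the engine (and anything else that cares) can subscribe.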

Problem domain:

This is deliberately vague, for two reasons: 1) I want to use it to solicit feedback from an external audience, and 2) I don't possess a clear understanding of all of the details of the current process.
Let's say you have the following problem. You have a bunch of producers (photographers, ERP specialists, artists, lawyers, marketing folks, project managers, content providers, etc.) who produce raw data into a large raw data database.
The data in this raw data database is managed by a series of admin tools and content tools. There are some 30 tools.
This raw data database consists of a highly normalized version of the actual data. It also contains tabular business rules and metadata.
Later this raw data is pushed to the live working system.
The push process goes something like this.
The raw data is converted into domain objects by a decision support system.
The rich domain objects may or may not end up in one of 20 or so data marts (fully relational operational data).
Later this richer domain data is further filtered by a variety of things for a variety of reasons (vague enough).
Some of this richer data is then copied into Service level caches (think memcache).
Then there are content caches (think file systems).
Then there are edge content caches for images, PDFs, and videos (think Akamai).
To simplify, the process is something like this:
  1. Transform raw data into rich domain objects using DSS
  2. Filter rich domain objects into many different operational data marts
  3. Populate Service Caches 
  4. Populate Content Caches
  5. Populate Edge Content Caches
To keep it really simple it would be:
  1. Transform raw data into domain objects
  2. Push domain objects to various data marts
  3. Populate caches
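The three simplified steps above can be sketched as a composable pipeline. Everything here is invented for illustration (the function names, the single "catalog" mart, the key scheme); the real DSS transformation and mart filtering are of course far richer.

```python
# Sketch of the simplified three-stage push as a composable pipeline.
# Function names and data shapes are invented for illustration only.

def transform(raw):
    # Raw rows -> domain objects (here: dicts with a derived field).
    return [{"id": r["id"], "name": r["name"].title()} for r in raw]


def push_to_marts(objs):
    # Filter domain objects into per-mart slices (one toy mart here).
    return {"catalog": [o for o in objs if o["name"]]}


def populate_caches(marts):
    # Flatten marts into a key -> object service cache (think memcache).
    return {f'catalog:{o["id"]}': o for o in marts["catalog"]}


def push(raw):
    return populate_caches(push_to_marts(transform(raw)))


cache = push([{"id": 1, "name": "widget"}])
```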
Again realize that most of the above is somewhat contrived and in reality way oversimplified.
The current system consists of
  1. Korn shell scripts that call into a homegrown but not fully baked workflow engine (written in Java, using custom JSON files)
  2. Perl/Python/Groovy scripts that do batch processing and then send messages to the workflow engine
  3. An expensive, proprietary, poorly documented cluster messaging solution from an evil vendor (which gets used by the workflow engine)
  4. Java services that receive tasks from the aforementioned cluster messaging solution
    1. SOAP Service? No
    2. REST Service? No
    3. Homegrown service framework with custom marshaling and management (also poorly documented)? Yes :(
  5. Unicorn tears
  6. Blood of innocent babies
I bet you can guess what is wrong already. This is a complex system with many moving parts. Each one of those steps above consists of one to many processes.
Chasing down a problem is, well, in a word: HARD!
Only a few elite software engineers can find out what the problems are and fix them. They tail log files, hit admin tools, run custom groovy scripts and collect unicorn tears.
Also, when something goes wrong, after it is fixed the whole process needs to run again from the top.
There is no retry. It all works or it all fails.
One bump in the road means no push.
Thus what they really want is a system that is:
  1. Maintainable
  2. Auditable
  3. Traceable
  4. Supportable
  5. Recoverable
They really need the ability to fix a problem and then continue where the process left off, not restart the whole thing from the beginning.
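The "continue where it left off" property boils down to checkpointing completed steps. A minimal sketch, assuming a made-up JSON checkpoint file and step list (Spring Batch provides this for real via its JobRepository and restartable jobs):

```python
# Sketch of restartable execution: persist a checkpoint after each
# completed step so a rerun skips work already done instead of
# starting from the top. Step names and file layout are illustrative.
import json
import os
import tempfile

STEPS = ["transform", "push_to_marts", "populate_caches"]


def run_job(checkpoint_path, executed):
    """Run all steps, skipping any recorded in the checkpoint file."""
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)
    for step in STEPS:
        if step in done:
            continue  # already completed on a previous run
        executed.append(step)  # (real work would happen here)
        done.append(step)
        with open(checkpoint_path, "w") as f:
            json.dump(done, f)


path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
first, second = [], []
run_job(path, first)   # fresh run: executes every step
run_job(path, second)  # rerun: checkpoint says nothing is left to do
```

If a step fails midway, the checkpoint still records everything before it, so a rerun resumes at the failed step rather than redoing the whole push.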
Also, they want to track how long each step takes, and they need to know where in the process they are.
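Tracking "how long and where" is just a timing/audit wrapper around each step. A minimal sketch, with an invented record shape (Spring Batch's step execution metadata covers this out of the box):

```python
# Sketch of per-step timing/audit: record name, duration, and status
# for each step so operators can see where the push is and how long
# each stage took. The record shape is invented for illustration.
import time


def timed_step(name, work, trail):
    start = time.perf_counter()
    status = "OK"
    try:
        work()
    except Exception:
        status = "FAILED"
        raise
    finally:
        trail.append({
            "step": name,
            "seconds": round(time.perf_counter() - start, 3),
            "status": status,
        })


trail = []
timed_step("transform", lambda: time.sleep(0.01), trail)
```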
There is also a future need. Remember all those tools that were used to populate the raw data? If you put the raw data in wrong, it can cause a push to fail.
The admin tools rely on undocumented, ad hoc business processes. It would be nice to document, enforce and control these business processes.
Also some of these business processes have a human interaction to worry about (think approval process and content management).
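Making an ad hoc human process explicit mostly means writing down the allowed states and transitions. A toy sketch of an approval flow (the states and actions are invented; a BPMN engine like Activiti is what would document and enforce this for real, including the human task steps):

```python
# Sketch: an ad hoc approval process made explicit as a tiny state
# machine; only documented transitions are allowed. States and
# actions here are invented for illustration.
ALLOWED = {
    ("draft", "submit"): "pending_approval",
    ("pending_approval", "approve"): "approved",
    ("pending_approval", "reject"): "draft",
}


def advance(state, action):
    nxt = ALLOWED.get((state, action))
    if nxt is None:
        raise ValueError(f"illegal transition: {action} from {state}")
    return nxt


s = advance("draft", "submit")
s = advance(s, "approve")
```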
