Manual recovery of a BPEL process is not a trivial job at all.
And at first glance, the manual recovery area of the BPEL Console does not seem to reveal much about which recovery entry corresponds to which instance of a BPEL process.
Why do instances end up in manual recovery?
This has to do with how the BPEL engine handles incoming messages.
The BPEL delivery service does two things:
- Use JMS to register message to be processed
- Save message in dehydration store
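When the instance completes, the message in the dehydration store goes to the HANDLED state.
If the server shuts down or crashes, or the engine times out, it may come back unable to find the JMS message, because that message might already have been consumed. The message then stays unhandled in the dehydration store, and that is what ends up in manual recovery.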
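The sequence above can be pictured with a toy model. This is only a sketch of the state changes described here, not the engine's actual code: the class `ToyDeliveryService`, its methods, and the `crash` flag are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field

UNRESOLVED = "UNRESOLVED"
HANDLED = "HANDLED"


@dataclass
class ToyDeliveryService:
    """Hypothetical stand-in for the delivery service (illustration only)."""
    jms_queue: list = field(default_factory=list)  # pretend JMS queue
    store: dict = field(default_factory=dict)      # pretend dehydration store

    def deliver(self, message_id):
        # Step 1: register the message on JMS so it gets processed.
        self.jms_queue.append(message_id)
        # Step 2: save the message in the dehydration store.
        self.store[message_id] = UNRESOLVED

    def process_next(self, crash=False):
        # The JMS message is consumed as soon as processing starts.
        message_id = self.jms_queue.pop(0)
        if crash:
            # Engine dies before completion: the JMS message is gone,
            # but the stored message is still UNRESOLVED.
            return
        self.store[message_id] = HANDLED

    def manual_recovery(self):
        # Everything that is still UNRESOLVED in the store.
        return [m for m, state in self.store.items() if state == UNRESOLVED]


svc = ToyDeliveryService()
svc.deliver("m1")
svc.deliver("m2")
svc.process_next()            # m1 completes normally
svc.process_next(crash=True)  # m2 is consumed, then the engine crashes
print(svc.manual_recovery())
```

After the simulated crash, nothing is left on the pretend queue to trigger `m2` again, which is why it has to be picked up by hand.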
What does Manual Recovery show?
Manual recovery shows all messages that are in the UNRESOLVED state.
So it also shows messages for instances that are on their way from UNRESOLVED to HANDLED in the dehydration store but are not yet marked as HANDLED.
From what I have seen on a production box, a lot of entries show up in manual recovery and disappear again after a short while.
Should I recover everything that I see in Manual Recovery?
You should not recover every message that you see in manual recovery.
Some of these could be genuine messages that are still in flight.
Recover only those messages that have been in manual recovery for at least a duration X, where X is larger than the processing time expected in that particular enterprise system.
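The "older than X" rule can be sketched as a simple filter. This is a minimal illustration, assuming you already have the conversation ID and received time of each entry from somewhere (for example, copied from the console); the function name `recovery_candidates` and the tuple layout are my own assumptions, not part of any BPEL API.

```python
from datetime import datetime, timedelta


def recovery_candidates(entries, min_age, now):
    """Return conversation IDs that have waited at least `min_age`.

    entries: iterable of (conversation_id, received_at) tuples
             (hypothetical layout, for illustration only).
    min_age: the duration X from the text, as a timedelta.
    now:     the reference time to measure age against.
    """
    return [cid for cid, received_at in entries if now - received_at >= min_age]


now = datetime(2024, 1, 1, 12, 0, 0)
entries = [
    ("conv-old", now - timedelta(minutes=30)),  # well past the threshold
    ("conv-new", now - timedelta(minutes=2)),   # probably still in flight
]
print(recovery_candidates(entries, timedelta(minutes=10), now))
```

Passing `now` in explicitly keeps the check repeatable; the right value for X depends entirely on how long processing normally takes in your system.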
What and where do you recover?
When you look at the manual recovery area in the BPEL console, you generally see 3 different tabs. Most often you would only recover from the first 2, which cover invokes and callbacks.
I generally use the instance ID of the BPEL process to identify which BPEL instance a manual recovery entry corresponds to.
If there are instances in a stale state in BPEL, there will be a corresponding manual recovery entry.
The conversation ID of this manual recovery entry will also contain the instance ID of the BPEL process, along with the other information from which the conversation ID is built.
What if there is no instance ID but something like MD5{xyz...}?
These are messages for which instances have not yet been created. They can be safely recovered.
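The two cases can be told apart with a small helper. This is a rough sketch: the real layout of a conversation ID is not spelled out here, so the code only relies on the two things stated in this post, that an ID either contains the instance ID somewhere or starts with `MD5{`. The function name `classify` and the sample IDs are made up for illustration.

```python
def classify(conversation_id, known_instance_ids):
    """Guess what a manual recovery entry belongs to.

    Returns ("no-instance", None) for MD5{...} style IDs,
    ("instance", <id>) when a known instance ID appears in the
    conversation ID, and ("unknown", None) otherwise.
    """
    if conversation_id.startswith("MD5{"):
        # No BPEL instance has been created for this message yet.
        return ("no-instance", None)
    for instance_id in known_instance_ids:
        # Plain substring check: a short ID could match by accident,
        # so treat the result as a hint and confirm it in the console.
        if instance_id in conversation_id:
            return ("instance", instance_id)
    return ("unknown", None)


# Hypothetical sample values; real conversation IDs will look different.
stale_instances = ["120034", "120077"]
print(classify("bpel-120034-receive", stale_instances))
print(classify("MD5{placeholder}", stale_instances))
```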