Friday, December 19, 2008

Error Handling in Oracle's ESB

This is a re-post of an earlier article I wrote. Any links to the old version will now be busted... Sorry.

In our project, we follow a strict governance process for our services. We identify the services and their operations from our Business Process Model, then produce a WSDL and associated XSDs to represent each service. Only once this is done do we proceed to implementation. This is called top-down design and is generally a good thing. By doing this, we end up with a clean design that represents the ideal, pure business requirements, rather than being technology driven as is often the case with bottom-up designs. Nothing new here.

Sooner or later, the rubber hits the road, and we end up implementing the service using some sort of technology. In a recent case, Oracle’s ESB product was selected as the implementation technology for some entity services that we are developing. It allowed us to provide a SOAP interface to our entities, whilst still preserving transactions and speedy performance by tying into WSIF and Oracle’s optimised message delivery capabilities. So far, everything was looking rosy.

Then we got to implementing the fault handling. Now, I think most people would agree that having a variety of different faults for an operation is a good idea. That way, the caller can distinguish between the different types of fault that can happen. In particular, this service had these faults:
  1. PersistenceFault, if there was a general technical fault with the service (e.g. the database was down).
  2. IllegalUpdateFault, if the caller attempted to modify a field that they are not allowed to change.
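For reference, this is roughly what the top-down design asks for in WSDL 1.1 terms: one operation declaring two distinct faults. The port type, operation and message names below are hypothetical placeholders, not the names from our actual service.

```xml
<!-- Sketch only: all names here are illustrative assumptions -->
<wsdl:portType name="CustomerEntityPortType">
  <wsdl:operation name="updateCustomer">
    <wsdl:input message="tns:UpdateCustomerRequest"/>
    <wsdl:output message="tns:UpdateCustomerResponse"/>
    <!-- Each fault has its own name and its own message -->
    <wsdl:fault name="PersistenceFault" message="tns:PersistenceFaultMessage"/>
    <wsdl:fault name="IllegalUpdateFault" message="tns:IllegalUpdateFaultMessage"/>
  </wsdl:operation>
</wsdl:portType>
```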
Now we find a limitation of the ESB product. It turns out that the current version (10.1.3.3) doesn't support operations which can return multiple faults. All products have their limitations, so all we need here is a workaround, right? What are the possible solutions? Here's what we came up with:
  1. Create some sort of XSL mapping in the ESB to try and fudge multiple faults. This is impossible, as normally the faults would have different messages, and the ESB will only route to one destination message.
  2. Use a common fault type, and distinguish between the different faults using a fault code, either numeric or enumeration based. This will work, but there is no way of including structured data in the fault, as it will need to be generic.
  3. Use a common fault type which is a <choice> of the different faults, and get the caller to work out which one it was.
  4. Use a common fault type, and then extend it via XSD methods, and use polymorphism to tell the difference.
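For comparison, the second idea in the list above (a common fault type with a fault code) might look something like the sketch below. The type names and enumeration values are assumptions made up for illustration. Note that every fault carries the same flat fields, which is why no fault-specific structured data can be included.

```xml
<!-- Sketch only: FaultCodeType, CommonFaultType and the enumeration values are hypothetical -->
<xsd:simpleType name="FaultCodeType">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="PERSISTENCE_FAILURE"/>
    <xsd:enumeration value="ILLEGAL_UPDATE"/>
  </xsd:restriction>
</xsd:simpleType>

<xsd:complexType name="CommonFaultType">
  <xsd:sequence>
    <!-- The caller switches on this code to tell the faults apart -->
    <xsd:element name="faultcode" type="FaultCodeType"/>
    <xsd:element name="faultstring" type="xsd:string"/>
    <xsd:element name="detail" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>
```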
Options 3 and 4 are discussed further below:

Option 3.

<xsd:complexType name="SimpleFaultType">
  <xsd:sequence>
    <xsd:element name="faultstring" type="xsd:string"/>
    <xsd:element name="detail" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>

<xsd:element name="CombinedFault">
  <xsd:complexType>
    <!-- Exactly one of the following fault elements will be present -->
    <xsd:choice>
      <xsd:element name="PersistenceFailureFault" type="SimpleFaultType"/>
      <xsd:element name="IllegalUpdateFault" type="SimpleFaultType"/>
    </xsd:choice>
  </xsd:complexType>
</xsd:element>
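With this schema, a fault on the wire would carry exactly one child element, and the caller tells the faults apart by checking which child is present. A minimal sketch of such an instance follows. The namespace URI and the text values are invented for illustration, and it assumes the schema uses elementFormDefault="qualified".

```xml
<!-- Sketch only: the namespace and the message text are hypothetical -->
<svc:CombinedFault xmlns:svc="http://example.com/services/customer">
  <svc:IllegalUpdateFault>
    <svc:faultstring>Illegal update</svc:faultstring>
    <svc:detail>The caller may not modify this field.</svc:detail>
  </svc:IllegalUpdateFault>
</svc:CombinedFault>
```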

Option 4.

<xsd:complexType name="BaseFaultType">
  <xsd:sequence>
    <xsd:element name="faultstring" type="xsd:string"/>
    <xsd:element name="detail" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>

<xsd:complexType name="PersistenceFailureFaultType">
  <xsd:complexContent>
    <xsd:extension base="BaseFaultType">
      <!-- Place additional fields in here -->
    </xsd:extension>
  </xsd:complexContent>
</xsd:complexType>

<xsd:complexType name="IllegalUpdateFaultType">
  <xsd:complexContent>
    <xsd:extension base="BaseFaultType">
      <!-- Place additional fields in here -->
    </xsd:extension>
  </xsd:complexContent>
</xsd:complexType>

Using this approach, the fault message/element in the WSDL will be of type BaseFaultType. The caller can then interrogate the xsi:type attribute to work out exactly which fault has been thrown.
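As a rough illustration, assume the WSDL fault message points at an element declared with the base type. The element name ServiceFault and the namespace URI below are hypothetical. A concrete fault then announces its real type through xsi:type:

```xml
<!-- Sketch only: ServiceFault and the cdm namespace are illustrative assumptions -->

<!-- In the schema: the fault element is declared with the base type -->
<xsd:element name="ServiceFault" type="BaseFaultType"/>

<!-- On the wire: xsi:type names the derived type that was actually sent -->
<cdm:ServiceFault xmlns:cdm="http://example.com/cdm"
                  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                  xsi:type="cdm:PersistenceFailureFaultType">
  <cdm:faultstring>Persistence failure</cdm:faultstring>
  <cdm:detail>The database could not be reached.</cdm:detail>
</cdm:ServiceFault>
```

The value of xsi:type is a qualified name, so a caller comparing it has to resolve the prefix against the namespace declarations in scope rather than match the raw string.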

Our entity pattern calls for using the ESB to wrap EJB functionality, and the ESB will need to route exceptions back from the EJB into the Core Data Model version of the fault. This will need to preserve any polymorphic faults, which represents additional work both for the Java developer and for the ESB designer, who must write a non-standard XSLT file to map the exceptions.

This approach seems quite neat, but it does sacrifice type safety. Unlike Option 3, a consumer looking at the WSDL will not know exactly which faults a web service can throw, as in theory it could throw any of the extensions of the base fault type. Instead, they will need to look at additional documentation (we call this the CSP, or Consumer Service Profile) to work out which faults can be thrown by the service.

Developer tools such as JDeveloper will also not be able to use wizards to interrogate parts of the fault message, for exactly the same reason. Instead, the developer will need to examine the xsi:type attribute, then copy the XML element into a variable that represents the right fault (this is as close as BPEL gets to casting). If we were to take the xsd:choice approach, then the WSDL would represent all of the fault information in the XSD, and JDeveloper would be able to pick up the types and work with them.

In addition to all of this, a special BaseFaultType will need to be added to the Canonical Data Model (CDM). This would not be necessary using Option 3, as the CombinedFault element would be specific to each service and be placed in its local namespace.

The Wash Up

Either Option 3 or Option 4 will work. They are effectively the same solution, but use different mechanisms to shoehorn multiple exceptions into the one message. Due to the enhanced type safety, my position is that Option 3 is the way to go.

Either solution is a pain in the butt, for a few reasons. The consumer is going to have to perform logic when a fault occurs to work out which error occurred. This will unnecessarily clutter up our BPEL or Java processes, especially if the different faults have different scopes. For example, we might want to deal with an IllegalUpdateFault within a tight context, but bubble PersistenceFault out to a wider context. The fault handling code in the consumer will need to deal with that. We also sacrifice readability of the interface itself, as it becomes difficult to see which faults are being thrown where.

But the really annoying thing is that our technology is now dictating the terms of the interface. We cannot participate fully in top-down design because our tool doesn't support all of the ways that we may want to shape our WSDL. For something as fundamental as multiple fault handling to be left out is unforgivable in my opinion.
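To illustrate the scoping point above, here is a rough sketch, in BPEL 1.1 style, of what a consumer could write if the WSDL were able to declare both faults separately. All scope, partner link, port type and variable names are hypothetical, and variable declarations are omitted for brevity.

```xml
<!-- Sketch only: every name below is an illustrative assumption -->
<scope name="WholeProcess">
  <faultHandlers>
    <!-- PersistenceFault bubbles out and is handled in the wider context -->
    <catch faultName="svc:PersistenceFault" faultVariable="persistenceFault">
      <empty/> <!-- e.g. notify support and abandon the process -->
    </catch>
  </faultHandlers>
  <scope name="UpdateCustomer">
    <faultHandlers>
      <!-- IllegalUpdateFault is dealt with right next to the call -->
      <catch faultName="svc:IllegalUpdateFault" faultVariable="illegalUpdateFault">
        <empty/> <!-- e.g. report the rejected field back to the user -->
      </catch>
    </faultHandlers>
    <invoke name="InvokeUpdate"
            partnerLink="CustomerEntityService"
            portType="svc:CustomerEntityPortType"
            operation="updateCustomer"
            inputVariable="updateRequest"
            outputVariable="updateResponse"/>
  </scope>
</scope>
```

With a single combined fault, both handlers collapse into one catch, which then has to inspect the payload and re-throw whatever does not belong at that level.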