Norbert Hartl

mostly brackets and pipes

Isn't SOAP Supposed to Make It Clean?

For many years I managed to ignore upcoming standards like SOAP. I only took a short peek at the specification after it became available. My experience with complex technologies and my personal suffering while implementing CORBA projects kept me from getting into SOAP. My gut feeling told me that it might be just another bloated RPC-style technology.

Just a few months ago I agreed to participate in a project where SOAP was a constraint (and Java, of course). The main part of the project was the integration of two existing platforms. On the one hand, there were the existing services that work with HTTP (I never felt I needed anything else).

On the other hand, there was the remote service that provided a WSDL, just waiting to be integrated. Searching for APIs, I found quite a lot of ready-to-use stuff: there are Spring Web Services and several Apache APIs in different versions. As usual, it is hard to know which one is a good place to start. But a simple service is always created quickly, and the client for it does not take much longer. Even the basic part of the API was easy to implement. Most of the calls (request and response) consist only of a lot of native parameters. By native I mean native data types, but when you deal closely with SOAP they are all strings anyway.

But then there were some other calls that turned everything into a mess. The basic idea behind those calls is that you send a request and get back 60 MB of text in CSV format. This is the story of how to write a server component that requests such a blob in order to distribute it to other servers.

But let’s step back a little. By using SOAP instead of plain HTTP, we already face some restrictions:

  • SOAP uses HTTP POST. You lose the other HTTP verbs completely
  • SOAP’s format is XML, and the content type is fixed (application/soap+xml in SOAP 1.2)
  • XML is a character-only format, so you cannot include binary data without converting it

Ok, if you want to transport the CSV blob inside the XML, you need to convert it to something like Base64 to be able to put it into the document. But if you convert it on every request, you spend a lot of CPU cycles doing so, which is hardly desirable. I think this is why SOAP attachments came about: to optimize the procedure. There are two approaches (that I know of) to get it done; in our case it was MTOM. The basic idea is that you detach the binary data from the XML document and transfer it separately for performance reasons. The mechanism that deals with the XML part is called XOP. You replace the binary data in the document with an xop:Include element that contains a reference to the data. The binary data is transferred separately, with an identifier that the xop:Include points to. All of this should be transparent to the client. It only needs to know about the binary data and can retrieve it via a special accessor of the XML node that “contains” the data.
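To put a number on the Base64 cost: Base64 turns every 3 bytes into 4 characters, so an inlined blob grows by about a third, and the encoding work is repeated on every request. The following is a minimal Java sketch using the JDK’s java.util.Base64; the class name and the tiny CSV sample are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Overhead {

    // Base64 maps every 3 input bytes to 4 output characters; the last group is padded.
    static long encodedLength(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    public static void main(String[] args) {
        // Tiny made-up CSV sample standing in for the real blob.
        byte[] csv = "id;name;value\n1;foo;42\n2;bar;17\n".getBytes(StandardCharsets.UTF_8);

        // This encoding step is the CPU work that would repeat on every request.
        String encoded = Base64.getEncoder().encodeToString(csv);
        System.out.println(csv.length + " bytes -> " + encoded.length() + " Base64 characters");

        // Extrapolated to a 60 MB response: 60 MiB of raw data becomes 80 MiB of text.
        long sixtyMb = 60L * 1024 * 1024;
        System.out.println("60 MB -> " + encodedLength(sixtyMb) / (1024 * 1024) + " MB as Base64");
    }
}
```

So besides the CPU cycles, a 60 MB CSV response would travel as roughly 80 MB of XML text.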

Another short step back. For binary data delivery in SOAP you need to:

  • either convert the binary data to a textual format (e.g. Base64), which is not desirable performance-wise
  • or use two further standards (MTOM, XOP) to have it delivered
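For illustration, this is roughly what an MTOM/XOP message looks like on the wire: a multipart/related MIME body whose root part is the SOAP envelope, with the payload replaced by an xop:Include, followed by a part that carries the raw data under the referenced Content-ID. It is a hand-rolled Java sketch, not how a SOAP stack produces it; the element names, boundary and Content-ID values are hypothetical, and a real stack writes the binary part as bytes rather than as a String.

```java
public class XopMessageSketch {

    static final String XOP_NS = "http://www.w3.org/2004/08/xop/include";
    static final String ROOT_ID = "root.message@example.org";  // hypothetical Content-ID
    static final String CRLF = "\r\n";

    // HTTP Content-Type of the whole message: multipart/related with an XOP-encoded SOAP root part.
    static String contentType(String boundary) {
        return "multipart/related; type=\"application/xop+xml\"; start=\"<" + ROOT_ID + ">\"; "
            + "start-info=\"application/soap+xml\"; boundary=\"" + boundary + "\"";
    }

    // Body: the envelope only carries an xop:Include pointing at the second MIME part,
    // which holds the CSV data untouched (no Base64).
    static String body(String boundary, String dataId, String csv) {
        String envelope = "<soap:Envelope xmlns:soap=\"http://www.w3.org/2003/05/soap-envelope\">"
            + "<soap:Body><getReportResponse><report>"  // hypothetical element names
            + "<xop:Include xmlns:xop=\"" + XOP_NS + "\" href=\"cid:" + dataId + "\"/>"
            + "</report></getReportResponse></soap:Body></soap:Envelope>";
        return "--" + boundary + CRLF
            + "Content-Type: application/xop+xml; charset=UTF-8; type=\"application/soap+xml\"" + CRLF
            + "Content-ID: <" + ROOT_ID + ">" + CRLF
            + CRLF
            + envelope + CRLF
            + "--" + boundary + CRLF
            + "Content-Type: text/csv" + CRLF
            + "Content-Transfer-Encoding: binary" + CRLF
            + "Content-ID: <" + dataId + ">" + CRLF
            + CRLF
            + csv + CRLF
            + "--" + boundary + "--" + CRLF;
    }

    public static void main(String[] args) {
        String boundary = "MIMEBoundary_example";
        System.out.println("Content-Type: " + contentType(boundary));
        System.out.println();
        System.out.print(body(boundary, "csv.report@example.org", "id;value\n1;42\n"));
    }
}
```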

Since we are using HTTP as the transport anyway, it gets really funny if we take a closer look. The outcome of applying the two additional standards is that the binary data is packaged together with the XML in a multipart MIME response. Sounds weird? Well…

Last but not least, let us turn back to the client. Dealing with SOAP means dealing with XML, and dealing with XML means, in this case, dealing with DOM. If we take a WSDL and turn it into a Java SOAP client, we find that everything described above is built as a DOM in memory, and DOM is surely no memory saver.
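A minimal Java sketch of why a DOM-based client is memory-hungry, using the JDK’s javax.xml.parsers API: the complete response has to be parsed into an in-memory tree before a single value can be read, and the payload then comes back as one big String. The element name report and the generated test data are hypothetical; generated clients add their own object layer on top of this.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DomClientSketch {

    // Parses the complete response into a DOM tree; nothing is usable before this finishes.
    static Document parse(byte[] response) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new ByteArrayInputStream(response));
        } catch (Exception e) {  // ParserConfigurationException, SAXException, IOException
            throw new IllegalStateException(e);
        }
    }

    // "report" is a hypothetical element name for the CSV payload.
    static String payload(Document doc) {
        return doc.getElementsByTagName("report").item(0).getTextContent();
    }

    public static void main(String[] args) {
        StringBuilder csv = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            csv.append(i).append(";value\n");
        }
        String xml = "<soap:Envelope xmlns:soap=\"http://www.w3.org/2003/05/soap-envelope\">"
            + "<soap:Body><report>" + csv + "</report></soap:Body></soap:Envelope>";
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);  // the raw response
        Document doc = parse(bytes);                           // the complete tree
        String text = payload(doc);                            // the payload as one huge String
        System.out.println(bytes.length + " bytes parsed, " + text.length() + " payload chars in memory");
    }
}
```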

A last step back for the client:

  • Big responses were delivered with chunked transfer encoding. In the best case this means the idea of streaming is still present somewhere in all of this; in the worst case it means the server has no clue how big the whole response will be when it starts writing it.
  • Using a simple WSDL-generated Java client means the response from the SOAP server is unmarshalled into objects.

And I’m sure those objects are built from a DOM in all cases. I can tell you that while running some tests I had to increase the Java maximum heap of the client (which was actually just another server) to an amount where you don’t even start to think about scalability. Well, or you only think about certain server manufacturers.
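Chunked transfer encoding (HTTP/1.1) is what lets a server start sending before it knows the total size: each chunk carries its own hexadecimal length, and a zero-length chunk ends the body. Below is a minimal Java sketch of the receiving side that hands each chunk to a callback instead of buffering the whole response. The class and method names are hypothetical, and in practice the HTTP client library does this decoding for you.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

public class ChunkedDecoder {

    // Reads one CRLF-terminated line (a chunk-size line or a trailer line).
    private static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\r') {
                if (in.read() != '\n') {
                    throw new IOException("expected LF after CR");
                }
                return sb.toString();
            }
            sb.append((char) c);
        }
        throw new IOException("unexpected end of stream");
    }

    // Decodes a chunked body: each chunk is "<hex size>CRLF<data>CRLF", a zero-size chunk
    // ends it. Chunks go to 'sink' as soon as they are read, so the total size is only
    // known once the last chunk has arrived.
    static long decode(InputStream in, Consumer<byte[]> sink) {
        try {
            long total = 0;
            while (true) {
                String sizeLine = readLine(in);
                int ext = sizeLine.indexOf(';');  // ignore chunk extensions
                String hex = (ext >= 0 ? sizeLine.substring(0, ext) : sizeLine).trim();
                int size = Integer.parseInt(hex, 16);
                if (size == 0) {
                    while (!readLine(in).isEmpty()) {
                        // skip optional trailer fields up to the final empty line
                    }
                    return total;
                }
                byte[] chunk = in.readNBytes(size);
                if (chunk.length != size || !readLine(in).isEmpty()) {
                    throw new IOException("malformed chunk");
                }
                sink.accept(chunk);
                total += size;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two chunks (7 and 9 bytes), then the terminating zero-size chunk.
        String wire = "7\r\nid;val\n\r\n9\r\n1;foo;42\n\r\n0\r\n\r\n";
        long total = decode(new ByteArrayInputStream(wire.getBytes(StandardCharsets.US_ASCII)),
            chunk -> System.out.print(new String(chunk, StandardCharsets.US_ASCII)));
        System.out.println("total bytes: " + total);
    }
}
```

The wire format itself allows processing chunk by chunk; the memory problem comes from clients that collect everything into a tree first.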

What the CRUD???

What is the advantage of using SOAP over pure HTTP? I don’t know! Taking a second look at the API I used didn’t reveal any reason why this is a good idea. Most of the API calls carried between zero and five parameters, something that works well with pure HTTP. Only one call has an object as a parameter. It might sound good to be able to pass just another object as a parameter to another call, but in the SOAP communication the object parameter is simply nested inside the call parameters. I didn’t find anything that deals with the identity of objects: if you send an object twice, there is no way of knowing whether it is the same object. If you can’t determine identity, the nested object is just a complex way of delivering parameters. I can’t see much gain.
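To make the comparison concrete, here is a minimal Java sketch of one call with a handful of scalar parameters, expressed once as an HTTP query string and once as a SOAP body. The operation name getReport and the parameter names and values are hypothetical, and the envelope is simplified (a real one would carry the operation’s namespace from the WSDL).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class CallShapes {

    // A typical call: a handful of scalar parameters (names and values are hypothetical).
    static Map<String, String> params() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("customerId", "4711");
        p.put("format", "csv");
        p.put("limit", "100");
        return p;
    }

    // Plain HTTP: the parameters fit into the query string of a GET request.
    static String asQuery(Map<String, String> p) {
        return p.entrySet().stream()
            .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
    }

    // SOAP: the same parameters become child elements of an operation element inside
    // the envelope, which is then POSTed. Simplified: no operation namespace.
    static String asSoapEnvelope(String operation, Map<String, String> p) {
        StringBuilder sb = new StringBuilder();
        sb.append("<soap:Envelope xmlns:soap=\"http://www.w3.org/2003/05/soap-envelope\">")
          .append("<soap:Body><").append(operation).append('>');
        p.forEach((name, value) -> sb.append('<').append(name).append('>')
            .append(escape(value)).append("</").append(name).append('>'));
        sb.append("</").append(operation).append("></soap:Body></soap:Envelope>");
        return sb.toString();
    }

    // Minimal XML text escaping; every value travels as a string either way.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println("GET /reports?" + asQuery(params()));
        System.out.println("POST " + asSoapEnvelope("getReport", params()));
    }
}
```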

Most applications out there follow a CRUD approach, and this fits perfectly onto HTTP. You can use the HTTP verbs to specify your action, you get status codes that let you detect things easily, you can specify content types to let the remote side know what it has to digest, and you can send multipart messages to carry character and binary data in the same request. Furthermore, there are a lot of components that understand HTTP. They can help you with orchestrating, load balancing, rewriting and distributing your HTTP actions. To me, HTTP is flexible while SOAP is complex.
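A minimal Java sketch of the conventional CRUD-to-HTTP mapping, building (not sending) requests with the JDK’s java.net.http.HttpRequest. The resource URI is hypothetical, and the status codes are the common conventions rather than the only valid choices.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class CrudOverHttp {

    // Conventional CRUD-to-HTTP mapping with a typical success status for each action.
    enum Crud {
        CREATE("POST", 201),   // 201 Created
        READ("GET", 200),      // 200 OK
        UPDATE("PUT", 200),    // 200 OK (204 No Content is also common)
        DELETE("DELETE", 204); // 204 No Content

        final String method;
        final int successStatus;

        Crud(String method, int successStatus) {
            this.method = method;
            this.successStatus = successStatus;
        }
    }

    // Builds (but does not send) the request for an action on a resource. The verb carries
    // the action; the Content-Type header tells the other side what it has to digest.
    static HttpRequest toRequest(Crud action, URI resource, String csv) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(resource);
        if (action == Crud.CREATE || action == Crud.UPDATE) {
            builder.header("Content-Type", "text/csv")
                   .method(action.method, HttpRequest.BodyPublishers.ofString(csv));
        } else {
            builder.method(action.method, HttpRequest.BodyPublishers.noBody());
        }
        return builder.build();
    }

    public static void main(String[] args) {
        URI report = URI.create("http://example.org/reports/4711");  // hypothetical resource
        for (Crud action : Crud.values()) {
            HttpRequest request = toRequest(action, report, "id;value\n1;42\n");
            System.out.println(request.method() + " " + request.uri()
                + " -> expect " + action.successStatus);
        }
    }
}
```

Because action, payload type and outcome all live in standard HTTP fields, proxies and load balancers can act on them without understanding any envelope.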

My word of advice is to be very careful when you are about to select a technology. It might sound like a good idea to choose the one “that can handle the complex scenarios as well”, but most of the time it turns back on you. All of these technologies are complex and have a long, steep learning curve. And if something does not work, you are mostly out of luck trying to solve it. A rule of thumb for me is the following:

Estimate the complexity of your project. Then estimate the complexity of every technology you plan to use. If a single technology outweighs the complexity of your own project by a noticeable margin, then you need to think again.

Is there no soap for SOAP?

Sort of, I would say. I realized the project by getting rid of SOAP ASAP. I wrote an HTTP-to-SOAP proxy that pushed the existence of SOAP to the border of our own system. As I wrote above, most of the SOAP calls aren’t complex and are easily generated by a little application. On the response side there is Axis2, which has an object model (AXIOM) that can stream. So no matter how big the response is, you can just stream it, process it and distribute it to another service that has memory for a single task. If I’m not mistaken, this approach is called “document-oriented SOAP”, and it is gaining popularity. Just think of processing XML. Often there is no need to unmarshal the XML into Java objects. You can just strip off the SOAP envelope and distribute the content as XML. You still have to face XML then, but… :)
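The streaming idea can be sketched with the JDK’s standard StAX API (javax.xml.stream) rather than Axis2’s own object model. This minimal sketch skips the envelope and forwards the character content of one payload element to a Writer as the parser delivers it; with coalescing switched off, a StAX parser may hand over long text in several pieces, so the full payload never has to exist as one String. The class name, the payload element name and the SOAP 1.2 namespace choice are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class SoapPayloadStreamer {

    static final String SOAP12_NS = "http://www.w3.org/2003/05/soap-envelope";

    // Skips the envelope and forwards the text of the element named payloadElement inside
    // soap:Body to 'out', one parser event at a time. Returns the number of characters forwarded.
    static long forwardPayload(InputStream in, String payloadElement, Writer out) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.IS_COALESCING, false);  // allow text in pieces
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);    // no DTDs in responses
            XMLStreamReader r = factory.createXMLStreamReader(in);
            boolean inBody = false;
            boolean inPayload = false;
            long forwarded = 0;
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if (SOAP12_NS.equals(r.getNamespaceURI()) && "Body".equals(r.getLocalName())) {
                        inBody = true;
                    } else if (inBody && payloadElement.equals(r.getLocalName())) {
                        inPayload = true;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT && inPayload
                        && payloadElement.equals(r.getLocalName())) {
                    break;  // payload done; the rest of the envelope is irrelevant
                } else if (inPayload && (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA)) {
                    out.write(r.getTextCharacters(), r.getTextStart(), r.getTextLength());
                    forwarded += r.getTextLength();
                }
            }
            r.close();
            out.flush();
            return forwarded;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String response = "<soap:Envelope xmlns:soap=\"" + SOAP12_NS + "\">"
            + "<soap:Header/><soap:Body><getReportResponse><report>"
            + "id;value\n1;42\n2;17\n"
            + "</report></getReportResponse></soap:Body></soap:Envelope>";
        StringWriter sink = new StringWriter();  // stands in for a connection to another service
        long n = forwardPayload(
            new ByteArrayInputStream(response.getBytes(StandardCharsets.UTF_8)), "report", sink);
        System.out.print(sink);
        System.out.println(n + " characters forwarded");
    }
}
```

In a real proxy the Writer would be the output stream of the request to the downstream service, so memory use stays roughly constant regardless of the response size.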

Still, I have to say I’m no expert in SOAP. This article just describes my observations while realizing a SOAP project, and where it conflicted with my sense of sensible technology use. If you have comments, criticism and the like, you can find my email on this site.