First, in four sentences or less, give me a quick introduction to SOAP.
Simple Object Access Protocol (SOAP) is a set of XML messages that are sent from one computer to another (usually using HTTP) in order to call a software routine. With SOAP, one program can use software functions in another program written in another computer language running on different hardware. SOAP is one of the technologies used to implement a service-oriented architecture (SOA).
What are the two kinds of SOAP requests?
There are two different kinds of SOAP requests. If the SOAP client explicitly names the routine that it wants to execute, it’s called “rpc-style” SOAP. If the SOAP client hands over an entire object to the SOAP server (like an “order” or “loan” object) and lets the server decide which routine should be called, then it’s called “document-style” or “EDI” SOAP. Since industry strongly favors RPC-style SOAP, in most cases a SOAP call can be thought of as a remote procedure call.
So in a nutshell - what's SOAP?
To call a routine on another computer, you have to specify a number of things. What’s the name of the routine? What parameters are you passing to it? What should happen if there is an error? SOAP is an XML document that just conforms to a standard structure to answer these questions.
So why is SOAP becoming so popular?
Unlike the messages of other RPC technologies, SOAP messages usually pass right through firewalls. A SOAP message is composed entirely of XML, which is text-based, and firewalls trust text information far more than anything in binary format (they shouldn’t but they do). Moreover, SOAP is almost always used in conjunction with HTTP, which is also usually allowed to pass through firewalls unchecked since it’s usually used to serve out simple web pages. So one of the key features of SOAP is that it allows you to subvert pesky corporate security policies embedded in firewalls because SOAP messages are really just XML messages sent over HTTP and firewalls don't inspect their contents.
Sidebar - SOAP Firewalls
A new market for smarter firewalls is emerging. These firewalls check SOAP messages for the same kinds of attacks that people launch at operating systems or other programs. IBM’s DataPower device is essentially a SOAP-aware firewall. But don't expect it to cost the same as that firewall you got from the grocery store.
As SOAP is used to integrate existing legacy applications, those existing routines do not have to be modified to understand SOAP in any way. It is the server, equipped with a “SOAP engine” that listens for incoming messages, converts messages into the internal native format expected by the application server routine (that is, it “de-serializes” the messages), invokes the routine with the native parameters, then packs up the response (“serializes” the response) and sends it back to the original caller. SOAP does not alter the legacy systems. Rather, the SOAP server transparently sits on top of them. It is really the SOAP engine that makes existing infrastructure available to software running on other hardware and operating systems. A SOAP engine of some sort must reside on any message participant.
So what's the basic SOAP structure?
In all SOAP literature, the SOAP message structure is usually represented in some common, ordinary diagram that looks something like this:
Plain old, run-of-the-mill SOAP Message Diagram. Ho hum.
OK – perhaps most SOAP books don’t present diagrams exactly like this one, but I bet they wish they did. Note that a SOAP message has three primary parts - the envelope, the header, and an incredible body section.
Like many protocols, an "envelope" encapsulates the entire message. The content of the message resides in the message body while an optional header block is used to include information that might be needed by the receiver but isn’t strictly part of the message content (like login and password information, state, or transaction context). The SOAP standard calls for the entire message to be in XML. A SOAP message is just XML that conforms to a certain structure defined by the SOAP standard.
Note: Our company artist is telling me the SOAP diagram above is sexist. OK - fair enough. Let's have a little something for the men and for the women:
So how does this structure really look in practice?
SOAP provides a common message structure. This was needed so the remote server could 1) recognize that it is receiving a formal request to access local software capabilities in the first place, 2) understand what should be provided to that routine, and 3) know what to do if it encounters a problem (like if it doesn’t understand the message). While a detailed explanation of XML is provided in the XML chapter (and if you have any sense you skipped that god-awful thing), it still isn’t too hard to see the similarity between the following skeleton of a SOAP message and the boring diagram above:
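<soap:Envelope>
  <soap:Header>
    [Here’s where extraneous information, like password data, resides]
  </soap:Header>
  <soap:Body>
    [Here’s where the actual message content resides]
    <soap:Fault>
      [Here are instructions to the server about how to handle errors]
    </soap:Fault>
  </soap:Body>
</soap:Envelope>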
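To make the skeleton a little less abstract, here is a minimal Python sketch that builds a complete rpc-style message using only the standard library. The namespace is the SOAP 1.1 envelope namespace; everything else (the `GetPrice` routine, the `Symbol` parameter, the `SessionId` header entry, and the `example.com` application namespace) is made up purely for illustration.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical application namespace, used only for this illustration
APP_NS = "http://example.com/stock"

ET.register_namespace("soap", SOAP_NS)
ET.register_namespace("m", APP_NS)


def build_envelope(routine, params, header_items=None):
    """Return a SOAP 1.1 envelope (as a string) that calls `routine` with `params`."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")

    # Optional header: extra information that is not part of the message content
    if header_items:
        header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
        for name, value in header_items.items():
            ET.SubElement(header, f"{{{APP_NS}}}{name}").text = str(value)

    # Body: the routine name and the parameters being passed to it
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{routine}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)

    return ET.tostring(envelope, encoding="unicode")


# Hypothetical routine, parameter, and header values
message = build_envelope("GetPrice", {"Symbol": "IBM"}, {"SessionId": "abc123"})
print(message)
```

Note that a request like this one carries no Fault section. In practice the Fault element shows up inside the Body of a response when something has gone wrong.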
So why does SOAP use XML?
SOAP is composed of XML because XML converts everything – orders, inventory objects, dates, floating point numbers, integers, etc. – into strings. Since strings are the most portable form of information between computers, XML data files are extraordinarily portable as well. By exclusively using XML, SOAP enjoys this portability too, albeit at the performance-related cost of having to constantly convert back and forth between the native types that are useful for processing and the string types that are useful for communication.
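The back-and-forth conversion is easy to picture with a small Python sketch. The order fields below are invented for the example, and a real SOAP toolkit would do this work for you, but the round trip it performs looks roughly like this:

```python
from datetime import date
from decimal import Decimal

# Native values as a program would hold them (hypothetical order data)
order = {
    "quantity": 3,
    "unit_price": Decimal("19.99"),
    "ship_date": date(2009, 6, 1),
}

# Outbound: every value becomes a string, ready to be placed inside XML elements
as_text = {
    "quantity": str(order["quantity"]),
    "unit_price": str(order["unit_price"]),
    "ship_date": order["ship_date"].isoformat(),
}

# Inbound: the receiver parses each string back into a type it can compute with
restored = {
    "quantity": int(as_text["quantity"]),
    "unit_price": Decimal(as_text["unit_price"]),
    "ship_date": date.fromisoformat(as_text["ship_date"]),
}

print(as_text)
print(restored == order)
```

Every one of those conversions costs a little time on both ends, which is the performance price mentioned above.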
How is a SOAP message processed?
On the receiving end, a “SOAP engine” listens for incoming SOAP requests, converts the strings that compose the message into more useful internal data types (like integers, arrays, or custom objects), invokes the software routine that has been associated with the incoming message type, and then converts the results of the call back into an XML message (itself, a giant string) to return to the client. The SOAP engine performs a lot of work and is frequently a performance bottleneck.
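The steps above can be sketched as a toy SOAP engine in Python. This is only an illustration under some loud assumptions: the `Add` routine, the `ROUTINES` registry, and the `AddResponse`/`Result` element names are all hypothetical, there is no network listener, and error handling (faults) is left out entirely. Real engines are far more elaborate.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
ET.register_namespace("soap", SOAP_NS)


def add(a, b):
    """An ordinary routine that knows nothing about SOAP."""
    return a + b


# Hypothetical registry: routine name -> (callable, type of each parameter)
ROUTINES = {"Add": (add, {"a": int, "b": int})}


def local_name(element):
    """Return an element's tag without its namespace part."""
    return element.tag.split("}")[-1]


def handle_request(xml_text):
    """De-serialize a request, invoke the routine, and serialize the response."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    call = body[0]  # the first child of Body names the routine
    routine, param_types = ROUTINES[local_name(call)]

    # De-serialize: turn each parameter string into its native type
    args = {}
    for param in call:
        name = local_name(param)
        args[name] = param_types[name](param.text)

    result = routine(**args)

    # Serialize: pack the native result back into an XML string
    response = ET.Element(f"{{{SOAP_NS}}}Envelope")
    response_body = ET.SubElement(response, f"{{{SOAP_NS}}}Body")
    reply = ET.SubElement(response_body, local_name(call) + "Response")
    ET.SubElement(reply, "Result").text = str(result)
    return ET.tostring(response, encoding="unicode")


request = (
    f'<soap:Envelope xmlns:soap="{SOAP_NS}">'
    "<soap:Body><Add><a>2</a><b>3</b></Add></soap:Body>"
    "</soap:Envelope>"
)
print(handle_request(request))
```

Notice that `add` never sees a scrap of XML. All of the string wrangling lives in `handle_request`, which is exactly why the engine, not the routine, tends to be where the time goes.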
The exact form that your SOAP engine will take depends on your preferred method of shipping. In other words, it depends on the “transport” layer. SOAP engines exist to use HTTP, FTP, TCP, Jabber, SMTP, POP3, and many other transport protocols, but the most common is HTTP since HTTP passes through most firewalls. Since HTTP was designed to establish communications between a browser and a web server (which ordinarily serves out web pages), most HTTP web servers can be augmented in some way to be SOAP engines. For example, a servlet can be installed in the very popular “Apache” web server to allow it to support SOAP requests in addition to its normal web page processing.
On the sending end, many toolkits and APIs exist to help developers compose and send SOAP messages (to the remote SOAP engine where the desired function call will be executed). SOAP messages can be composed and sent from Java, C, C#, PHP, Python, PRI – you name it. Development environments are usually capable of taking WSDL files, which servers distribute as a description of their web services, and automatically generating the required SOAP code for talking to the servers that provide those services.
What does "SOAP over HTTP" mean?
A common source of confusion is why SOAP, a communications protocol, needs another protocol (like HTTP) running underneath it. The answer is that SOAP isn’t really a protocol. SOAP doesn’t really address subjects like message sequencing, retransmissions, data compression, acknowledgements, etc., so that’s why it isn’t a communications protocol. SOAP is really just an accepted structure for XML messages that SOAP engines know how to process. A great deal of confusion in the industry came from this decision to call SOAP a “protocol”. Even books by experts confuse the packaging of the remote procedure call instructions with the protocols used to send those messages. SOAP is often cited as a peer to HTTP, JMS, Jabber, SMTP, e-mail, or ftp, but it is not. Rather, SOAP messages are the content that gets sent across those transports. So SOAP really has a bug in the fourth letter, which beats PHP, which has a bug in the first letter (see first paragraph of our PHP page).
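To see the "content versus transport" split in code, here is a small Python sketch that hands a SOAP message to HTTP as the body of an ordinary POST. The endpoint URL, the `SOAPAction` value, and the `GetPrice` message are all hypothetical, so the request is only built and inspected, never sent.

```python
import urllib.request

# Hypothetical endpoint and SOAPAction value, for illustration only
ENDPOINT = "http://example.com/soap/stock"
SOAP_ACTION = "http://example.com/stock/GetPrice"

# The SOAP message is just the payload; HTTP does the actual carrying
soap_message = (
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    '<soap:Body><m:GetPrice xmlns:m="http://example.com/stock">'
    "<m:Symbol>IBM</m:Symbol>"
    "</m:GetPrice></soap:Body>"
    "</soap:Envelope>"
)

request = urllib.request.Request(
    ENDPOINT,
    data=soap_message.encode("utf-8"),
    headers={
        "Content-Type": "text/xml; charset=utf-8",  # SOAP 1.1 over HTTP uses text/xml
        "SOAPAction": f'"{SOAP_ACTION}"',
    },
    method="POST",
)

print(request.get_method(), request.full_url)
# urllib normalizes the capitalization of header names; HTTP treats them as case-insensitive
for name, value in request.header_items():
    print(f"{name}: {value}")

# Sending is left commented out because the endpoint above does not exist:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Swap the `urllib` part for an SMTP or JMS client and the `soap_message` string would not change at all, which is the whole point.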
How does SOAP relate to WSDL and UDDI?
SOAP is often mentioned in the context of WSDL. WSDL files are what web servers use to completely describe their service interfaces. That is, they describe what services are available, what messages are expected, what objects look like within messages, and what the low level mechanics of communication are. WSDL files provide enough information for tools to read them and automatically generate the communications code to talk to the servers mentioned in the file and use their services. WSDL files allow these tools to know what the SOAP message is supposed to look like.
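For a feel of what those tools are reading, here is a Python sketch that pulls the operation names and their input and output messages out of a WSDL 1.1 file. The WSDL text is a heavily trimmed, hypothetical fragment (a real file also carries type definitions, bindings, and service addresses), and `list_operations` is an illustrative helper, not part of any real toolkit.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # WSDL 1.1 namespace

# Hypothetical, heavily trimmed WSDL fragment for a made-up stock service
WSDL_TEXT = f"""
<definitions name="StockService"
             targetNamespace="http://example.com/stock"
             xmlns="{WSDL_NS}"
             xmlns:tns="http://example.com/stock"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <message name="GetPriceRequest">
    <part name="Symbol" type="xsd:string"/>
  </message>
  <message name="GetPriceResponse">
    <part name="Price" type="xsd:decimal"/>
  </message>
  <portType name="StockPortType">
    <operation name="GetPrice">
      <input message="tns:GetPriceRequest"/>
      <output message="tns:GetPriceResponse"/>
    </operation>
  </portType>
</definitions>
""".strip()


def list_operations(wsdl_text):
    """Map each operation name to its (input message, output message) pair."""
    root = ET.fromstring(wsdl_text)
    ns = {"w": WSDL_NS}
    operations = {}
    for op in root.findall("w:portType/w:operation", ns):
        operations[op.get("name")] = (
            op.find("w:input", ns).get("message"),
            op.find("w:output", ns).get("message"),
        )
    return operations


print(list_operations(WSDL_TEXT))
```

A code generator starts from the same information: it knows there is a `GetPrice` operation, what goes in, and what comes out, and from that it can work out what the SOAP message should look like.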
How do WSDL and SOAP work together?
So WSDL files are used to generate SOAP code. In fact, WSDL is used to generate SOAP in two cases. The first case is when a developer wants to generate a SOAP client at development time. Their development tools read WSDL files to help them generate a client application. Code is created to talk to the remote system. The second case is when a client wants to dynamically find and attach to a SOAP server at run time. In this second “dynamic binding” case, a directory service is often used to provide a WSDL file that is used at runtime to find an appropriate web server. This is where “UDDI” comes in (see the Governance page). UDDI is the directory service that is used to establish contact between a client and an appropriate web service server.
While WSDL is used to automatically generate SOAP communications code (either at development time or at run time), WSDL is itself automatically generated as well. You’d have to have a hole in your head to try to construct one of those files manually. Tools exist to read source code (like Java routines) and help generate WSDL files for that code. Such tools make it easy to expose the routines in existing legacy applications as universally available web services. SOAP is used to talk to the routines that were found and described through these WSDL files.
I hear them mention "literal", "literal wrapped", "encoded", and "styles". What the heck are they talking about?
SOAP messages are often referred to as having either an “rpc” or “document” style and having either “literal”, “literal wrapped”, or “encoded” use. This is an extremely confusing area of SOAP, due mostly to a poor choice of terms. Even books by expert authors present false information or half truths about choosing the SOAP style and use. For example, the style is often presented as a choice between an object-oriented approach (document) and a functional programming approach (rpc), which isn’t necessarily true. Sometimes the rpc/document choice is represented as a choice between asynchronous (document) and synchronous (rpc) communication, which isn’t true either. The style is really used to map messages to the appropriate service routines (a process that is not always straightforward). In the document approach, the choice is made dynamically based on the document type. The use is used to determine if and how message integrity should be checked (perhaps against an XSD schema). A complete discussion of these options would be utterly exhausting and is really unnecessary anyway since the industry is moving toward “document/literal wrapped” in all but a few cases. Unfortunately, document/literal wrapped SOAP messages are even harder to read since they inject more namespaces into the remote procedure call parameters (see our web page on XML Schemas for a description of the nightmare that is XML namespaces). The key point to remember is that when developers start using terms like “document”, “rpc”, or “literal wrapped”, there’s an 87.3% chance that it’s OK to take a Jessica Alba mind break while they talk.
What are the common and serious mistakes that companies make over and over with SOAP?
There’s a great deal of criticism of SOAP out there on the web, but the IT industry continues to embrace it anyway. Why? What are the alternatives to using SOAP? What are the most common problems of projects that use SOAP? A heightened awareness of common SOAP pitfalls today can save you a lot of headaches tomorrow. The authors continue to see project after project that make the same mistakes, and the most expensive mistakes involve fundamental re-engineering. Don’t let that happen to your project or to your career (especially in a down economy). Leverage off the experience of consultants who have spent years in the trenches getting SOA projects up and running at end client sites. The book is an easy, pleasant read and it's likely to help you avoid some serious problems. Don't just settle for the excerpt on this page. Click the “Buy Now” button and act to protect your career!