NAME
XPC - XML Procedure Call
SYNOPSIS
use XPC;
and then
my $xpc = XPC->new(<<END_XPC);
<?xml version='1.0' encoding='UTF-8'?>
<xpc>
<call procedure='localtime'/>
</xpc>
END_XPC
or
my $xpc = XPC->new();
$xpc->add_call('localtime');
or
my $xpc = XPC->new_call('localtime');
and then later
print XML_FILE $xpc->as_string();
DESCRIPTION
This class represents an XPC request or response. It uses XML::Parser to parse XML passed to its constructor.
MOTIVATION
A Commentary on the XML-RPC Specification and Definition of XPC Version 0.2
Introduction
The following commentary is based upon the specification from the UserLand web site. The version referenced for this commentary has a notation on it that it was "Updated 10/16/99 DW" (see http://www.xmlrpc.com/spec).
These comments are stylistic in nature, and the author well recognizes that style in program and protocol design is very personal. This commentary will, however, point out the rationale for the proposed changes to the specification's design.
Procedure Call Structural Simplifications
The example in the "Request example" section looks like this:
<methodCall>
<methodName>examples.getStateName</methodName>
<params>
<param>
<value><i4>41</i4></value>
</param>
</params>
</methodCall>
We note by looking at the remainder of the specification that there are only two top-level elements allowed in XML-RPC: methodCall and methodResponse. Since methods are the subject of RPC, and since all top-level elements in the design are about methods, there is no need to have the redundant qualifier "method" in the names of these elements. Thus, the example would be modified to look like this:
<call>
<methodName>examples.getStateName</methodName>
<params>
<param>
<value><i4>41</i4></value>
</param>
</params>
</call>
Now, the content of the methodName element is constrained to be very simple text (from the "Payload format" section, which says "... identifier characters, upper and lower-case A-Z, the numeric characters, 0-9, underscore, dot, colon and slash"). It is also mandatory. This is precisely the reason XML includes the ability to add attributes to elements (it is technically redundant, but very convenient). So, we really should turn this example into:
<call method='examples.getStateName'>
<params>
<param>
<value><i4>41</i4></value>
</param>
</params>
</call>
Once the methodName element has been removed from the design, the params element becomes superfluous, since its only purpose was to group the parameters and separate them from the method name. Now, the call element is the element that groups the parameters, leaving us with:
<call method='examples.getStateName'>
<param>
<value><i4>41</i4></value>
</param>
</call>
Header Nomenclature
One final comment on terminology: RPC stands for Remote Procedure Call, so we should probably not use the term "method" when we mean "procedure" or something else. Since the "procedures" can return values, which corresponds in some languages to the term "function", we have a rivalry for the term to use. "Procedure" matches the acronym nicely, but for some folks "Function" would have a better connotation. Fans of Eiffel might even prefer "Feature", or "Query" for calls returning a value and "Routine" or "Command" for those not. Given the variety of possibilities, here we stay with the simple policy of matching the acronym:
<call procedure='examples.getStateName'>
<param>
<value><i4>41</i4></value>
</param>
</call>
Scalar Values
Typically, an interface definition determines the number, names and types of parameters to a procedure call. It is incumbent upon the caller to conform to that specification. Therefore, the declaration for any procedure to be called as part of an interface should indicate the expected types of the parameters, which means that the caller should not have to indicate the type of the value it is passing (in general the value itself isn't passed anyway, but rather a textual representation of it). XML-RPC should not be blind to typing issues, but these issues belong in an interface definition standard (about which more later) rather than in the calling standard. Removing the type information from the example results in:
<call procedure='examples.getStateName'>
<param>
<value>41</value>
</param>
</call>
Since the <value> element really now just means "scalar" (see the specification section "Scalar <value>s"), let's call it that:
<call procedure='examples.getStateName'>
<param>
<scalar>41</scalar>
</param>
</call>
If for some reason not contemplated here type information is necessary for scalars, then having a simple type attribute of the scalar element would suffice, especially since the set of allowable values is fixed, small, and consists of only short string values (i4, int, boolean, string, double, dateTime.iso8601, and base64).
If we only ever expected simple, short scalar values, we could make one more change, to:
<!-- NOTE: This is NOT a proposed change -->
<call procedure='examples.getStateName'>
<param>
<scalar value='41'/>
</param>
</call>
but, it is presumed that it would be possible to have a very long scalar string value, for which the former representation would be better.
Named Parameters
Some procedures may be implemented in a language that makes it very easy to implement named parameters. Supporting this would be easy:
<call procedure='examples.getStateName'>
<param name='stateNum'>
<scalar>41</scalar>
</param>
</call>
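As a rough illustration of the call format arrived at so far, the following sketch assembles a call element from a procedure name and a list of parameters. The helper names build_call and xml_escape are hypothetical and are not part of the XPC module; a parameter given as a two-element array reference is treated as a named parameter, which is a convention chosen only for this example.

```perl
use strict;
use warnings;

# Escape the characters that are significant in XML text and attribute values.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/'/&apos;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build a <call> element (hypothetical helper). Each parameter is either a
# plain scalar (positional) or an array reference [name, value] (named).
sub build_call {
    my ($procedure, @params) = @_;
    my $xml = sprintf("<call procedure='%s'>\n", xml_escape($procedure));
    for my $param (@params) {
        my ($name, $value) = ref($param) eq 'ARRAY' ? @$param : (undef, $param);
        my $attr = defined $name ? sprintf(" name='%s'", xml_escape($name)) : '';
        $xml .= sprintf("<param%s>\n<scalar>%s</scalar>\n</param>\n",
                        $attr, xml_escape($value));
    }
    return $xml . "</call>\n";
}

print build_call('examples.getStateName', [ stateNum => 41 ]);
```

Passing a plain 41 instead of the array reference would produce the positional form shown earlier, with no name attribute on the param element.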
Scalar Types
Whether types apply to calls and interfaces or just to interfaces, they are an important part of the specification.
The specification defines i4 and int to be synonyms for a 'four-byte signed integer'. Since the value will be represented in the call as text, this description really isn't an appropriate specification, since it is written in terms of a binary representation. We suggest here a single term for this data type, integer, and that it be defined in terms of a range of acceptable values: -2,147,483,648 to +2,147,483,647 (just the range of values that can be stored in a two's complement 32-bit binary representation).
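Defining the type by its range of values makes it straightforward to check the textual form directly. The sketch below is one possible check; the function name is_xpc_integer is hypothetical, and accepting an optional leading sign is an assumption made here rather than something the specification states.

```perl
use strict;
use warnings;

# Return 1 when $text is the decimal text of a value in the proposed
# integer range, -2,147,483,648 to +2,147,483,647, and 0 otherwise.
sub is_xpc_integer {
    my ($text) = @_;
    # Optional sign, then 1 to 10 digits. The length cap keeps the numeric
    # comparison below within exactly representable values.
    return 0 unless defined $text && $text =~ /\A[+-]?\d{1,10}\z/;
    return ($text >= -2_147_483_648 && $text <= 2_147_483_647) ? 1 : 0;
}

for my $candidate ('41', '-2147483648', '2147483648', '4.0') {
    printf "%-14s %s\n", "[$candidate]",
        is_xpc_integer($candidate) ? 'accepted' : 'rejected';
}
```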
The boolean data type is distinct from the integer data type, yet its domain {0, 1} is a subset of the integer domain instead of the more consistent {false, true}. If boolean is going to be treated as its own type, it should have its own domain.
The specification defines double to be 'double-precision signed floating point number'. Note that in the 1999-01-21 questions-and-answers section near the end of the document, it is revealed that the full generality of the data type commonly meant by such a description is not available. Neither infinities nor NaN (the Not-a-Number value) are permitted. Not even exponential notation is allowed. Very simple strings matching the Perl regular expression:
/^([+-])(\d*)(\.)(\d*)$/
are the only ones permitted according to the answer given, although one suspects that what was meant was something closer to this:
/^([+-])?(\d*)((\.)(\d*))?$/
because the first expression requires the sign to be present, and permits "+." and "-." as valid strings (although to what values they would map is a mystery).
Note: The second expression makes the leading sign and trailing decimal point and digits optional, but still isn't perfect, since it allows the empty string as a value.
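The behavior of the two expressions can be checked directly. The sketch below runs both against a few sample strings, together with a third, stricter pattern; that third pattern is our own assumption about what a tighter definition might look like and does not come from the specification or its FAQ.

```perl
use strict;
use warnings;

my $faq_pattern     = qr/^([+-])(\d*)(\.)(\d*)$/;      # as given in the answer
my $relaxed_pattern = qr/^([+-])?(\d*)((\.)(\d*))?$/;  # the likely intent
# Hypothetical stricter pattern (an assumption, not from the specification):
# optional sign, then digits with an optional fraction, or a bare fraction.
my $strict_pattern  = qr/\A[+-]?(?:\d+(?:\.\d*)?|\.\d+)\z/;

# Return 1 if $text matches $pattern, 0 otherwise.
sub matches {
    my ($pattern, $text) = @_;
    return $text =~ $pattern ? 1 : 0;
}

for my $text ('+1.5', '1.5', '+.', '', '-37') {
    printf "%-8s faq=%d relaxed=%d strict=%d\n", "[$text]",
        matches($faq_pattern, $text),
        matches($relaxed_pattern, $text),
        matches($strict_pattern, $text);
}
```

The stricter pattern requires at least one digit, so it rejects both the empty string and a lone sign followed by a decimal point.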
This type should be called rational instead of double to get away from the physical description. decimal is another potentially reasonable name for this type.
Also, the FAQ answer says the range of allowable values is implementation-dependent, but the specification refers to "double-precision floating-point", which does have an expected set of behaviors for most people.
The specification mentions "ASCII" in the type definition for string, but XML permits all of Unicode. Shouldn't one expect to be able to pass around string values with all the characters thus permitted? Shouldn't servers and clients be written to handle this broader character set, and convert as necessary internally? Otherwise, we are taking a big step back from the promise of XML and the web.
The dateTime.iso8601 data type name is awkward. They didn't refer to the IEEE 754 floating point standard in the name of the double type (which would have been double.ieee754 if they had). Unless the specification is going to allow multiple dateTime variants, the qualifier is just an annoyance. In addition, most people call this type timestamp, even if their computer languages sometimes just call it DATE (as in many SQL implementations). So, here we propose that this type just be called timestamp and that the type description refer to the ISO 8601 standard.
Finally, the base64 type (added 1999-01-21) really should be binary with the encoding standard (Base-64) referenced in the type description.
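To illustrate the separation between the type name and its encoding, the sketch below wraps raw bytes in a scalar element whose type attribute is binary while the content is Base-64 text. It uses the core MIME::Base64 module; the helper name binary_scalar is hypothetical, and the type attribute is the optional one discussed under "Scalar Values".

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Wrap raw bytes in a scalar element tagged with the proposed 'binary' type
# (hypothetical helper, not part of the XPC module).
sub binary_scalar {
    my ($bytes) = @_;
    # The empty second argument suppresses the line breaks that
    # encode_base64 would otherwise insert every 76 characters.
    return sprintf("<scalar type='binary'>%s</scalar>", encode_base64($bytes, ''));
}

print binary_scalar("\x00\x01\xff"), "\n";
```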
Structures
Structures continue the same idiom used elsewhere in the specification: the avoidance of element attributes. Here is the example used in the specification (modified to accommodate the recommendations already made here):
<struct>
<member>
<name>lowerBound</name>
<scalar>18</scalar>
</member>
<member>
<name>upperBound</name>
<scalar>139</scalar>
</member>
</struct>
The name element here should be converted into an attribute of the member element, leaving:
<struct>
<member name='lowerBound'>
<scalar>18</scalar>
</member>
<member name='upperBound'>
<scalar>139</scalar>
</member>
</struct>
Arrays
The array element is defined with a superfluous data child element. This element serves no function, so it should be removed. Here is the example from the specification (again, modified based on previous recommendations):
<array>
<data>
<scalar>12</scalar>
<scalar>Egypt</scalar>
<scalar>false</scalar>
<scalar>-31</scalar>
</data>
</array>
Removing the unneeded data element leaves us with:
<array>
<scalar>12</scalar>
<scalar>Egypt</scalar>
<scalar>false</scalar>
<scalar>-31</scalar>
</array>
We have recommended getting rid of value and using scalar, but the specification allows a value to contain a scalar value or a struct or an array. We can still do without the value element, though:
<array>
<scalar>12</scalar>
<array>
<scalar>Egypt</scalar>
<scalar>false</scalar>
<scalar>-31</scalar>
</array>
</array>
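Because scalar, struct and array can nest directly inside one another, a recursive walk over a native data structure is enough to produce them. The following sketch maps Perl hash references to struct, array references to array, and everything else to scalar. The function names are hypothetical, and the output is written without line breaks for brevity.

```perl
use strict;
use warnings;

# Escape the characters that are significant in XML text and in
# single-quoted attribute values.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/'/&apos;/g;
    return $text;
}

# Map a Perl value onto the proposed elements: hash reference to struct,
# array reference to array, anything else to scalar.
sub to_xpc_value {
    my ($data) = @_;
    if (ref($data) eq 'HASH') {
        # Keys are sorted only to make the output deterministic.
        my @members = map {
            sprintf("<member name='%s'>%s</member>",
                    xml_escape($_), to_xpc_value($data->{$_}))
        } sort keys %$data;
        return '<struct>' . join('', @members) . '</struct>';
    }
    if (ref($data) eq 'ARRAY') {
        return '<array>' . join('', map { to_xpc_value($_) } @$data) . '</array>';
    }
    return '<scalar>' . xml_escape($data) . '</scalar>';
}

print to_xpc_value([ 12, [ 'Egypt', 'false', -31 ] ]), "\n";
print to_xpc_value({ lowerBound => 18, upperBound => 139 }), "\n";
```

Note that this sketch does not guard against an empty hash, which would produce a struct with no members.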
Responses
The example in the document is:
<?xml version="1.0"?>
<methodResponse>
<fault>
<value>
<struct>
<member>
<name>faultCode</name>
<value><int>4</int></value>
</member>
<member>
<name>faultString</name>
<value><string>Too many parameters.</string></value>
</member>
</struct>
</value>
</fault>
</methodResponse>
This has much unnecessary nesting. It is much simpler to store the fault code as an attribute of the fault element and to have the fault description be the body of the fault element:
<?xml version="1.0"?>
<methodResponse>
<fault code='4'>
Too many parameters.
</fault>
</methodResponse>
Adding a Consistent Top-Level Element
It would be nice if one could always be sure that XML data involved in the XML-RPC protocol had a particular root element.
Another benefit of doing this is that a given request could include multiple calls, which for certain types of interactions could be of great performance benefit. If you need to make many related calls, the network latency would be a real drag on performance, but batching up the calls into one big bundle amortizes the transport time, increasing performance. A top-level element of xpc is used here to stand for "XML Procedure Call".
<xpc>
<call> ... </call>
<call> ... </call>
<call> ... </call>
</xpc>
As soon as we decide to put multiple calls in a transmission, it raises the issue of tying responses to calls. We could use order for this, but we could also provide an attribute of call and response called id that is optionally provided by the caller and, if present, is copied into the response element for that call.
HTTP POST REQUEST CONTENT:
<xpc>
<call ... id='1'> ... </call>
<call ... id='foo'> ... </call>
<call ... id='some_guid'> ... </call>
</xpc>
HTTP RESPONSE CONTENT:
<xpc>
<response id='1'> ... </response>
<response id='foo'> ... </response>
<response id='some_guid'> ... </response>
</xpc>
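On the client side, tying responses back to calls might look like the sketch below. It works on already-parsed data rather than on XML: calls and responses are represented as hash references with an optional id key, which is purely an assumption made for this example, as are the function name pair_responses and the procedure and body keys. Calls without an id fall back to positional order among the responses that also lack one.

```perl
use strict;
use warnings;

# Pair each call with its response. Calls and responses are hash references
# with an optional 'id' key; this shape is an assumption made for the sketch.
sub pair_responses {
    my ($calls, $responses) = @_;
    my %by_id      = map { ($_->{id} => $_) } grep { defined $_->{id} } @$responses;
    my @unlabelled = grep { !defined $_->{id} } @$responses;
    my @pairs;
    for my $call (@$calls) {
        my $response = defined $call->{id}
            ? $by_id{ $call->{id} }    # matched by the copied id
            : shift @unlabelled;       # otherwise fall back to order
        push @pairs, [ $call, $response ];
    }
    return \@pairs;
}

my $pairs = pair_responses(
    [ { id => '1', procedure => 'a' }, { procedure => 'b' }, { id => 'foo', procedure => 'c' } ],
    [ { id => 'foo', body => 'C' }, { body => 'B' }, { id => '1', body => 'A' } ],
);
print "$_->[0]{procedure} -> $_->[1]{body}\n" for @$pairs;
```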
Another benefit of having a consistent top-level element is that we can use it to specify the protocol version:
<xpc version='0.2'>
<call ...> ... </call>
</xpc>
Finally, using a consistent top-level element permits the response to contain a copy of the request if desired.
HTTP POST REQUEST CONTENT:
<xpc>
<call ... id='1'> ... </call>
<call ... id='foo'> ... </call>
<call ... id='some_guid'> ... </call>
</xpc>
HTTP RESPONSE CONTENT:
<xpc>
<call ... id='1'> ... </call>
<call ... id='foo'> ... </call>
<call ... id='some_guid'> ... </call>
<response id='1'> ... </response>
<response id='foo'> ... </response>
<response id='some_guid'> ... </response>
</xpc>
Extended Types
Given that XML-RPC is an XML application, it is disconcerting to see its design be so blind to XML issues such as Unicode values (discussed above) and tree-structured data. Suppose a procedure was to accept XML as a parameter or to return XML as its result. How would this be accomplished with XML-RPC? The answer seems to be "stuff it in a string scalar". But, to be a proper string, all the markup would have to be escaped:
<call procedure='foo'>
<param>
<scalar>
&lt;bar&gt;Here's some text in an element.&lt;/bar&gt;
</scalar>
</param>
</call>
However, if we add to the scalar, array and struct types a new type xml, then we can do the natural thing:
<call procedure='foo'>
<param>
<xml>
<bar>Here's some text in an element.</bar>
</xml>
</param>
</call>
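The difference between the two approaches can be seen by building both forms from the same fragment. In this sketch, xml_escape is a hypothetical helper that escapes only the characters that must be escaped in element content, and the fragment is assumed to be well-formed already.

```perl
use strict;
use warnings;

# Escape the characters that must not appear literally in element content.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

my $fragment = "<bar>Here's some text in an element.</bar>";

# As a string scalar, every markup character has to be escaped.
my $as_scalar = '<scalar>' . xml_escape($fragment) . '</scalar>';

# As the proposed xml type, the fragment is embedded as it stands.
my $as_xml = '<xml>' . $fragment . '</xml>';

print "$as_scalar\n$as_xml\n";
```

With the scalar form the receiver gets back a string that it must parse again; with the xml form the fragment is already part of the parsed document tree.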
We could even use XML Namespaces if needed to resolve element name collisions if they arise (namespaces are commonly used for this reason in XSLT transforms).
Technically speaking, allowing parameters and results to contain XML makes the other XML-RPC types redundant, but providing shortcuts for these common cases does make sense.
Interface Specifications
In order to provide true discoverability, there needs to be a way for a client to ask the server what operations it supports, and to get back interface information for the supported procedures.
Sending an empty query element should cause the server to return an array of procedure names:
HTTP POST REQUEST CONTENT:
<xpc>
<query/>
</xpc>
HTTP RESPONSE CONTENT:
<xpc>
<result>
<array>
<scalar>foo</scalar>
<scalar>bar</scalar>
</array>
</result>
</xpc>
Sending a query element with a procedure name filled in should return a response containing a prototype:
HTTP POST REQUEST CONTENT:
<xpc>
<query procedure='foo'/>
</xpc>
HTTP RESPONSE CONTENT:
<xpc>
<prototype procedure='foo'>
<comment>
The 'foo' procedure! Given an integer, returns an array with that
many elements, with each element containing the integer number of
its position within the array.
</comment>
<param-def name='splee' type='scalar' subtype='integer'/>
<result-def type='array'/>
</prototype>
</xpc>
Requesting information on an unknown procedure results in a fault return:
HTTP POST REQUEST CONTENT:
<xpc>
<query procedure='quux'/>
</xpc>
HTTP RESPONSE CONTENT:
<xpc>
<fault code='42'>
Unknown procedure name 'quux'!
</fault>
</xpc>
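The three query cases above could be handled on the server by a small dispatch routine along the following lines. This is only a sketch: the %prototypes registry, its contents for bar, and the function name answer_query are hypothetical, and the responses are written without line breaks. Fault code 42 is taken from the example above.

```perl
use strict;
use warnings;

# Hypothetical registry mapping procedure names to prototype content.
my %prototypes = (
    foo => "<param-def name='splee' type='scalar' subtype='integer'/>"
         . "<result-def type='array'/>",
    bar => "<result-def type='scalar'/>",
);

# Answer a query: with no procedure name, list the known procedures;
# with a known name, return its prototype; otherwise return a fault.
sub answer_query {
    my ($procedure) = @_;
    if (!defined $procedure) {
        my $names = join '', map { "<scalar>$_</scalar>" } sort keys %prototypes;
        return "<xpc><result><array>$names</array></result></xpc>";
    }
    if (exists $prototypes{$procedure}) {
        return sprintf("<xpc><prototype procedure='%s'>%s</prototype></xpc>",
                       $procedure, $prototypes{$procedure});
    }
    # The name is not escaped here; a real implementation would need to
    # escape it before echoing it back.
    return sprintf("<xpc><fault code='42'>Unknown procedure name '%s'!</fault></xpc>",
                   $procedure);
}

print answer_query(), "\n";
print answer_query('foo'), "\n";
print answer_query('quux'), "\n";
```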
Conclusion
The "Strategies/Goals" section of the specification lists these items (paraphrased):
Leverage the ability of CGI to pass many firewalls to build an RPC mechanism that can cross many platforms and many network boundaries.
Cleanliness.
Extensibility.
Easy implementation.
The first of these seems to be met without difficulty by leveraging the HTTP protocol.
Cleanliness is of course a subjective measure, and this document has pointed out many points on which we think cleanliness can be improved.
The original specification doesn't seem to address extensibility other than to list it as a goal. This document's addition of the XML type provides much extensibility.
Ease of implementation should not be radically decreased by the modified version of XML-RPC proposed here, except in the handling of Unicode text. This is likely the main reason ASCII was specified in the original protocol definition.
ADDITIONAL INFORMATION
The following sections provide details behind the proposed XPC.
Document Type Definition for Proposed XPC
This appendix shows the complete simple DTD for XPC. It is no more complicated than the XML-RPC DTD (see http://www.ipso-facto.demon.co.uk/xml-rpc-inline.html or http://www.ontosys.com/xml-rpc/xml-rpc.dtd).
<!-- We are going to use this parameter entity to refer to the value -->
<!-- element types. -->
<!ENTITY % value "(scalar|array|struct|xml)" >
<!ENTITY % request "(query|call)" >
<!ENTITY % response "(prototype|result|fault)" >
<!-- We can have any number of calls and responses inside the top-level -->
<!-- element (but at least one). -->
<!ELEMENT xpc ( %request; | %response; )+ >
<!ATTLIST xpc version CDATA #IMPLIED >
<!-- A query is always empty, and it has an optional procedure attribute. -->
<!-- It can also have an id attribute to distinguish it from other -->
<!-- requests in the same transaction. -->
<!ELEMENT query EMPTY >
<!ATTLIST query procedure CDATA #IMPLIED >
<!ATTLIST query id ID #IMPLIED > <!-- TODO: Can it be ID *and* #IMPLIED? -->
<!-- A call can have zero or more parameters. -->
<!ELEMENT call (param)* >
<!ATTLIST call procedure CDATA #REQUIRED >
<!ATTLIST call id ID #IMPLIED > <!-- TODO: Can it be ID *and* #IMPLIED? -->
<!-- A param *must* have one of the value elements as a child. -->
<!ELEMENT param %value; >
<!ATTLIST param name CDATA #IMPLIED >
<!-- Types for scalars are shown here as optional, but they may not need -->
<!-- to be part of the design. -->
<!ELEMENT scalar (#PCDATA) >
<!ATTLIST scalar type (boolean|integer|rational|string|timestamp|binary)
#IMPLIED >
<!-- An array has any number of elements, each of which is of one of the -->
<!-- value elements. -->
<!ELEMENT array (scalar|array|struct|xml)* >
<!-- A structure has one or more members. -->
<!ELEMENT struct (member+) >
<!-- A member has a name and *must* contain one of the value elements as -->
<!-- a child. -->
<!ELEMENT member %value; >
<!ATTLIST member name CDATA #REQUIRED >
<!-- An xml value element can contain any XML data. -->
<!ELEMENT xml ANY >
<!-- A fault has a code and contains text. -->
<!ELEMENT fault (#PCDATA) >
<!ATTLIST fault code CDATA #REQUIRED >
<!ATTLIST fault id ID #IMPLIED > <!-- TODO: Can it be ID *and* #IMPLIED? -->
<!-- A result is like a param, and *must* have one of the value elements -->
<!-- as a child. -->
<!ELEMENT result %value; >
<!ATTLIST result name CDATA #IMPLIED >
<!ATTLIST result id ID #IMPLIED > <!-- TODO: Can it be ID *and* #IMPLIED? -->
<!-- A prototype gives the calling convention for a procedure. -->
<!ELEMENT prototype (comment?, (param-def|result-def)*) >
<!ATTLIST prototype procedure CDATA #REQUIRED >
<!ATTLIST prototype id ID #IMPLIED > <!-- TODO: Can it be ID *and* #IMPLIED? -->
<!-- A param-def defines an optional name, type and subtype for the -->
<!-- parameter. It may also contain a comment about the parameter. -->
<!ELEMENT param-def (comment?) >
<!ATTLIST param-def name CDATA #IMPLIED >
<!ATTLIST param-def type (scalar|array|struct|xml) #IMPLIED >
<!ATTLIST param-def subtype (boolean|integer|rational|string|timestamp|binary) #IMPLIED >
<!-- A result-def defines an optional name, type and subtype for the -->
<!-- result. It may also contain a comment about the result. -->
<!ELEMENT result-def (comment?) >
<!ATTLIST result-def name CDATA #IMPLIED >
<!ATTLIST result-def type (scalar|array|struct|xml) #IMPLIED >
<!ATTLIST result-def subtype (boolean|integer|rational|string|timestamp|binary) #IMPLIED >
<!ELEMENT comment (#PCDATA) >
XML Schema for Proposed XPC
<!-- TODO -->
An XML-RPC <---> XPC Gateway
The following XSLT transform will convert XML-RPC requests into XPC requests:
<!-- TODO -->
The following XSLT transform will convert XPC responses into XML-RPC responses (where it is possible):
<!-- TODO -->
The following XSLT transform will convert XPC requests into XML-RPC requests (where it is possible):
<!-- TODO -->
The following XSLT transform will convert XML-RPC responses into XPC responses:
<!-- TODO -->
AUTHOR
Gregor N. Purdy <gregor@focusresearch.com>
COPYRIGHT
Copyright (C) 2001 Gregor N. Purdy. All rights reserved.
This is free software; you can redistribute it and/or modify it under the same terms as Perl itself.