2007-12-09

HTTPbis

Mark Nottingham explains the work being done in the IETF to revise HTTP. It sounds to me like they're doing exactly the right thing, focusing on producing a better spec that brings light to some of the darker corners of the protocol and reduces the gap between what the spec says and what you actually need to implement to achieve interoperability. It's good to see that capable people have stepped up to put in the not inconsiderable time and effort that's needed for this unglamorous but very useful work.

2007-12-07

Thai personal names

There's an election coming up in Thailand on December 23rd and the streets are lined with election posters.  As a bit of an i18n geek, I find it interesting that the posters almost all make the candidates' first names at least twice as big as their last names.  If you're also an i18n geek, your reaction might well be: "it must be because Thais write their family name first, followed by their given name". But you would be wrong.  Thais have a given name and a family name; the given name is written first, and the family name last.

The correct explanation is that given names play a role in Thai culture that is similar to the role that family names play in many Western cultures. The polite way to address somebody is with an honorific followed by their given name. The Thai telephone book is sorted with given names as the primary key and family names as the secondary key.

(I have to say that this has led me to question what I perceive to be the i18n orthodoxy that it's more i18n-ly correct to talk of given name/family name than first name/last name. Why does it matter whether a name is a family name or a given name? Surely what matters is the cultural role that the name plays.)

I guess that historically the main reason for the dominance of given names in Thai culture is that family names are a relatively recent innovation: they were introduced by King Rama VI towards the beginning of the 20th century. Family names were allocated to families systematically and the use of family names is still controlled by the government. Any two people in Thailand with the same family name are related. This leads to Thai family names being quite a mouthful.  Here's a sample from people in the news over the past couple of days: Leophairatana, Tantiwittayapitak, Boonyaratkalin. Even Thais have difficulty remembering each other's family names.

If you become a Thai citizen, you have to choose a new, unused family name.  Just as with domain names, all the good, short names have gone. So the more recently your family has become Thai, the longer and more unwieldy your family name is likely to be.

Thai given names usually have at least two or three syllables. There aren't any given names that are as commonly used in Thai culture as the most popular given names in Western cultures.  I've never come across a situation where two living Thais share the same given name and family name. You would certainly never get the situation of hundreds of people having the same given name and family name (like "James Clark").

Thais rarely use the First.Last@domain convention for email.  It would be too unwieldy. The conventions I've seen most often are First.La@domain and First.L@domain (i.e. use only the first one or two characters of the last name).

Another i18n wrinkle is that Thais' official given and family names are in Thai script, not in Roman script. But in many situations Thais use romanized versions of their names.  And while there is a standard way (actually several standard ways) of romanizing Thai, the convention is that the correct romanization of any personal name is what the holder of the name wishes it to be. (Thus, your application may need to store two versions of names: the Thai script version and the romanized version.)

With honorifics, I think the nastiest gotcha from an i18n perspective is that, while the given and family name are conventionally written separated by a space, there is no separator between the honorific and the given name. (Words in Thai are normally not separated by spaces.) This applies only in Thai script.  When romanized, you would need a space between the honorific and the given name.

Since given names are used in Thai culture somewhat like family names are used in some Western cultures, you might be wondering what serves the role that given names serve in Western cultures.  All Thais have a name referred to as a "chue len". This is typically translated as "nickname", but it has a more important role in Thai culture than a nickname does in Western culture.  I think it would be more accurate to describe it as an "informal given name". Parents give each of their children a chue len, in addition to a formal given name.  You would typically use a chue len to address somebody in contexts where in England you might use their first name.

Whereas formal given names are restricted to names that the bureaucrats of the interior ministry deem appropriate, parents can and do follow their personal whims when it comes to the chue len. For example, a former employee of mine was called "Mote", which was abbreviated from "remote", as in TV remote control. (This illustrates another interesting aspect of Thai culture: words are commonly shortened by omitting all except the last syllable. For example, a kilo is often referred to as a "lo".)

In perhaps 80% of cases the chue len is a single syllable. It's often very difficult to romanize these.  Thai has tones as well as one of the richest collections of vowels of any language. Most romanization schemes don't preserve subtle differences in tones and vowels.  Whereas this is workable with formal given names and family names, which usually have many syllables and some redundancy, if you don't get the vowel or tone of a chue len exactly right, it becomes another name. For example, another of my employees has a name that sounds like the second syllable of the word "apple", but with the "l" changed to an "n", and pronounced in an emphatic (falling) tone. I can write that sound unambiguously in Thai, but I've no idea how to write it in English.

Occasionally the chue len is a shortened version of the given name, but more often it is completely unrelated.  If you know somebody only in a relatively informal social context, it is quite likely that you will know only their chue len and not their formal given name or family name.

I think it would be quite challenging to design an address book application that deals with all this naturally.  No application I've used does a good job and indeed it's not immediately obvious to me what the right approach to handling this is.  (However, I suspect an approach based on adding markup to the display name will work better than trying to figure out a set of database fields.)
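
For what it's worth, here is a rough sketch in Python of what a field-based attempt might look like (the field names are my own guesses); it at least captures the points above: separate Thai-script and romanized forms, with the romanization being whatever the holder of the name uses, a chue len alongside the formal given name, and the rule that the honorific attaches to the given name with no space in Thai script but with a space once romanized.

from collections import namedtuple

ThaiName = namedtuple("ThaiName", [
    "honorific", "given", "family", "chue_len",        # Thai script (official)
    "honorific_roman", "given_roman", "family_roman",  # romanization chosen by
    "chue_len_roman"])                                 # the holder of the name

def formal_thai(n):
    # no space between the honorific and the given name in Thai script
    return n.honorific + n.given + " " + n.family

def formal_roman(n):
    # but a space is needed once romanized
    return "%s %s %s" % (n.honorific_roman, n.given_roman, n.family_roman)

def phone_book_key(n):
    # Thai phone-book order: given name is the primary key, family name secondary
    return (n.given, n.family)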

Of course, it becomes even more difficult if you want to deal with complexities that arise in other cultures. I'm sure that just as personal names in Thai culture have some features that are surprising from a Western perspective, there must be many other cultures where personal names have equally surprising features.  I would love to learn more about these. If anybody can blog or comment with additional information, that would be great.

(Any Thais reading this, please feel free to add comments correcting anything I've got wrong or adding any important points I've missed.)

2007-11-03

Strategies for using open source in the Thai software industry

The following is adapted from the slides of a presentation I gave yesterday on how the Thai software industry can benefit from open source. I think a more important problem is how the country as a whole can benefit from open source, but that wasn't what I was asked to talk about. Also note that the objective here is not to help open source but to help the Thai software industry. I think most, if not all, of this is applicable to other countries at a stage of development similar to Thailand's.

Application platform
  • Applications need server platform, including
    • OS
    • Database
    • Web server, framework
  • Open source server platform is at least as good in quality as proprietary platforms
  • Platform does not compete with local software industry
  • Using open source on the server does not require users to move away from familiar Windows desktop environment
  • Virtualization enables applications built on fully open source application platform to be deployed on Windows
  • Trend towards web-based applications, where everything is on the server
  • Avoids cost of platform software licenses, according to business model
    • Licensing software: users save cost
    • Appliance, software as a service: producer saves cost
  • Licensing issues
    • Software as a service: no issues
    • Licensing software: must keep separation between proprietary and open source parts (no linking)
    • Appliance: must make some parts of source code available to customers
  • Mixed strategies also possible (e.g. Oracle on Linux, PHP on Windows)
Development tools
  • Traditional strength of open source
  • Java-based IDEs (e.g. Eclipse, NetBeans)
    • Written in Java, but support many kinds of development in addition to Java, e.g. C/C++, Web
    • Several companies adopting Eclipse as base (e.g. Nokia)
    • Main advantage compared to Microsoft is no lock-in to Microsoft application platform
    • Cost not the key issue: Microsoft makes development tools available to ISVs at low cost
  • Collaboration tools
    • Open source community has evolved exceptionally effective collaboration tools because
      • it is highly distributed
      • it only adopts process to the extent that it actually delivers results
    • Proprietary tools expensive
    • Key tools
      1. Version control (CVS, Subversion, Mercurial)
      2. Issue tracking (Bugzilla, Trac)
Education and professional development
  • Participation in open source projects builds skills that universities often fail to teach
    • Communication, especially English language
    • Cooperation
    • Working with large programs
    • Modifying existing programs as opposed to creating new programs
  • Opportunity to work with world-class developers
  • Helps career of individual developer by building personal brand
    • Opportunity to get work overseas
    • Improves chances of getting into good US graduate school
  • Builds highly motivated developers with world-class skills, who wish to pursue technical career
  • Useful both at student and professional level
  • Should emphasize participation in existing, successful, international projects
  • Be highly selective about starting new projects
    • Successful, large open source projects could help build image of sponsor organization or Thailand generally
    • But very difficult to create a really successful, large open source project
    • Choose area where no open source solution is yet available; opportunities still exist
    • Need to choose projects that can benefit rather than compete with local software industry
  • Individuals must choose projects they are passionate about
Embedded software
  • Hardware sales provide well-understood business model
  • Trend to Linux as OS for embedded systems
    • Increased power of embedded devices
    • Need for strong networking capabilities
  • Opportunity for electronics industry to move up the value chain
Fully open source business model
  • Product is fully open source
  • Possible for small company to achieve large market share because of
    • No licensing cost
    • Contribution of open source community
    • Examples: JBoss, MySQL
  • Business model based on support, consulting, training
  • Not an easy strategy

2007-10-31

E4X not in ES4

I was surprised to find that ES4 does not fold in E4X (although it reserves the syntax).  I had always viewed E4X as being one of the smoothest integrations of XML into a scripting language.  However, it seems that once you dig a bit deeper, it has some problems.

Optional typing in ES4

ES4 takes a very interesting approach to typing.  They've added static typing but made it completely optional.  Variable declarations can optionally be annotated with a type declaration.  However, the type annotations don't change the run-time semantics of the language.  The only effect of the annotations is that if you run the program in strict mode, then the program will be verified before execution and rejected if type errors are found.  Implementations don't have to support strict mode.  You can still have simple, small footprint implementations that do all checks dynamically.  Users who don't want to be bothered with types can write programs without having to learn anything about the type system.
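
The flavour of it, using Python-style function annotations to stand in for ES4's syntax (the details differ, but the division of labour between the language and the optional checker is the same):

def total(prices: list, tax_rate: float) -> float:
    # The annotations change nothing about how this function executes.
    return sum(prices) * (1 + tax_rate)

print(total([100, 250], 0.07))   # fine, with or without a checker

try:
    total([100, 250], "7%")      # a strict-mode-style checker run before execution
except TypeError:                # would reject this call; without one, the mistake
    pass                         # only surfaces at run time, inside the function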

There's a good paper by Gilad Bracha on Pluggable Type Systems that explains why type systems should be optional not mandatory.   I think he's right. The dichotomy between statically and dynamically typed languages is false: an optional type system allows you to have the benefits of both. The paper goes further and argues that type systems should be not merely optional but pluggable.  I'm not convinced on this.  Pluggable type systems are a great idea if you are a language designer who wants to experiment with type systems; but for a production language, I think it's a fundamental responsibility of the language designer to choose a single type system.

Anyway, it's great to see optional typing being adopted by a mainstream language.

ECMAScript Edition 4

The group working on the next version of ECMAScript (ES4) have released a language overview.  There's a lively discussion on the mailing list about some of the politics behind the evolution of ES4. (The situation appears to be that Microsoft doesn't want major new features in ECMAScript, whereas Mozilla and Adobe want to evolve it rather dramatically.)

2007-10-29

Signing HTTP requests

When I first started thinking about signing HTTP responses, I assumed that signing HTTP requests was a fairly similar problem and that a single solution could deal with signing requests as well as responses.  But after thinking about it some more, I'm not so sure.

The first thing to bear in mind is that signing an HTTP request or response is not an end in itself, but merely a mechanism to achieve a particular goal.  The purpose of the proposal that I've been developing in this series of posts is to allow somebody that receives a representation of a resource to verify the integrity and origin of that representation; the mechanism for achieving this is signing HTTP responses.

The second thing to bear in mind is the advantages of this proposal over https.  Realistically, there's not much point to a proposal in this space unless it has compelling advantages over https. There are two advantages that I find compelling:

  • better performance: clients can verify the integrity of responses without negatively impacting HTTP caching, whereas requests and responses that go over https cannot be cached by proxies;
  • persistent non-repudiation: by this I mean a client that verifies the integrity and origin of a resource can easily persist metadata that makes it possible to subsequently prove what was verified to a third party.

One key factor that allows these advantages is that the proposal does not provide confidentiality.

As compared to other approaches to signing messages (such as S/MIME), the key advantage is that the signature will be automatically ignored by clients that don't understand it, just by virtue of normal HTTP extensibility rules.

If we turn to signing HTTP requests, or more specifically HTTP GET requests, none of the above considerations apply.

  • The goal of signing an HTTP GET request is typically to allow the server to restrict access to resources.
  • If you're really serious about restricting access to resources, and you want to protect against malicious proxies, then you will want to protect the confidentiality of the response; if the request includes a signature that says x authorizes y to access resource r at time t, then the representation of r in the response ought to be encrypted using y's public key.
  • Furthermore, if a server is restricting access to resources, then the signature on the request can't be optional, so the advantage over other message signing approaches such as S/MIME disappears.
  • Adding signatures to HTTP GET requests is inherently going to inhibit caching. A cached response to a request signed by x for resource r cannot in general be used to respond to a request signed by y for resource r.
  • Neither of the compelling advantages (better performance, and persistent non-repudiation) which I mentioned above applies any longer.

On the other hand, if we consider signing HTTP PUT (and possibly POST) requests, then there seems to be more commonality.  Signing an HTTP PUT request serves the goal of allowing the server to verify the integrity and origin of the representation of a resource transferred from the client.  Although I don't think there will be a significant performance advantage over https, persistent non-repudiation could be useful.

I think my conclusion is that it's better to think of the proposal not as a proposal for signing HTTP responses, but as a proposal for allowing verification of the origin and integrity of transfers of representations of resources. When considered in this light, signing of HTTP GET requests doesn't really fit in.

By the way, I'm not saying HTTP request signing isn't a useful technique. For example, OAuth is using it to solve an important problem: allowing users to grant applications limited access to private resources. But I think that's a very different problem from the problem that I'm trying to solve.

2007-10-16

HTTP response signing strawman

If we revise the abstract model for generating a Signature header along the lines suggested in my previous post, we get this:

  1. Choose which key (security token) to use and create one or more identifiers for it.  One possible kind of key would be an X.509 certificate.
  2. Choose which response headers to sign.  This would include at least Content-Type and probably Date and Expires.  It would not include hop-by-hop headers.
  3. Compute the digest (cryptographic hash) of the full entity body of the requested URI. Base64-encode the digest.
  4. Create a  Signature header template; this differs from the final Signature header only in that it has a blank string at the point where the final Signature header will have the base64-encoded signature value. It can specify the following information:
    • the type of key;
    • one or more identifiers for the key;
    • an identifier for the suite of cryptographic algorithms to be used;
    • an identifier for the header canonicalization algorithm to be used;
    • a list of the names of the response headers to be signed;
    • the request URI;
    • the base64-encoded digest (from step 3).
  5. Combine the response headers that are to be signed with the Signature header template.
  6. Canonicalize the headers from the previous step.  This ensures that the canonicalization of the headers as sent by the origin server is the same as the canonicalization of the headers as seen by the client, even if there are one or more HTTP/1.1 conforming proxies between the client and the origin server.
  7. Compute the cryptographic hash of the canonicalized headers.
  8. Sign the cryptographic hash created in the previous step.  Base64-encode this to create the signature value.
  9. Create the final Signature header by inserting the base64-encoded signature value from the previous step into the Signature header template from step 4.

Note that when verifying the signature, as well as checking the signature value, you have to compute the digest of the entity body and check that it matches the digest specified in the Signature header.
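
Putting the steps together, here is a rough sketch in Python of how an origin server might construct the header, using the parameter syntax spelled out below. It is only an illustration: the canonicalization function is a placeholder for whatever the "basic" algorithm would actually say, and an HMAC over a shared secret stands in for the real public-key signature with an X.509-identified key, purely to keep the sketch self-contained.

import base64, hashlib, hmac

def canonicalize(headers):
    # Placeholder canonicalization: lowercase field names, collapse whitespace
    # in values, sort, one "name: value" line each.
    return "\n".join(sorted("%s: %s" % (n.lower(), " ".join(v.split()))
                            for n, v in headers)) + "\n"

def make_signature_header(request_uri, response_headers, entity_body,
                          signed_names, key_uri, key):
    # entity_body and key are bytes; headers are (name, value) pairs;
    # key_uri identifies the key (step 1).
    # Step 3: base64-encoded digest of the full entity body.
    digest = base64.b64encode(hashlib.sha1(entity_body).digest()).decode("ascii")
    # Step 4: the Signature header template, with an empty signature value.
    params = ('x509;key-uri="%s";crypt=rsa-sha1;canon=basic;headers="%s";'
              'request-uri="%s";digest="%s"'
              % (key_uri, ", ".join(signed_names), request_uri, digest))
    template = params + ';value=""'
    # Steps 5 and 6: combine the signed response headers with the template
    # and canonicalize the result.
    wanted = set(n.lower() for n in signed_names)
    to_sign = [(n, v) for n, v in response_headers if n.lower() in wanted]
    to_sign.append(("Signature", template))
    canonical = canonicalize(to_sign)
    # Steps 7 and 8: hash and sign the canonical form (HMAC standing in for RSA).
    value = base64.b64encode(
        hmac.new(key, canonical.encode("utf-8"), hashlib.sha1).digest()
    ).decode("ascii")
    # Step 9: insert the signature value into the template.
    return params + ';value="%s"' % value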

The syntax could be something like this:

Signature = "Signature" ":" #signature-spec
signature-spec = key-type 1*( ";" signature-param )
key-type = "x509" | key-type-extension
signature-param =
   "value" = <"> <Base64 encoded signature> <">
   | "canon" = "basic" | canon-extension
   | "headers" = <"> 1#field-name <">
   | "request-uri" = quoted-string
   | "digest" = <"> <Base64 encoded digest> <">
   | "crypt" = ( "rsa-sha1" | crypt-extension )
   | "key-uri" = quoted-string
   | "key-uid" = sha1-fingerprint | uid-extension
   | signature-param-extension
sha1-fingerprint = <"> "sha1" 20(":" 2UHEX) <">
UHEX = DIGIT | "A" | "B" | "C" | "D" | "E" | "F"
uid-extension = <"> uid-type ":" 1*uid-char <">
uid-type = token
uid-char = <any CHAR except CTLs, <\> and <">>
key-type-extension = token
canon-extension = token
crypt-extension = token
hash-func-extension = token
signature-param-extension =
   token "=" (token | quoted-string)

There are several issues I'm not sure about.

  • Should this be generalized to support signing of (some kinds of) HTTP request?
  • What is the right way to canonicalize HTTP headers?
  • Rather than having a digest parameter, would it be better to use the Digest header from RFC 3230 and then include that in the list of headers to be signed?
  • Should the time period during which the signature is valid be specified explicitly by parameters in the Signature header rather than being inferred from other headers, such as Date and Expires (which would of course need to be included in the list of headers to sign)?
  • Should support for security tokens other than X.509 certificates be specified?

2007-10-15

HTTP: what to sign?

There's been quite a number of useful comments on my previous post, and even an implementation.  The main area where there seems to be disagreement is on the issue of what exactly to sign.

It seems to me that you can look at an HTTP interaction at two different levels:

  • at a low level, it consists of request and response messages;
  • at a slightly higher level, it consists of the transfer of the representations of resources.

With a simple GET, there's a one-to-one correspondence between a response message and a representation transfer.  But with fancier HTTP features, like HEAD or conditional GET or ranges or the proposed PATCH method, these two levels start to diverge: the messages aren't independent entities in themselves, they are artifacts of the client attempting to efficiently synchronize the representation of the resource that it has with the current representation defined by the origin server.

The question then arises of whether, at an abstract level, the right thing to sign is messages or resource representations.  I think the right answer is resource representations: those are things whose integrity is important to applications.  For example, in the response to the HEAD message, the signature wouldn't simply sign the response to the HEAD message; rather it would cover the entity that would have been returned by a GET. The Signature header would thus be allowed in similar situations to the ETag header and would correspond to the same thing that a strong entity tag corresponds to.

It's important to remember that the representation of the resource doesn't consist of just the data in the entity body.  It also includes the metadata in the entity headers.  At the very least, I think you would want to sign the Content-Type header. Note that there are some headers that you definitely wouldn't want to sign, in particular hop-by-hop headers.  I don't think there's a single right answer as to which headers to sign, which means that the Signature header will need to explicitly identify which headers it is signing.

With this approach the signature doesn't need to cover the request.  However, it does need to relate the representation to a particular resource. Otherwise there's a nasty attack possible: the bad guy can replace the response to a request for one resource with the response to a request for another resource. (Suppose http://www.example.com/products/x/price returns the price of product x; an attacker could completely switch around the price list.)  I think the simplest way to solve this is for the Signature header in the response to include a uri="request_uri" parameter, where request_uri is the URI of the resource whose representation is being signed. This allows the signature verification process to work with just the response headers and body as input, which should simplify plugging this feature into implementations.

Although not including the request headers in the signature simplifies things, it must be recognized that it does lose some functionality. When there are multiple variants, the signature can't prove that you've got the right variant. However, I think that's a reasonable tradeoff.  Even if the request headers were signed, sometimes the response depends on things that aren't in the request, like the client's IP address (as indicated by Vary: *). The response can at least indicate that the response is one of several possible variants, by including Content-Location, Content-Language and/or Vary headers amongst the signed response headers.

The signature will also need to include information about the time during which the relationship between the representation and the resource applies.  I haven't figured out exactly how this should work.  It might be a matter of signing some combination of Date, Last-Modified, Expires and Cache-Control (specifically the s-maxage and maxage directives) headers, or it might involve adding timestamp parameters to the Signature header.

To summarize, the signature in the response should assert that a particular entity is a representation of a particular resource at a particular time.

2007-10-12

HTTP response signing abstract model

I've argued that there's a need for an HTTP-specific mechanism for signing HTTP responses. So let's try and design one.  (Usually at this point, I would start coding, but with security-related stuff, I think it's better to have more discussion up front.)

Before drilling down into syntax, I would like to work out the design at a more abstract level.  Let's suppose that the mechanism will take the form of a new Signature header.  Here is my current thinking as to the steps involved in constructing a Signature header:

  1. Choose which security token to use and create one or more identifiers for it.
  2. Choose the suite of cryptographic algorithms to use.
  3. Choose which request and response headers to sign.
  4. Compute the cryptographic hashes of the request and response entity bodies if they are present.  Base64-encode those hashes.
  5. Create a Signature header template containing the information from steps 1 to 4; this differs from the final Signature header only in that it has a blank string at the point where the final Signature header will have the base64-encoded signature value.
  6. Canonicalize the request headers.
  7. Combine the response headers with the Signature header template and canonicalize them.
  8. Create a string that encodes the HTTP method, the request URI, the canonicalized request headers from step 6, the response status, and the canonicalized response headers from step 7.
  9. Compute the cryptographic hash of the string created in the previous step.
  10. Sign the cryptographic hash created in the previous step.  Base64-encode this to create the signature value.
  11. Create the final Signature header by inserting the base64-encoded signature value from the previous step into the Signature header template from step 5.

Let's flesh this out with a bit of q&a.

  • What kinds of security token can be used?
    At least X.509 certificates should be supported.  But there should be the potential to support other kinds of token.
  • How are security tokens identified?
    It depends on the type.  For X.509, it would make sense to have a URI that allowed the client to fetch the certificate.  It would also be desirable to have an identifier that uniquely identifies the certificate, so that the client can tell whether it already has the certificate without having to go fetch it.  As far as I can tell, in the X.509 case, people mostly use the SHA-1 hash of the DER encoding of the certificate for this.
  • How does the server know what kind of signature (if any) the client wants?
    The client can provide a Want-Signature header in the request.  (It makes more sense to me to call this Want-Signature than Accept-Signature, just as RFC 3230 uses Want-Digest, since any client should be able to accept any signature.)
  • What if the response uses a transfer encoding?
    It's the entity body that's hashed in step 4. Thus, the sender computes the hash of the entity body before applying the transfer encoding.
  • What if the response uses a content encoding?
    It's the entity body that's hashed. Thus, the sender computes the hash of the entity body after applying the content encoding. From 7.2 of RFC 2616:
      entity-body := Content-Encoding( Content-Type( data ) )
  • What if the response contains a partial entity body (and a Content-Range header)?
    The hash covers the full entity-body (that is, the hash is computed by the recipient after all the partial entity bodies have been combined into the full entity-body).  In the terminology of RFC 3230, it covers the instance (where there is a relevant instance), rather than the entity. Thus the hash identifies the same thing as a strong entity tag (which in my view makes the terminology of RFC 3230 rather unfortunate).
  • Why and how are headers canonicalized?
    As I understand the HTTP spec, proxies are allowed to change various semantically insignificant syntactic details of the headers before they pass them on.  For example, section 2.2 of RFC 2616 says any linear white space may be replaced with a single SP. Section 4.2 says explicitly that proxies must not change the relative order of field values with the same header field name, but it seems to imply that proxies are allowed to change syntactic details of the header fields that are not syntactically significant (such as the relative order of headers with different field names).  It is not clear to me to what extent section 13.5.2 overrides this. Perhaps some HTTP wizard can help me out here. In any case, there needs to be just enough canonicalization to ensure that the canonicalization of the headers sent by the origin server will be the same as the canonicalization of the headers as seen by the client, no matter what HTTP/1.1 conforming proxies (or commonly-deployed non-conforming proxies) might be on the path between the origin server and the client. I've sketched one possible set of rules after this list. I would note that the Amazon REST Authentication scheme does quite extensive canonicalization.
  • Can there be multiple signatures?
    Yes. In the normal HTTP style, the Signature header should support a comma-separated list of signatures. The order of this list would be significant.  There should be a way for each signature in the list to specify which of the previous signatures in the list are included in what it signs.  There's a semantic difference between two independent signatures, and a later signature endorsing an earlier signature.
  • How about streaming?
    Tricky.  The fundamental problem is that HTTP 1.1 isn't very good at enabling the interleaved delivery of data and metadata.  This is one of the things that Roy Fielding's Waka is supposed to fix.  I don't think it's a good idea to put a lot of complexity into the design of a specific header to fix a generic HTTP problem.  The fact that the hash of the entity body is computed in a separate, independent preliminary step makes it a bit easier for the server to precompute the hash when the content is static. Also the signature header could be put in a trailer field when using a chunked transfer-coding.  However, although this helps the server, it screws the client's ability to stream, because the client needs to know the hash algorithm up front.  And, of course, it also requires all the proxies between the client and the server to support trailer fields properly.
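
To make the canonicalization question a little more concrete, here is one possible set of rules, sketched in Python. It is only my guess at the sort of thing that might be enough (lowercase the field names, join repeated fields in order with commas, collapse linear whitespace, then impose an order on the field names); it is not something any spec defines.

import re

def canonicalize_headers(headers):
    # headers: list of (field-name, field-value) pairs as sent on the wire
    combined = {}
    for name, value in headers:
        name = name.lower()                            # field names are case-insensitive
        value = re.sub(r"[ \t]+", " ", value.strip())  # collapse linear whitespace
        if name in combined:
            # RFC 2616 4.2: repeated fields may be joined, in order, with commas
            combined[name] = combined[name] + ", " + value
        else:
            combined[name] = value
    # proxies may reorder fields with different names, so sort by field name
    return "".join("%s: %s\n" % (name, combined[name])
                   for name in sorted(combined))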

2007-10-10

Why not S/MIME?

A couple of people have suggested S/MIME as a possible starting point for a solution to signing HTTP documents. This seems like a suboptimal approach to me; I'm not saying it can't be made to work, but there are a number of things that make me feel it is possible to do significantly better.

  1. MIME multipart does not seem to me to be the HTTP way.  The HTTP way is to put a single entity in a resource, and then have URIs pointing to other parts. The HTTP way to determine where content ends is to count bytes (using Content-Length, or the chunked transfer-coding), not by having a boundary string that you have to search for.  MIME multipart feels like an impostor from the email world.
  2. Conceptually we don't want to change the entity that the client receives, we just want the client to be able to check the integrity of the response.  Providing integrity by completely changing the content that the client receives doesn't seem a good match for the basic task that we are trying to accomplish.
  3. Using multipart/signed will break clients that don't support it. If the signature was in some sort of header, then browsers that didn't understand the header would automatically ignore it; you could then send signed responses without having to worry about whether the clients supported them or not.  With multipart/signed you would have to use content negotiation to avoid breaking clients. Overloading content negotiation to also handle negotiation of integrity checking will interfere with using content negotiation for negotiating content types.
  4. It's useful to be able to negotiate several aspects of digital signatures.  What kind of security token is going to be used?  Although X.509 certificates have to be supported, I think WS-Security does the right thing in not restricting itself to these. It might be useful to have straightforward public keys, without any of the X.509 OSI junk, or to use a symmetric, shared secret key.  It's also useful to be able to negotiate which algorithms are used (e.g. SHA-1 vs SHA-256).  Trying to do this well with content-negotiation would be tough.  Better to introduce some sort of Accept-Signature header and do it properly.
  5. Careful thought is needed about what exactly needs to be signed. Obviously signing just the content is not enough.  We need to sign at least some of the entity headers as well.  With multipart, we can handle this by putting those headers in the part that is signed.  But this will lead to duplicating headers because some of those headers (like Date) will also need to be included in the HTTP headers: not fatal, but kind of ugly.
  6. A more subtle problem is that it's not really enough to sign the response entity in isolation.  The signature needs to link the response to the request. In the case of a GET, the signature needs to say that the entity is a response to a GET on a particular request URI.  Neither the Location nor the Content-Location headers have quite the same semantics as the request URI.  Also the response may vary depending on other headers (e.g. Accept), as listed in the Vary header in the response.  The signature therefore ought to be able to cover those of the request headers that affect which entity is returned. Also it would be desirable to be able to sign responses to methods other than GET. The signature should probably also cover the status code. I don't see a natural way to fit this into the S/MIME approach.
  7. One of the main points of doing HTTP signing rather than SSL is cache-friendliness. Consider the process of validating a cache entry that has become stale. This works by doing a conditional GET: if the entity body hasn't changed, then the conditional GET will return a 304 along with some new headers, typically including a new Date header. Since the signature typically needs to cover the date, this isn't going to work with multipart/signed: the entire entity body would need to be resent so that the Date contained in the relevant MIME part can be updated.  On the other hand if the signature is in the header, then the conditional GET can still return a 304 and include an updated signature header that covers the new date.
  8. Finally, S/MIME has been around for a long time, but it doesn't seem to have got any traction in the HTTP world.

A recent development related to signing in the email world is DomainKeys Identified Mail (DKIM), recently standardized as RFC 4871. This does not build on S/MIME at all.  The signature goes in a header field.  It also doesn't use X.509 certificates and their associated PKI infrastructure; rather it uses public keys distributed using DNS. It looks like a good piece of work to me.

Another interesting development is Amazon's REST authentication scheme. This works by signing headers, although it does so in the context of authentication of the client to the server.  It also uses a shared secret and an HMAC rather than public key cryptography.

Overall I think we can do much better than S/MIME by designing something specifically for HTTP.

2007-10-07

Integrity without confidentiality

People often focus on confidentiality as being the main goal of security on the Web; SSL is portrayed as something that ensures that when we send a credit card number over the web, it will be kept confidential between us and the company we're sending it to.

I would argue that integrity is at least as important, if not more so.  I'm thinking of integrity in a broad sense, as covering both ensuring that the recipient receives the sender's bits without modification and that the sender is who the recipient thinks it is. I would also include non-repudiation: the sender shouldn't be able to deny that they sent the bits.

Consider books in the physical world.  There are multiple mechanisms that allow us to trust in the integrity of the book:

  • it is expensive and time-consuming to produce something that looks and feels like a real, bound book
  • we obtain books from bookshops or libraries, and we trust them to give us the real thing
  • page numbers make it hard to remove pages
  • the ISBN allows us to check that something has really been published
  • the legal requirement that publishers deposit a copy of every book published with one or more national libraries (e.g. Library of Congress in the US or the British Library in the UK) ensures that in the unlikely event that the integrity of a book comes into question, there's always a way to determine whether it is authentic

Compare this to the situation in the digital world.  If we want to rely on something published on a web site, it's hard to know what to do.  We can hope the web site believes in the philosophy that Cool URIs don't change; unfortunately such web sites are a minority.  We can download a local copy, but that doesn't prove that the web site was the source of what we downloaded. What's needed is the ability to download and store something locally that proves that a particular entity was a valid representation of a particular resource at a particular time.

SSL is fundamentally not the right kind of protocol for this sort of thing.  It's based on using a handshake to create a secure channel between two endpoints.  In order to provide the necessary proof, you would have to store all the data exchanged during the session. It would work much better to have something message-based, which would allow each request and response to be separately secured.

Another crucial consideration is caching. Caching is what makes the web perform.  SSL is the mainstay of security on the Web.  Unfortunately there's the little problem that if you use SSL, then you lose the ability to cache. You want performance? Yes, Sir, we have that; it's called caching.  You want security? Yes, Sir, we have that too; it's called SSL. Oh, you want performance and security? Err, sorry, we can't do that.

A key step to making caching useable with security is to decouple integrity from confidentiality.  A shared cache isn't going to be very useful if each response is specific to a particular recipient. On the other hand there's no reason why you can't usefully cache responses that have been signed to guarantee their integrity.

I think this is one area where HTTP can learn from WS-Security, which has message-based security and cleanly separates signing (which provides integrity) from encryption (which provides confidentiality).  But of course WS-* doesn't have the caching capability that HTTP provides (and I think it would be pretty difficult to fix WS-* to do caching as well as HTTP does).

My conclusion is that there's a real need for a cache-friendly way to sign HTTP responses. (Being able to sign HTTP requests would also be useful, but that solves a different problem.)

2007-10-04

Bytes not infosets

Security is the one area where the WS-* world has developed a set of standards that provide significantly more functionality than has so far been standardized in the REST world. I don't believe that this is an inherent limitation of REST; I'm convinced there's an opportunity to standardize better security for the REST world. So I've been giving quite a lot of thought to the issue of what the REST world can learn from WS-Security (and its numerous related standards).

Peter Gutmann has a thought-provoking piece on his web site in which he argues that XML security (i.e. XML-DSig and XML Encryption) is fundamentally broken. He argues that the fundamental causes of this brokenness are as follows:

1. XML is an inherently unstable and therefore unsignable data format. XML-DSig attempts to fix this via canonicalization rules, but they don't really work.

2. The use of an "If it isn't XML, it's crap" design approach that led to the rejection of conventional, proven designs in an attempt to prove that XML was more flexible than existing stuff.

He also complains of the difficulty of supporting XML in a general-purpose security toolkit:

It's impossible to create something that's simply a security component that you can plug in wherever you need it, because XML security is inseparable from the underlying XML processing system.

I would suggest that there are two different ways to view XML:

  1. the concrete view: in this view, interchanging XML is all about interchanging sequences of bytes in the concrete syntax defined by XML 1.0
  2. the infoset view: in this view, interchanging XML is all about interchanging abstract structures representing XML infosets; the syntax used to represent the infoset is just a detail to be specified by a binding (the infoset view tends to lead to bindings up the wazoo)

I think each of these views has its place.  The infoset is an invaluable conceptual tool for thinking about XML processing. However, I think there's been an unfortunate tendency in the XML world (and the WS-* world) to overemphasize the infoset view at the expense of the concrete view.  I believe this tendency underlies a lot of the problems that Gutmann complains of.

  • There's nothing unstable or unsignable about an XML document under the concrete view.  It's just a blob of bytes that you can hash and sign as easily as anything else (putting external entities on one side for the moment).
  • The infoset view makes it hard to accommodate non-XML formats as first-class citizens.  If your central data model is the XML infoset, then everything that isn't XML has to get mapped into XML in order to be accommodated. For example, the WS-* world has MTOM. This tends to lead to reinventing XML versions of things just so they can be first-class citizens in an infoset-oriented world.
  • If you look at everything as an infoset, then it starts to look natural to use XML for things that XML isn't all that good at. For example, if your message body is an XML infoset and your message headers are infosets, then it looks like a reasonable choice to use XML as your envelope format to combine your body with your headers. But using XML as a container format like this leads you into all the complexity and inefficiency of XML Security, since you need to be able to sign the things that you put in containers.  It's much simpler to use a container format that works on bytes, like zip or MIME multipart.
  • The infoset view leads to an emphasis on APIs that work with XML using abstractions, such as trees or events, that are at the infoset level, rather than APIs that work with XML at a more concrete level using byte streams or character streams. Although infoset-level APIs are needed for processing XML, when you use infoset-level APIs for interchanging XML between separate components, I believe you pay a significant price in terms of flexibility and generality. In particular, using infoset-level APIs at trust boundaries seems like a bad idea.

My conclusion is this: one aspect of the WS-* approach that should not be carried over to the REST world is the emphasis on XML infosets.

2007-10-01

Practical Principles for Computer Security

Well, I really am a lousy blogger. I am in awe of all those who manage to put out interesting posts, day after day, month after month.  I guess it must get easier with practice. Anyway, I think I'm going to try out a "little and often" approach to posting.

I have found myself getting more and more interested in security over the last couple of years.  I think the single most inspiring paper I have read on security is Practical Principles for Computer Security by Butler Lampson (who has had a hand in inventing an incredibly impressive range of technologies including personal computing, ethernet, laser printers, two-phase commit and WYSIWYG editors).  I won't try and summarize it. If you are at all interested in security and you haven't read it, you should do so.

2007-04-11

Validation not necessarily harmful

Several months ago, Mark Baker wrote an interesting post entitled Validation considered harmful. I agree with many of the points he makes but I would draw different conclusions. One important point is that when you take versioning into consideration, it will almost never be the case that a particular document will inherently have a single schema against which it should always be validated. A single document might be validated against:
  • Version n of a schema
  • Version n + 1 of a schema
  • Version n of a schema together with whatever the versioning policy of version n says future versions may add
  • What a particular implementation of version n generates
  • What a particular implementation of version n understands
  • The minimum constraints that a document needs in order to be processable by a particular implementation
Extensibility is also something that increases the range of possible schemas against which it may make sense to validate a document. The multiplicity of possible schemas is the strongest argument for a principle which I think is fundamental:
Validity should be treated not as a property of a document but as a relationship between a document and a schema.
The other important conclusion that I would draw from Mark's discussion is that schema languages need to provide rich, flexible functionality for describing loose/open schemas. It is obvious that DTDs are a non-starter when judged against these criteria. I think it's also obvious that Schematron does very well. I would claim that RELAX NG also does well here, and is better in this respect than other grammar-based schema languages, in particular XSD. First, it carefully avoids anything that ties a document to a single schema:
  • there's nothing like xsi:schemaLocation or DOCTYPE declarations
  • there's nothing that ties a particular namespace name to a particular schema; from RELAX NG's perspective, a namespace name is just a label
  • there's nothing in RELAX NG that changes a document's infoset
Second, it has powerful features for expressing loose/open schemas:
  • it supports full regular tree grammars, with no ambiguity restrictions
  • it provides namespace-based wildcards for names in element and attribute patterns
  • it provides name classes with a name class difference operator
Together these features are very expressive. As a simple example, this pattern
attribute x { xsd:integer }?, attribute y { xsd:integer }?, attribute * - (x|y) { text }
allows you to have any attribute with any value, except that an x or y attribute must be an integer. A more complex example is the schema for RDF. See my paper on The Design of RELAX NG for more discussion on the thinking underlying the design of RELAX NG.

Finally, I have to disagree with the idea that you shouldn't validate what you receive. You should validate, but you need to carefully choose the schema against which you validate. When you design a language, you need to think about how future versions can evolve. When you specify a particular version of a language, you should precisely specify not just what is allowed by that version, but also what may be allowed by future versions, and how an implementation of this version should process things that may be added by future versions. An implementation should accept anything that the specification says is allowed by this version or may be allowed by a future version, and should reject anything else. The easiest and most reliable way to achieve this is by expressing the constraints of the specification in machine-readable form, as a schema in a suitable schema language, and then using a validator to enforce those constraints (there's a small example of this at the end of this post). I believe that the right kind of validation can make interoperability over time more robust than the alternative, simpler approach of having an implementation just ignore anything that it doesn't need.
  • Validation enables mandatory extensions. Occasionally you want recipients to reject a document if they don't understand a particular extension, perhaps because the extension critically modifies the semantics of an existing feature. This is what the SOAP mustUnderstand attribute is all about.
  • Validation by servers reduces the problems caused by broken clients. Implementations accepting random junk leads inexorably to other implementations generating random junk. If you have a server that ignores anything it doesn't need, then deploying a new version of the server that adds support for additional features can break existing clients. Of course, if a language has a very unconstrained evolution policy, then validation won't be able to detect many client errors. However, by making appropriate use of XML namespaces, I believe it's possible to design language evolution policies that are both loose enough not to unduly constrain future versions and strict enough that a useful proportion of client errors can be detected. I think Atom is a good example.
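
Coming back to the point about enforcing a specification's constraints with a validator, here is roughly what that looks like in practice. This uses lxml's RELAX NG support with made-up file names; the interesting part is simply that the schema being enforced is the one describing this version plus whatever the versioning policy allows.

from lxml import etree

# The schema describes what this version allows, plus whatever the
# versioning policy says future versions may add.
schema = etree.RelaxNG(etree.parse("feed-v1-plus-extensions.rng"))

doc = etree.parse("incoming.xml")
if not schema.validate(doc):
    # Reject anything outside what this version or a future version may allow.
    raise ValueError("document rejected: %s" % schema.error_log)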

2007-04-09

XML and JSON

I've had some useful feedback on my previous post. I need to take a few days to get clearer in my own mind what exactly it is that I'm trying to achieve and find a crisper way to describe it. In the meantime, I would like to offer a few thoughts about XML and JSON. My previous post came off much too dismissive of JSON. I actually think that JSON does have real value. Some people focus on the ability for browsers to serialize/deserialize JSON natively. This makes JSON an attractive choice for AJAX applications, and I think this has been an important factor in jump starting JSON adoption. But in the longer term, I think there are two other aspects of JSON that are more valuable.
  • JSON is really, really simple, and yet it's expressive enough for many applications. When XML 1.0 came out it represented a major simplification relative to what it aspired to replace (SGML). But over the years, complexity has accumulated, and there's been very little attention given to simplifying and refactoring the XML stack. The result is frankly a mess. For example, it's nonsensical to have DTD defaulting of attributes based on prefix rather than namespace name, yet this is a feature that any conforming XML parser has to implement. It's not surprising that XML is unappealing to a generation of programmers who are coming to it fresh, without making allowances for how it got to be this way. When you look at the bang for the buck provided by XML and compare it with JSON, XML does not look good. The hard question is whether there's anything the XML community can do to improve things that can overcome the inertia of XML's huge deployed base. The XML 1.1 experience is not encouraging.
  • The data model underlying JSON (atomic datatypes, objects/maps, arrays/lists) is a much more natural model for data than an XML infoset. If you're working in a scripting language and read in some JSON, you directly get something that's quite pleasant to work with; if you read in XML, you typically get some DOM-like structure, which is painful to work with (although a bit of XPath can ease the pain), or you have to apply some complex data-binding machinery.
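A trivial illustration of the difference in Python (the data is made up):

import json
import xml.etree.ElementTree as ET

entry = json.loads('{"title": "Example", "tags": ["xml", "json"]}')
print(entry["title"])              # plain dicts and lists, ready to use
print(entry["tags"][0])

doc = ET.fromstring("<entry><title>Example</title>"
                    "<tag>xml</tag><tag>json</tag></entry>")
print(doc.find("title").text)      # tree navigation instead
print([t.text for t in doc.findall("tag")])
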
However, I don't think JSON will or should relegate XML to a document-only technology.
  • You can't partition the world of information neatly into documents and data. There are many, many cases where information intended for machine-processing has parts which are intended for human consumption. GData is a great example. The GData APIs handle this in JSON by having strings with HTML/XML content. However, I think the ability of XML to handle documents and data in a uniform way is a big advantage for information of this type.
  • XML's massive installed base gives it an interoperability advantage over any competitive technology. (Unfortunately this also applies to any future cleaned up version of XML.) You don't get the level of adoption that XML has achieved without cost. It requires multiple communities with different objectives to come together and compromise; each community ends up accepting features that are unnecessary cruft from its point of view. This level of adoption also takes time and requires a technology to grow to support new requirements. Adding new features while preserving backwards compatibility often results in a less than elegant design.
  • A range of powerful supporting technologies have been developed for XML. Naturally I have a fondness for the ones that I had a role in developing: XPath, XSLT, RELAX NG. I also can see a lot of value in XPath2, XSLT2 and XQuery. On some days, if I'm in a particularly good mood and I try really hard, I can see value in XSD. Programming languages are more and more acquiring built-in support for XML. Collectively I think these technologies give XML a huge advantage.
  • JSON's primitive datatype support is weak. The semantics of non-integer numbers are unspecified. In XSD terms, are they float, double, decimal or precisionDecimal? Some important datatypes are missing. In particular, I think support for binary data (XSD base64Binary or hexBinary) is critical. Furthermore, the set of primitive datatypes is not extensible. The result is that JSON strings end up being used to encode data that is not logically a string. JSON solves the datatyping problem only to the extent the only non-string datatypes you care about are booleans and integers.
  • JSON does not have anything like XML Namespaces. There are probably many people who see this as an advantage for JSON and certainly XML Namespaces come in for a lot of criticism. However, I'm convinced that the distributed extensibility provided by XML Namespaces is indispensable for a Web-scale data interchange technology. The JSON approach of just ignoring keys you don't understand can get you a long way, but I don't think it scales.

2007-04-06

Do we need a new kind of schema language?

What's the problem?

I see the real pain-point for distributed computing at the moment as not the messaging framework but the handling of the payload. A successful distributed computing platform needs

  • a payload format
  • a way to express a contract that a payload must meet
  • a way to process a payload that may conform to one or more contracts

that is

  • suitable for average, relatively low-skill programmers, and
  • allows for loose coupling (version evolution, extensibility, suitability for a wide variety of implementation technologies).

For the payload format, XML has to be the mainstay, not because it's technically wonderful, but because of the extraordinary breadth of adoption that it has succeeded in achieving. This is where the JSON (or YAML) folks are really missing the point by proudly pointing to the technical advantages of their format: any damn fool could produce a better data format than XML.

We also have to live in a world where XSD is currently dominant as the wire-format for the contract (thank you, W3C, Microsoft and IBM).

But I think it's fairly obvious that current XML/XSD databinding technologies have major weaknesses when considered as a solution to problem of payload processing for a distributed computing platform. The two basic databinding techniques I see today are:

  • Generating XSD from an implementation in a statically typed language which includes optional annotations; this provides a great developer experience, but from a coupling perspective doesn't seem much of an improvement beyond CORBA or DCOM. The other problem is that it's tough to do this in a dynamically typed language (absent sophisticated type inference or mandatory annotations).
  • Generating programming language stubs from an XSD which includes optional annotations. This is problematic from the developer experience point of view: there's a mismatch between XML's fundamental structures (attributes and elements), which are optimized for imposing structure on text, and the terms in which developers naturally think of data structures. Beyond this inherent problem, it's hard to author schemas using XSD and even harder to author schemas that have the right loose-coupling properties. And the tooling often introduces additional coupling problems.

This pain is experienced most sharply at the moment in the SOAP world, because the big commercial players have made a serious investment in trying to produce tools that work for the average developer. But I believe the REST world has basically the same problem: it's not really feeling the pain at the moment because REST solutions are mostly created by relatively elite developers who are comfortable dealing with XML directly.

The REST world also takes a less XML-centric view of the world, but for non-XML payload formats (JSON, or property-value pairs) its only solution to the contract problem is a MIME type, which I think is totally insufficient as a contract mechanism for enterprise-quality distributed computing. For example, it's not enough to say "accessing this URI will give you JSON"; there needs to be a description of the structure of the JSON, and that description needs to be machine readable.

Some people propose solving the XML-processing problem by adopting an XML-centric processing model, for which the leading technologies are XQuery and XSLT2. The fundamental problem here is the XQuery/XPath data model. I'm not criticizing the WGs' efforts: they've done about as good a job as could be done given the constraints they were working under. But there is no way it can overcome the constraint that a data model based around XML and XSD is just not a very good data model for general-purpose computing. The structures of XML (attributes, elements and text) are those of SGML and these come from the world of markup. Considered as general purpose data structures, they suck pretty badly. There's a fundamental lack of composability. Why do we need both elements and attributes? Why can't attributes contain elements? Why is the type of thing that can occur as the content of an element not the same as the type of thing that can occur as a document? Why do we still have cruft like processing instructions and DTDs? XSD makes a (misguided in my view) attempt to add an OO/programming language veneer on top. But it can't solve the basic problems, and, in my view, this veneer ends up making things worse not better.
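
As a tiny illustration of the composability problem, here's a sketch using Python's standard ElementTree: the same piece of data can legitimately show up as an attribute or as a child element, and nothing in the data model tells a databinding layer which to expect. The element and attribute names are invented for the example.

import xml.etree.ElementTree as ET

as_attribute = ET.fromstring('<picture width="100"/>')
as_element = ET.fromstring('<picture><width>100</width></picture>')

print(as_attribute.get("width"))       # '100', via the attribute
print(as_element.findtext("width"))    # '100', via a child element
# Both spellings carry the same information, but they need different
# access paths, and an attribute could never hold structured content.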

I think there's some real progress being made in the programming language world. In particular I would single out Microsoft's LINQ work. My doubts on this are with its emphasis on static typing. While I think static typing is invaluable within a single, controlled system, I think for a distributed system the costs in terms of tight coupling often outweigh the benefits. I believe this is less the case if the typing is structural rather than nominal. But although LINQ (or at least newer versions of C#) have introduced some welcome structural typing features, nominal typing is still thoroughly dominant.

In the Java world, there's been a depressing lack of innovation at the language level from Sun; outside of Sun, I would single out Scala from EPFL (which can run on a JVM). This adds some nice functional features which are smoothly integrated with Java-ish OO features. XML is fundamentally not OO: XML is all about separating data from processing, whereas OO is all about combining data and processing. Functional programming is a much better fit for XML: the problem is making it usable by the average programmer, for whom the functional programming mindset is very foreign.

A possible solution?

This brings me to the main point I want to make in this post. There seems to me to be another approach for improving things in this area, which I haven't seen being proposed (maybe I just haven't looked in the right places). The basic idea is to have a schema language that operates at a different semantic level. In the following description I'll call this language TEDI (Type Expressions for Data Interchange, pronounced "Teddy"). This idea is very much at the half-baked stage at the moment. I don't claim to have fully thought it through yet.

If you look at the major scripting languages today, I think it's striking that at a very high level, their data structures are pretty similar and are composed from:

  • arrays
  • maps
  • scalars/primitives or whatever you want to call them

This goes for Perl, Python, Ruby, Javascript, AWK. (PHP's array data structure is a little idiosyncratic.) The SOAP data model is also not dissimilar.
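
Here's the kind of structure I have in mind, written as Python but translatable almost mechanically into any of these languages; the field names are just an invented example.

feed = {                               # a map/dict with fixed keys
    "title": "Election coverage",      # scalar: string
    "itemCount": 2,                    # scalar: integer
    "entries": [                       # array/list
        {"author": "somchai", "words": 1200, "draft": False},
        {"author": "lek", "words": 850, "draft": True},
    ],
}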

When you drill down into the details, there are of course lots of differences:

  • some languages have fixed-length tuples as well as variable-length arrays
  • most languages distinguish between a struct that has a fixed set of identifiers as keys and a map that can have an unlimited set of keys (though there are often restrictions on the types of keys, for example, to prohibit mutable types)
  • there's a wide variety of primitives: almost all languages have strings (though they differ in whether they are mutable) and numbers; beyond that, many languages have booleans, a null value, and some sort of date-time support

TEDI would be defined in terms of a generic data model that makes a tasteful restricted choice from these programming languages' data structures: not limiting the choice to the lowest common denominator, but leaving out frills and focusing on the basics and on things that can be naturally mapped into each language. At least initially, I think I would restrict TEDI to trees rather than handle general graphs. Although graphs are important, I think the success of JSON shows that trees are good enough as a programmer-friendly data interchange mechanism.

I would envisage both an XML and a non-XML syntax for TEDI. The non-XML syntax might have a JSON flavour. For example, a schema might look like this:

   { url: String, width: Integer?, height: Integer?, title: String? }

This would specify a struct with 4 keys: the value of the "url" key is a string; the value of the "width" key is an integer or null (and similarly for "height"); and the value of the "title" key is a string or null. You can thus think of the schema as being a type expression for a generic scripting language data structure.
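
In Python terms, a value conforming to that schema might look like the following sketch. The mapping of the ? components onto None is my assumption here, not something the design pins down yet.

picture = {
    "url": "http://www.example.com/pic.jpg",
    "width": None,        # Integer? -- absent in this instance
    "height": None,       # Integer? -- absent in this instance
    "title": "A fine picture",
}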

The key design goal for TEDI would be to make it easy and natural for a scripting-language programmer to work with.

There's one other big piece that's needed to make TEDI work: annotations. Each component of a TEDI schema can have multiple, independent annotations, which may be inline or externally attached in some way. Each annotation has a prefix that identifies a binding. A TEDI binding specification has to be developed for each programming language and each serialization that will be used with TEDI.

The most important TEDI binding specification would be the one for XML. This specifies, for the combination of

  • a TEDI schema,
  • XML binding annotations for the TEDI schema, and
  • an instance of the generic TEDI data model conforming to the schema

which XML infosets are considered correct representations of the instance, and also identifies one of these infosets as the canonical representation. The XML binding annotations should always be optional: there should be a default XML serialization of any TEDI instance.

For example, an instance of the example schema above might get serialized as

<root>
<url>http://www.example.com/pic.jpg</url>
<title>A fine picture</title>
</root>

But with an annotation

  @xml.element(name="picture")
{ url: String, width: Integer?, height: Integer?, title: String? }

it might get serialized as

<picture>
<url>http://www.example.com/pic.jpg</url>
<title>A fine picture</title>
</picture>

Let's try and make this more concrete by imagining what it would look like for a particular scripting language, say Python. First of all, people in the Python community would need to get together to create a TEDI binding for Python. This would work in an analogous way to the XML binding. It would specify, for the combination of

  • a TEDI schema,
  • Python binding annotations for the TEDI schema, and
  • an instance of the generic TEDI data model conforming to the schema

which Python data structures are considered representations of the instance, and also identify one of these data structures as the canonical representation.

The API would be very simple. You would have a TEDI module that provided functions to create schema objects in various ways. The simplest way would be to create it from a string containing the non-XML representation of the TEDI schema, complete with any inline annotations. Any XML and Python annotations would be used; annotations from other bindings would be ignored. The schema object would provide two fundamental operations (sketched in Python after this list):

  • loadXML: this takes XML and returns a Python structure, throwing an exception if the XML is not valid according to the TEDI schema
  • saveXML: this takes a Python structure and returns/outputs XML, throwing an exception if the Python structure is not valid according to the schema
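
Here's a sketch of how that might feel in Python. Everything here is hypothetical: the tedi module, its schema() function and the loadXML/saveXML names are just the shape I have in mind, not working code.

import tedi   # hypothetical module

schema = tedi.schema("""
  @xml.element(name="picture")
  { url: String, width: Integer?, height: Integer?, title: String? }
""")

picture = schema.loadXML(
    "<picture>"
    "<url>http://www.example.com/pic.jpg</url>"
    "<title>A fine picture</title>"
    "</picture>"
)
# picture would be an ordinary Python dict, e.g.
# {'url': 'http://www.example.com/pic.jpg', 'width': None,
#  'height': None, 'title': 'A fine picture'}

xml = schema.saveXML(picture)   # back to the canonical XML representation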

XML is not the only possible serialization. The JSON community could develop a JSON binding. If you implemented that, then your API would have loadJSON and saveJSON methods as well.

One complication that must be handled in order to make this industrial-strength is streaming. A good first step would be the ability to handle the pattern where the document element contains zero or more header elements, and then a possibly very large number of entry elements, each of which is not large; the streaming solution you would want in this case is for the API to deliver the entries as an iterator rather than an array.
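
As a sketch of what that might look like on top of the hypothetical API above (the loadXMLStream name and the use of * for a repeated component are my inventions, not settled notation):

import tedi   # hypothetical module, as above

schema = tedi.schema("{ headers: String*, entries: { id: String, body: String }* }")

with open("feed.xml", "rb") as f:
    for entry in schema.loadXMLStream(f):   # yields one entry dict at a time
        print(entry["id"])                  # the whole document is never in memory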

Another challenge in designing the TEDI XML binding is handling extensibility. I think the key here is for one of the TEDI primitives to be an XmlElement (or maybe XmlContent). (This might also be useful in dealing with XML mixed content.) With different TEDI schemas you should be able to get quite different representations out of the same XML document. For a SOAP message, you might have a very generic TEDI schema that represents it as an array of headers and a payload (all being XmlElements); or you might have a TEDI schema for a specific type of message that represented the payload as a particular kind of structure.
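
For instance (again with invented notation: XmlElement comes from the idea above, the * marker doesn't), the two views of a SOAP message might be sketched as the strings you would hand to the hypothetical tedi.schema():

# A very generic schema: headers and payload left as raw XML.
generic_soap = "{ headers: XmlElement*, payload: XmlElement }"

# A message-specific schema that digs into the payload structure;
# the field names are made up for illustration.
stock_order = "{ headers: XmlElement*, payload: { symbol: String, quantity: Integer } }"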

This shows how you could fit TEDI into a world where XML is the dominant wire format, but still leverage other more suitable wire formats when appropriate.

But how do you interop with a world that uses XSD as the wire format for contracts? The minimum is to create a tool that can take a TEDI schema with XML annotations and generate an XSD. There'll be limits because of the limited power of XSD (and these will need to be taken into consideration in designing the TEDI XML binding): some of the constraints of the TEDI schema might not be captured by the XSD. But that's a normal situation: there are often complex constraints on an XML document being interchanged that cannot be expressed in XSD.

A more difficult task is to take an XSD and generate a TEDI together with XML binding annotations. This would be one of the main things that would drive adding complexity to the TEDI XML binding annotations. I expect that the work of the XML Schema Patterns for Databinding WG would be valuable input on what was really needed.

In the future, there's still hope that the wire-format for the contract need not always be XSD: WSDL 2.0 makes a significant effort not to restrict itself to XSD; so you could potentially publish a WSDL with both the XSD and the TEDI for a web service.

The closest thing I've seen to TEDI is Paul Prescod's XBind language, but it has a rather different philosophy in that it separates validation from data binding, whereas TEDI integrates them. Another difference is that Paul has written some code, whereas TEDI is completely vaporware at this point.

I'm going to use subsequent posts to try to develop the design of TEDI to the point where it could be implemented; at the moment it's not developed enough to know whether it really holds water. If you find the idea interesting, please help with the design process by using comments to give feedback. I promise to try to keep future posts shorter, but I wanted my first real post to have a bit of meat to it.

2007-03-01

Disclosure/disclaimer

My first post is rather a boring one. Some of the posts in this blog will be about Web Services. I feel I should therefore disclose that I have an investment in and am a board member of WSO2, which is a startup centered on open source from the Apache Web Services project. The opinions in this blog are mine and should not be ascribed to WSO2.