James Clark's Random Thoughts: 2009

2009-03-23

Getting involved with M

I spent last week in Redmond talking to Microsoft about M and Oslo. The question at the back of my mind before I went was "Does M really have the potential in the long term to be an interesting and useful alternative to XML?". My tentative answer is yes. Here's why:

Although M, as it is today, is interesting in a number of ways, it is obviously a long way from being a serious alternative to XML (at least for the kinds of application I am interested in). One of my concerns was that I would hear "It's too late to change that". I never did: I was pleasantly surprised that Microsoft are still willing to make fundamental changes to M.
Microsoft recognize that M's long-term potential would be severely limited if it was a proprietary, Microsoft-only technology. I believe they realize that M needs to end up as a genuinely open standard. They've already made initial steps towards a more open process for M. On the other hand, they don't believe in design by committee. (And having seen some of the abominations that design by committee can produce, I can certainly sympathise with that.) There's a senior Microsoft guy that gets to make the final call on all design decisions. In other words, it's a benevolent dictator model. I'm OK with this in principle (although I like it even better when I'm the benevolent dictator). I think it's worked really well in a number of cases (C# and Python spring to mind). But obviously it all depends on the qualities of the particular benevolent dictator. From my interactions so far, he seems like a really smart guy and he's willing to listen.
Microsoft is addressing the whole stack. An alternative to XML needs to provide not only an alternative to XML itself but also alternatives to XSD/RELAX NG and XQuery/XSLT.
Microsoft seem to be designing things in a principled way; they are paying attention to the relevant CS theory. For example, ML seems to be a major influence. They are making an effort to produce something clean, elegant, even beautiful, rather than doing just enough to get a product out.
Microsoft seem willing to take documents seriously. This is a make or break issue for me, because the kind of data I care about most is documents and M, as it is today, is not useful for documents. This was probably the issue we spent the most time on; I talked a lot about the importance of mixed content. One of the Microsoft team suggested the goal of using M to do the M specification. I think this sort of dogfooding will be very helpful in ensuring M works well for documents.

Of course, it's early days yet, and it's hard to tell how much leverage M will get, but there's enough potential to make me want to be involved.

2009-01-17

RELAX NG and xml:id

One part of the vision underlying RELAX NG is that validation should not be monolithic: it is not necessary or desirable to have one schema language that can handle every possible kind of validation you might want to do; it is better instead to have multiple specialized languages, each of which does one kind validation, really well. Consistent with this vision, RELAX NG provides only grammar-based validation. There's no implicit claim that other kinds of validation aren't useful and important.

One kind of validation that is clearly useful and important and that can't be done by grammars is checking of cross-references. One possibility is to use Schematron for this. The designers of RELAX NG anticipated that there would be a little schema language specialized to this, which would be created as part of the ISO DSDL effort (as part 6); this wouldn't be a million miles from the kind of thing that XSD provides with xs:key/xs:unique/xs:keyref. Unfortunately this hasn't happened yet.

Since DTDs provide ID/IDREF checking and we wanted people to be able to move easily from DTDs to RELAX NG, we felt we had to provide some transitional support for ID/IDREF checking while awaiting the ultimate "right" solution. We therefore provided a separate, optional spec called RELAX NG DTD Compatibility. Amongst other things, this defines a way in which RELAX NG processors can optionally provide DTD-compatible ID/IDREF checking based on the datatypes of attributes declared in the schema. Note that this can't handled by the XSD datatypes library for RELAX NG, because assignment of types in the schema to values in the instance is not part of the RELAX NG model of validation.

When defining RELAX NG DTD compatibility, we took a fairly hard line about being DTD-compatible. In particular, we made it a requirement that you should be able to generate a DTD subset from the RELAX NG schema that would perform the same type assignment that the process defined by the spec would perform. This creates some problems when you use DTD Compatibility in conjunction with wildcards (which of course aren't a DTD feature). For example:

start = element doc { p* }
p = element p { id?, any* }
id = attribute id { xsd:ID }
any = element * { attribute * { text }*, (any|text)* }

will get a error about conflicting ID-types for p/@id. This is because the schema allows <p> to contain a <p> element with an id attribute that doesn't have type ID. Instead you would have to write:

start = element doc { p* }
p = element p { id?, any* }
id = attribute id { xsd:ID }
any = element * - p { attribute * { text }*, (any|text)* }

Several years after the DTD compatibility spec was finished, the W3C came out with the xml:id Recommendation. The spec mentions RELAX NG in a non-normative appendix and encourages authors "to declare attributes named xml:id with the type xs:ID". Now on the face of it, this seems pretty reasonable advice. Unfortunately, from the point of the RELAX NG DTD Compatibility spec it's precisely the wrong thing to do. For example, this

start = element doc { p* }
p = element p { id?, any* }
id = attribute xml:id { xsd:NCName }
any = element * { attribute * { text}*, (any|text)* }

will work perfectly with RELAX NG with or without DTD compatibility. The XML processor does the xml:id checking, and RELAX NG can ignore ID/IDREFs. But if instead you follow the xml:id Recommendation's suggestion and do:

start = element doc { p* }
p = element p { id?, any* }
id = attribute xml:id { xsd:ID }
any = element * { attribute * { text}*, (any|text)* }

a RELAX NG validator that implements RELAX NG DTD compatibility will give you an error about conflicting ID-types p/@xml:id. You might think you could do

start = element doc { p* }
p = element p { id?, any* }
id = attribute xml:id { xsd:ID }
any = element * { attribute * - xml:id { text}*, id?, (any|text)* }

but that won't work either, because although you can now write a DTD subset that does equivalent type assignment for p, you can't do it for the other elements.

(The xml:id Recommendation also says in the RELAX NG section that "A document that uses xml:id attributes that have a declared type other than xs:ID will always generate xml:id errors.". I don't see why: the xml:id processor is quite likely to be part of the XML parser, which doesn't know anything about RELAX NG, nor does RELAX NG know anything about xml:id.)

Back when RELAX NG DTD compatibility spec came out, I implemented support for the ID/IDREF checking part of DTD Compatibility in Jing. I also decided to make Jing enforce this by default. There's a -i switch to turn it off. Before xml:id came along, this seemed to work OK: if a schema author specifies ID/IDREF in a RELAX NG schema then they usually want ID/IDREFs to be checked and RELAX NG DTD Compatibility was the only thing that could do this checking. With xml:id this no longer works well: if you

use xml:id
declare xml:id attributes as type xsd:ID in the RELAX NG schema
use wildcards in your RELAX NG schema
don't use any special options to Jing

you are very likely to get an error from Jing.

At first, my plan was simply to change Jing not to enforce DTD Compatibility by default. However, Alex Brown pointed out that this isn't completely satisfactory: people who are coming from DTDs and aren't using xml:id lose the sensible ID/IDREF checking that they might reasonably expect to happen by default. So now I'm thinking that a better solution might be to add two boolean options to Jing, both of which would be enabled by default.

The first option would be to make it a warning rather than an error if the schema does not use ID/IDREF in a DTD-compatible way. (If the schema is DTD-compatible, then duplicate IDs or IDREFs to non-existent IDs would still be errors.)

The second option would tell Jing to be "xml:id aware". This would have several effects.

It would require attributes named xml:id to be declared with type xsd:ID (or with the ID type from the datatype library defined by the DTD compatibility spec). This isn't strictly necessarily, but it would seem to minimize confusion and be in keeping with the spirit of the xml:id Recommendation. It's slightly tricky to decide what this means with various unusual RELAX NG wildcards. It is obvious that attribute xml:id { text} is an error. But the following are not all obvious to me:
- attribute xml:id|id { text }
- attribute * { text }
- attribute xml:* { text }
- attribute *|xml:id { text }
When checking whether you can generate an equivalent DTD subset, xml:id attributes would be ignored. In the terms defined by the RELAX NG DTD Compatibility spec, you would ignore xml:id attributes when determining whether the schema is compatible with the ID/IDREF feature.
When checking uniquess of IDs, and when checking IDREFs, an attribute named xml:id would always be treated as an ID attribute.

It might also be a good idea to revise the RELAX NG DTD compatibility spec to be xml:id aware in this way.

2009-01-12

My new laptop

I just bought a new laptop. Previously I was using a Sony VAIO TZ, specifically a TZ38GN:

1.15kg
11.1" screen, 1366x768 resolution, LED backlit
U7700 (1.33GHz) CPU
2Gb RAM
48Gb solid state disk
DVD writer

I bought this during a period where I was doing mostly email and web browsing and no coding, and it worked beautifully for that. The screen was fantastic, and it was the perfect size.

However, when I started doing some coding, I began to find it a little bit wimpy: CPU a bit slow, not quite enough RAM, disk too small. Also when I'm at home, I like to use my laptop with an external keyboard and a large 24" (1920x1200) display, and the TZ would only go up to 1680x1050.

I also have an old Dell Precision M65, which I bought nearly two years ago:

about 3kg
15.4" screen, 1920x1200 screen, not LED backlit
T7600 (2.33GHz) CPU
4Gb RAM (but the chipset only allowed 3.25Gb to be used)
160Gb 7200rpm disk
DVD writer

This has enough horsepower for coding, but after getting used to the TZ, I found lugging the Dell around to be a real pain, and the screen is much less nice than the VAIO's.

I ended up using the Dell at my desk, and the TZ elsewhere. But having things divided between two machines started to be a real pain, so I wanted to find something with the power of the Dell (at least when connected to a monitor) but the size and weight of the TZ.

I ended up choosing a Sony VAIO Z, specifically a VGN-Z26SN:

1.46kg
13.1" screen, 1600x900 resolution, LED backlit
P8600 (2.4GHz) CPU
3Gb RAM
320Gb 7200rpm disk
DVD writer

They had another model (the Z27), which was 25% more expensive, and offered a 2.53Ghz CPU, 4Gb RAM and a Blu-ray drive. I suspected I wouldn't be able to use much of the extra 1Gb RAM, and I didn't have much use for the Blu-ray drive, but the deciding factor was that, bizarrely, the Z27 didn't come with a proper Thai keyboard.

The VAIO Z cost me about 80,000 baht (about US$2250 at today's exchange rate), which is not as unreasonable as VAIOs usually are. By comparison, I think the TZ cost me about 100,000 baht (mainly because of the SSD), and the Dell was about 180,000 baht (it was a very high-end machine at the time).

Overall I'm reasonably happy with it. The screen is great and the battery life is OK, although not as good as the TZ's. The keyboard is OK, but I wish it was backlit. Also it doesn't have separate Home/End/PgUp/PgDn keys: you have to press the Fn key in conjunction with the arrow keys. It only has two USB ports, which means you will probably need a USB hub. I discovered one incredibly annoying feature: although the CPU has virtualization support, the BIOS prevents you from enabling it. I can understand Sony's not supporting virtualization in the sense of not providing support if you have problems when you use it, but it's hard to accept a policy that actively prevents a customer making use of an important feature of the hardware they have bought.

I am afraid I don't have any useful information about how well Linux runs on it, because I have been using Vista. This might seem like a strange thing for me to do. I'll explain in a separate post.

James Clark's Random Thoughts

2009-03-23

Getting involved with M

2009-01-17

RELAX NG and xml:id

2009-01-12

My new laptop

Labels

Blog Archive

James Clark's Random Thoughts

2009-03-23

Getting involved with M

2009-01-17

RELAX NG and xml:id

2009-01-12

My new laptop

Subscribe To

Labels

Blog Archive