tag:blogger.com,1999:blog-3944976411672994427.post4828994358429784181..comments2023-09-03T17:53:38.313+07:00Comments on James Clark's Random Thoughts: MicroXMLJames Clarkhttp://www.blogger.com/profile/04798042939786677843noreply@blogger.comBlogger22125tag:blogger.com,1999:blog-3944976411672994427.post-56023509474557825312015-02-01T02:29:07.462+07:002015-02-01T02:29:07.462+07:00Isn't this just a case of change for change's s...Isn't this just a case of change for change's sake? I concur with the other comments that JSON does the job and is user-friendly.DaveHhttp://www.deehoseo.comnoreply@blogger.comtag:blogger.com,1999:blog-3944976411672994427.post-17771385709734087852012-06-23T16:38:32.367+07:002012-06-23T16:38:32.367+07:00Hi James,
Sorry I haven't followed MicroXML t...Hi James,<br /><br />Sorry I haven't followed MicroXML too much till now.<br /><br />I've recently been made aware that XML is not a hypermedia format - that is, has no hypermedia affordances. <br /><br />If MicroXML had those, and they were backward compatible with XML, maybe it would solve some issues on the web.<br /><br />Thanks for thinking about this.<br /><br />PeterPeter Rushforthhttps://www.blogger.com/profile/09472639836847800891noreply@blogger.comtag:blogger.com,1999:blog-3944976411672994427.post-10147531664947168032011-02-07T20:25:32.432+07:002011-02-07T20:25:32.432+07:00You should mention Tim Bray's XML-SW from 2002...You should mention Tim Bray's <a href="http://www.textuality.com/xml/xmlSW.html" rel="nofollow">XML-SW</a> from 2002. Back then some people agreed but nobody cared, why should it be different today. Either you try to live with XML as great and broken as it is, or you choose some other language. Before XML it was ASN.1 and SGML, yesterday it was XML, today it is JSON and RDF, and the day after tomorrow something else. Every new language promises a revolution, but sooner or later it always evolves into a complex monster and other languages are proposed. This is just evolution.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-3944976411672994427.post-30297028863644917062010-12-24T02:51:57.181+07:002010-12-24T02:51:57.181+07:00ormaaj: MicroXML has just as much extensibility at...ormaaj: MicroXML has just as much extensibility at the instance level as XML 1.x does.<br /><br />jjc: Okay, I'm convinced by the serialization argument to allow > in attribute values.<br /><br />Uche: I think MicroXML as outlined is just as suitable for documents as for data, and JSON is too simple for some data. 
In any case, a suitable mapping between the two would be an unambiguous Good Thing.<br /><br />Anonymous: The cost of full XML is not to document authors, who can indeed leave out any feature they don't need, but to library, tool, and application programmers, who cannot. There is also a cost in understanding. XML can't do SUBDOCs as SGML could, but the XML definition is something like 10% the size of the SGML definition. I doubt if MicroXML can be defined in five pages, but it would be interesting to try.John Cowanhttps://www.blogger.com/profile/11452247999156925669noreply@blogger.comtag:blogger.com,1999:blog-3944976411672994427.post-58332834260236295892010-12-23T10:56:49.921+07:002010-12-23T10:56:49.921+07:00I pretty much agree with Anne van Kesteren's blog ...I pretty much agree with Anne van Kesteren's <a href="http://annevankesteren.nl/2010/12/why-microxml" rel="nofollow">blog post</a> - if it ain't broke, don't fix it. What problem is solved by defining a subset that does nothing but restrict functionality? Nobody said you have to use or even implement every obscure extension. Document authors are already free to use whatever subset they choose. Of course XML is going to be huge and complicated; it's <i>eXtensible</i>. In a nutshell, I'm all for relaxing hard requirements and deprecating cruft conservatively, but very much against imposing limitations on the existence of features that do no harm. The solution is to keep the core "Must" parts of the language small and to avoid "Must Nots", which is what MicroXML seems to be adding a whole lot of.<br /><br />The only criticism of XML for use in document markup that holds any water is the mandatory well-formedness behavior. Everybody knows that in contexts where the point isn't to forbid the user from accessing their data, the correct behavior would be to warn and optionally try doing something sensible to display what's available. 
None of that is any reason to throw the baby out with the bath water and would be easy to fix from within XML without breaking applications where strict parsing is critical as a parity check.<br /><br />Namespaces aren't confusing or unusual. People are used to "import x as y" in Python and Haskell; I think we can handle namespaces. They are an important mechanism to extensibility when delimiting multiple languages. The wrong thing to do would be to rip it out and replace it with another incompatible mechanism which does the exact same thing.<br /><br />There's also not much point in forcing people not to use doctypes in the XML spec. The more serious problem is that all versions of XHTML up until the most recent working drafts of XHTML+RDFa mandate doctype use for some reason. Being required to at least extend or modify a schema language without namespace support (DTD) in addition to possibly some other more sane language you actually want to use for validation sort of defeats the purpose of modularization as a way of easing customization and extensibility.<br /><br />Remove processing instructions? Again, if you don't like them, don't use them. I think the addition of <a href="http://www.w3.org/XML/2010/01/xml-model/" rel="nofollow"><?xml-model?></a> might help the aforementioned modularization problems similarly to <a href="http://www.nvdl.org/" rel="nofollow">NVDL</a>.ormaajhttps://www.blogger.com/profile/10507650859831408779noreply@blogger.comtag:blogger.com,1999:blog-3944976411672994427.post-15133630665460863312010-12-22T22:14:06.967+07:002010-12-22T22:14:06.967+07:00It is hard not to sympathize with every item on th...It is hard not to sympathize with every item on the list as well as every comment, having experienced these pains and more at one point or another.<br /><br />But the question is, can this sort of effort become usable? At heart I was always really a fan of the whole XML spec, so "This is just irritating. Please leave XML as it is today." 
feels at least as valid as other comments.<br /><br />If you are going to seriously propose a subset of XML with use that is widespread enough to be significant, you need to avoid the attitude that people can choose to like your own particular subset, or just live with XML classic as it is today. Otherwise, you are just another person defining a private convenient subset that will not reduce the pain.<br /><br />You have to make this new subset its own normative/canonical XML such that anyone can reasonably subscribe to it, with a way of mapping virtually anything down to that level, including even such things as:<br /><br />Starting with the most unpopular thing that I use frequently: Parameter/general/external entities. The only thing bad about them is the non-standard syntax that seems to make people hate them and the way they are inflexibly married into the processing model. I like what Java communities do as an alternative with ${foo} style substitutions, but it needs to be integrated before the validation is applied and spoiled by unsubstituted values, and this sort of thing needs to be standardized or entities continue to look attractive for this sort of operation. XInclude for external entities is fine if I can get them when I need them, but I have yet to see them supported anywhere I need them, whereas external entities are usually everywhere I need them, e.g. for breaking up a monolithic XML configuration file of an arbitrary program without having to rewrite the program.<br /><br />Namespaces: I don't see how you avoid in-document prefix declarations, but as long as they are kept confined to the root element, are they so bad? It is not uncommon to have a schema that is a combination of specific parts from one namespace and common parts from at least one common library schema. 
Do I really have to register my organization's common prefix so that I can draw some elements from a common set in a separate module using a prefix, as opposed to having to set a new default xmlns every time I use it or resort to reverse domain verbosity every time I want to draw from the new namespace?<br /><br />What is so harmful about processing instructions as long as those who do not use them are free to ignore them? It is great that most processors DO ignore them as comments. Languages gain this sort of declarative feature (i.e. Java annotations or XML Schema annotations), because it feels silly to start putting this sort of information into comments everywhere (like in C++) and make comments part of the processing model. How do you get rid of the feature without resolving the need? Is it just the syntax you object to? Are you going to provide us with something to accomplish the same thing, i.e. a standard prefix that is tolerated or ignored by most validators that only want to enforce real structure?<br /><br />As much as we all tend to hate attribute normalization, who puts things into attributes these days that suffer under normalization? It may make as much sense to eliminate attributes altogether. Why do we need this duality of elements and attributes without any commonly adopted standard of when to use one or the other? 
Only because of mistakes made in the definition of child elements, such as not supporting a shortened form for marking the end of content (I think SGML may have had one, which is a step in the right direction) and the difficulty of distinguishing (unordered) structural content from sequential content, but those are problems anyway when child content occurs that for other reasons does not fit well into attributes.<br /><br />Can you define the exact motivation which justifies your decisions and the mapping from more general XML use cases, instead of just appealing to the common good of some unspecified group, without any detail?Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-3944976411672994427.post-23610330623531226062010-12-22T21:21:26.236+07:002010-12-22T21:21:26.236+07:00In the spirit of your decision on namespaces, why ...In the spirit of your decision on namespaces, why not forgo doctypes altogether, and simply have each element specify its type metadata with standard attributes (xmlns, schema/doctype, etc.)?<br /><br />I think it would be beneficial in the sense that this would allow a document to specify certain "chunks" of itself as specific types of metadata, thus making it easier to stitch XML fragments together into larger documents. It would also simplify the parser by removing a special case, at the possible cost of breaking backwards compatibility if you remove the ability to start an element's name with an exclamation mark.<br /><br />I mean, if you are going to toss away the concept of strict parsing/error-handling, then there's no need to specify a doctype upfront. You are really only using it to provide metadata that's not used for parsing, but rather for validation or to imply intent.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-3944976411672994427.post-10004268321131431762010-12-18T09:41:10.421+07:002010-12-18T09:41:10.421+07:00@John Cowan
I agree with most of your comments, bu...@John Cowan<br />I agree with most of your comments, but I'm not convinced about disallowing > in attributes.<br /><br />One reason is compatibility with existing XML serializers. My guess is that most existing XML serializers, given an XML document that could be serialized as MicroXML, will create well-formed MicroXML. I suspect disallowing > in attributes would break this. At least this is true of the serializers I've written.<br /><br />Another reason is compatibility with canonical XML. At the moment, any canonical XML document is well-formed MicroXML, provided its infoset is representable as MicroXML. Canonical XML requires the use of > in attributes.James Clarkhttps://www.blogger.com/profile/04798042939786677843noreply@blogger.comtag:blogger.com,1999:blog-3944976411672994427.post-84331822810945903812010-12-18T07:52:07.008+07:002010-12-18T07:52:07.008+07:00@uche
I agree. I don't see anything data-orien...@uche<br />I agree. I don't see anything data-oriented about MicroXML. The niche I see for MicroXML is a simple format for documents. As you say, JSON already makes a nice, simple format for data.James Clarkhttps://www.blogger.com/profile/04798042939786677843noreply@blogger.comtag:blogger.com,1999:blog-3944976411672994427.post-11869156182626985802010-12-18T03:01:28.735+07:002010-12-18T03:01:28.735+07:00Oh dear. Re: what AngleBracket said, I'm...Oh dear. Re: what AngleBracket said, I'm not interested in a data XML. I use JSON for that. I'm also a doc head, and I think XML simplification can be suitable for doc heads as well. But if data-orientation becomes an explicit goal, I worry about the effect on MicroXML, at least from my perspective.uchehttp://copia.ogbuji.netnoreply@blogger.comtag:blogger.com,1999:blog-3944976411672994427.post-61432923899439049742010-12-17T07:13:41.839+07:002010-12-17T07:13:41.839+07:00We've had quite a bit of success with XForms i...We've had quite a bit of success with XForms implementations in the browser by using XHTML+XForms markup to provide MVC (data binding, logic layer, presentation layer, submission, events, etc.) and then in-browser implementations such as JavaScript DOM walking (Ubiquity XForms) or in-browser XSLT PI (AgenceXML XSLTForms). The run-time then leverages the browser components (JavaScript, HTML4, HTML5, SVG, what-have-you) but gives you a clean model-based approach.<br /><br />Right now this relies on namespaces to separate the vocabularies of XForms and XHTML: either prefixes (xf:input, xf:submission) or a default namespace (...).<br /><br />It'd be nice to have a way of authoring mixed-vocabulary documents where this technical approach to converting markup into working code still works. I'm not stuck on namespaces, but I'm concerned that the extensibility I see in HTML5 is limited to "Oh, we like SVG so it's OK". 
The DTD-based approach outlined here seems to enshrine that conviction further.<br /><br />Leigh Klotz<br />Co-Chair, W3C Forms Working GroupLeighhttps://www.blogger.com/profile/11100725740428041485noreply@blogger.comtag:blogger.com,1999:blog-3944976411672994427.post-48536472960468574662010-12-17T07:11:35.180+07:002010-12-17T07:11:35.180+07:00This is just irritating. Please leave XML as it is...This is just irritating. Please leave XML as it is today.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-3944976411672994427.post-54288606587196940392010-12-15T11:26:08.383+07:002010-12-15T11:26:08.383+07:00(Feel free to delete the duplicated comment.)
Ano...(Feel free to delete the duplicated comment.)<br /><br />Another point: #xD should be removed from the definition of whitespace, because newline normalization changes explicit CRs into LFs anyway. The only place in which it matters in XML 1.x is in attribute normalization, which isn't present in MicroXML anyhow.<br /><br />(Personal note: Years ago, when I mentioned to James that #xD never needed to be escaped on output, he showed me this counterexample. This led me to formulate the <a href="http://lists.xml.org/archives/xml-dev/200206/msg00998.html" rel="nofollow">Law of James Clark</a>.)John Cowanhttps://www.blogger.com/profile/11452247999156925669noreply@blogger.comtag:blogger.com,1999:blog-3944976411672994427.post-56696818747397076552010-12-15T08:05:29.700+07:002010-12-15T08:05:29.700+07:00Let's call it TinyXML to stay in line with SVG...Let's call it TinyXML to stay in line with SVG. ;)Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-3944976411672994427.post-45623217252622300472010-12-15T06:28:24.965+07:002010-12-15T06:28:24.965+07:00> Attribute value normalization...
> "D...> Attribute value normalization...<br />> "Do people really put newlines in attribute <br />> values and rely on their being turned into spaces? <br />> I doubt it."<br /><br />They may not put them in themselves, but pretty much anyone using Emacs/psgml/xxml will find attribute values containing spaces being wrapped to newlines and TABs whether they want it or not. They don't put them there: they get put in for them.<br /><br />But normalization should be a function of the processing (eg XPath normalize-space()), not a function of the spec.<br /><br />As you just posted on c.t.t, this is a data-oriented proposal; as a document-head, I'll carry on using XML 1.x, and just generate MicroXML-or-whatever as and when I need to.<br /><br />///PeterUnknownhttps://www.blogger.com/profile/09501797793656654219noreply@blogger.comtag:blogger.com,1999:blog-3944976411672994427.post-72057210667479723302010-12-15T04:26:00.628+07:002010-12-15T04:26:00.628+07:00On paying the price: your SGML parsers and other ...On paying the price: your SGML parsers and other tools paid the price for the C/C++ ecology, but not elsewhere. It was precisely because Perl and Java and C# and Ruby and Smalltalk hackers were willing to write XML parsers that the <i>full</i> price has been paid for XML. That said, I do think the point that "XML needs to go where no *ML has gone before" is a good one.<br /><br />I agree about the separate specification. If there is no objection, I'll discuss this with the XML Core WG, and it might become a work item.<br /><br />I think an empty DOCTYPE declaration should be optional but not part of the data model (the root element name is just leftover cruft from SGML anyhow, where it actually mattered). MicroXML parsers would accept one and verify the name match, and MicroXML generators would generate it or not depending on their parameters, just like single vs. 
double quotes, whitespace within attributes, etc.<br /><br />If there are no namespace declarations, then there needs to be a round-trip mapping between MicroXML and namespace-well-formed XML. In TagSoup, an undeclared prefix "foo" is currently mapped to the namespace name "urn:x-prefix:foo". It would be straightforward to write an RFC defining the URN scheme "xmlns-prefix" so that we could use "urn:xmlns-prefix:foo".<br /><br />According to the HTML5 FAQ, inherently-empty elements may be written using start-tag syntax or empty-tag syntax. I have to think more about the question of emptiness.<br /><br />I have taken advantage of attribute value normalization for attributes of type list { anyURI }, as an ad hoc folding measure. I agree that this is mostly an edit/display issue, though, and I wouldn't be troubled to see it go.<br /><br />Note that although 50% of the Web's documents are UTF-8, another 20% are ASCII, which are also UTF-8 by definition. So the true breakdown is: UTF-8 70%, ISO 8859-1 and friends 20%, all else 10%. UTF-16 is about 0.02%.<br /><br /><br />For simplicity and uniformity, I would disallow > in attribute values as well as character content.<br /><br />I think there needs to be a mapping to JSON. The simplest approach is just to say that JSON documents are represented in MicroXML using elements named object, array, number, boolean, string, and null. More cleverly, the mapping can say that elements with a json:type attribute of object etc. are mapped to the appropriate JSON values, making JSON an architecture of MicroXML. As a third alternative, use xsi:type and xsi:nil attributes, giving XML Schema access to the JSON types.John Cowanhttps://www.blogger.com/profile/11452247999156925669noreply@blogger.comtag:blogger.com,1999:blog-3944976411672994427.post-70155394762021057242010-12-14T01:34:45.782+07:002010-12-14T01:34:45.782+07:00On first impression, I love this proposal. There ...On first impression, I love this proposal. 
There are plenty of little bits and bobs to discuss further, but overall, it's weighted marvelously. I do want to point out that the Mozilla incremental parse issue was resolved a few years ago, with the fix available in Firefox 3.5 and up:<br /><br />https://bugzilla.mozilla.org/show_bug.cgi?id=18333<br /><br />But your point remains: the fact that this problem persisted for so long had already done the damage.Uche Ogbujihttp://uche.ogbuji.netnoreply@blogger.comtag:blogger.com,1999:blog-3944976411672994427.post-7237061551547197662010-12-13T23:21:27.067+07:002010-12-13T23:21:27.067+07:00It's great to see this initiative take shape: http:...It's great to see this initiative take shape: http://goo.gl/48w5t.<br /><br />As for what should be in it (such as doctype), I believe that anything that complicates the data model for MicroXML processors needs to bring serious benefit in order to be worth it.<br /><br />For me, the most important attribute of an XML 1.0 subset is simple null-transformation through a simple data model. That is, can I read it in and write it back out again in a way that does not result in any surprises on the output side?Sean McGrathhttps://www.blogger.com/profile/17729925642255386855noreply@blogger.comtag:blogger.com,1999:blog-3944976411672994427.post-81937786601466005672010-12-13T22:36:46.872+07:002010-12-13T22:36:46.872+07:00@David
For some reason, I thought that either an e...@David<br />For some reason, I thought that either an external or internal subset was required, but checking the XML Rec again, I see that it is allowed to have neither. That definitely swings me in the direction of allowing just this kind of DOCTYPE declaration.<br /><br />One downside is that it complicates the data model. You now need a separate root or document object (unless you require the DOCTYPE declaration).James Clarkhttps://www.blogger.com/profile/04798042939786677843noreply@blogger.comtag:blogger.com,1999:blog-3944976411672994427.post-18740235197597117922010-12-13T21:26:32.229+07:002010-12-13T21:26:32.229+07:00Another factor is that almost the only thing that ...<i>Another factor is that almost the only thing that the XML subsets out there agree on is to disallow the DOCTYPE declaration</i><br /><br />They may be phrased that way but mostly they want to get rid of dtd. I would have thought that allowing a doctype with no internal or external subset would be OK and make things easier wrt html5, so<br /><br /><!DOCTYPE foo><br /><foo/><br /><br />would be well formed.David Carlislehttps://www.blogger.com/profile/12909254806316189772noreply@blogger.comtag:blogger.com,1999:blog-3944976411672994427.post-29269400143610812072010-12-13T17:00:59.385+07:002010-12-13T17:00:59.385+07:00See also discussion on xml-dev.See also <a href="http://lists.xml.org/archives/xml-dev/201012/msg00490.html" rel="nofollow">discussion on xml-dev</a>.James Clarkhttps://www.blogger.com/profile/04798042939786677843noreply@blogger.comtag:blogger.com,1999:blog-3944976411672994427.post-18694714486389422622010-12-13T16:24:18.702+07:002010-12-13T16:24:18.702+07:00Hi James:
I found your blog post interesting. Do ...Hi James:<br /><br />I found your blog post interesting. Do you have any examples of the before and after (my apologies for the allusion to the old commercials that talk about losing weight, or getting rid of wrinkles)? I always thought XML 1.0 was "bloated" and needed simplification. <br /><br />Thanks, my best!<br /><br />William Gilreath<br />http://www.williamgilreath.comWill Gilreathhttps://www.blogger.com/profile/04600306540003095688noreply@blogger.com
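John Cowan's comment above suggests that the simplest JSON mapping is to represent JSON documents with elements named object, array, number, boolean, string, and null. The following is only a rough sketch of that idea in Python, not anything defined by the thread or by MicroXML. The function name `json_to_element` is hypothetical, and carrying each object member's key in a `name` attribute is an assumption, since the comment does not say how keys should be represented.

```python
import json
import xml.etree.ElementTree as ET


def json_to_element(value, name=None):
    """Map a decoded JSON value to an element named after its JSON type."""
    # bool is tested before int/float because bool is a subclass of int in Python.
    if value is None:
        elem = ET.Element("null")
    elif isinstance(value, bool):
        elem = ET.Element("boolean")
        elem.text = "true" if value else "false"
    elif isinstance(value, (int, float)):
        elem = ET.Element("number")
        elem.text = json.dumps(value)  # reuse JSON's own number syntax
    elif isinstance(value, str):
        elem = ET.Element("string")
        elem.text = value
    elif isinstance(value, list):
        elem = ET.Element("array")
        for item in value:
            elem.append(json_to_element(item))
    elif isinstance(value, dict):
        elem = ET.Element("object")
        for key, member in value.items():
            elem.append(json_to_element(member, name=key))
    else:
        raise TypeError(f"not a JSON value: {value!r}")
    if name is not None:
        # Assumption: object member keys are stored in a "name" attribute.
        elem.set("name", name)
    return elem


doc = json.loads('{"title": "MicroXML", "tags": ["xml", "json"], "draft": true, "parent": null}')
print(ET.tostring(json_to_element(doc), encoding="unicode"))
```

The sketch ignores strings containing characters that XML cannot represent, which a real mapping would have to address. The other two alternatives in the comment (a json:type attribute, or xsi:type and xsi:nil) would change only how the type is recorded, not the recursive shape of the conversion.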