Comments on James Clark's Random Thoughts: XML 1.0 5th edition
James Clark (http://www.blogger.com/profile/04798042939786677843)
Feed updated 2023-09-03T17:53:38.313+07:00; 9 comments

2010-12-27T18:35:46.247+07:00
I probably agree with Tim that a thorough revision that creates something cohesive has more of a chance than this tinkering. XML 2.0. Thanks for your comments.
Posted by Joshua (http://onsitecomputer.com.au)

2008-11-05T22:56:00.000+07:00
"In terms of Unicode support, what's vitally important is that any Unicode character is allowed in attribute values and character data. And XML 1.0 has always supported that. This change is just about the Unicode characters allowed in element and attribute names (and entity names and processing instruction targets)."

There are three other issues that I think are not mentioned here, but are highly relevant.

[1] This change is also about the Unicode characters allowed in id values, for example in an anchor element in XHTML. These, in my mind, are much more likely to occur in non-Latin scripts, especially as IDN and IRIs become more prevalent. These are also constructs that are used by much less technically-minded authors. For example, I added an Ethiopic id value to the blog I point to below, and now the page doesn't validate.

[2] We have already designed XML in such a way that people are allowed to use non-ASCII element and attribute names and id values. That cat is out of the bag. We have allowed it, and we must expect that some people will want to take the spec at its word.
What we are talking about in the 5th edition is more along the lines of avoiding arbitrary discrimination against speakers of languages written with scripts that didn't make it into Unicode version 2.1, i.e. speakers of languages written in Ethiopic, Canadian Syllabics, Khmer, Sinhala, Mongolian, Yi, Philippine, New Tai Lue, Buginese, Cherokee, Syloti Nagri, N’Ko, Tifinagh and other scripts.

[3] Staying with the idea that we have already allowed non-ASCII names and values, and people will use that feature, we need to be aware that individual characters have recently been added to Unicode blocks that existed in Unicode 2.1, such as Chinese, Cyrillic, Devanagari, Tamil, Bengali, Malayalam, and the like. In many cases these characters will see common use in languages that use these scripts today. This means that people who do take the XML spec at its word and exercise their right to use non-ASCII characters in things like ids will either have difficulty understanding why one name works fine but another doesn't, or will have learned to write a somewhat stilted version of their language (at best).

The blog post I referred to above is at http://rishida.net/blog/?p=135.
Posted by Anonymous

2008-10-26T20:04:00.000+07:00
bio, the Japanese language has so many homonyms. Thus, roma-ji does not always make sense. Moreover, roma-ji is ugly and hard to read. I much prefer 市町村職員共済組合 to shichouson-shokuin-kyousai-kumiai.
(This name appears in one of the schemas I created.)
Posted by 村田 (https://www.blogger.com/profile/16552967103277070244)

2008-10-24T23:49:00.000+07:00
bio, you say that technical knowledge of computers requires familiarity with Latin script. Not everyone agrees with this, although many do. However, use of XML does not in general require "technical knowledge of computers". For example, researchers in the arts and humanities often prefer to use their own language and scientific terms to describe things. Those terms may be understood very well by people in their own domains. And, of course, this change also affects IDs.

We committed to supporting natural-language markup in XML 1.0, not only English. For example, you can already use Chinese characters in XML names, as long as you are careful to stick to characters defined by Unicode 2.1, of course. The question is not whether to continue to support Unicode, but how best to do so. I accept that there is not complete agreement on which way is best, nor on which way will succeed. So far XML 5e seems to be the best compromise.

Thanks,

Liam
Posted by Liam Quin (https://www.blogger.com/profile/07191558941418599733)

2008-10-24T08:03:00.000+07:00
Murata-san: what's wrong with using Romanized Japanese words for tag names? It's common enough for branding and signage in Japan, so why not in XML documents?

Liam: Technical knowledge of computers requires familiarity with Latin script (see: command lines, relational database table and row names, URIs).
You can't unilaterally change that fact simply by changing XML.
Posted by Anonymous

2008-10-22T23:11:00.000+07:00
Rick, I agree with you about version numbers. Indeed, so does the XML Core WG, which is why 5e does what you are asking.

As for a 2.0, I think that could be a 10-year project (if you think HTML 5 is simple...), and it's not clear to me that it would get adoption.

If you have the time to read the proposed 5th edition and send comments to the mail address given there, they'll still of course be considered even though it's past the formal deadline. The 5th edition document is listed on http://www.w3.org/TR/

Liam
Posted by Liam Quin (https://www.blogger.com/profile/07191558941418599733)

2008-10-22T10:56:00.000+07:00
I probably agree with Tim Bray that a more thorough revision that creates something cohesive has more of a chance than this tinkering. XML 2.0.

But I tend to think that the root problem here is that while XML has version numbers, these only correspond to an overly simplistic policy: fail if you don't understand the version number.

So it would be better for XML to first be improved to a major.minor version system where a version rejects a document with an unknown major version number outright, but attempts to parse documents with higher-value minor versions. This way, an XML 1.0 system would not reject an XML 1.2 document unless there was indeed some name which wasn't allowed by XML 1.0.

Introducing this change as an edition-level change, then allowing it to percolate through implementations and deployments for a few years, sets us up to switch to the new naming regime without confusion.

The XML Core WG should have done this years ago: without it, they are in effect abandoning the versioning system entirely by retrofitting substantive changes without the benefit of labels. I don't know why markup people don't see clear labels as the primary tool to escape confusion: it is very bold, if not bizarre.
Posted by Anonymous

2008-10-17T23:04:00.000+07:00
"A user who is technical enough to deal with raw XML markup can deal with ASCII element/attribute names"

I don't see that technical knowledge of computers and familiarity with the Latin script should be tied together...

I agree with your argument that attribute and element names should be translated for the purposes of a user interface, for the purpose of localisation. However, any base language could be used, as long as you can find people to translate out of it (perhaps with the help of other translations). So I think this is not in any way an argument against the change.

We may have to revise the Namespaces spec, and there are other specs, not only at W3C, that may need to be changed over time. But since all existing documents will certainly continue to work fine, it would seem fairer to me to say not that the Namespaces spec is completely broken, but that there may be an error that requires a minor revision there.

As I said in a private reply to you earlier, like you, I also originally preferred the idea of a 1.2, and was persuaded that it would not get uptake.
So I see this as a compromise.

Liam
Posted by Liam Quin (https://www.blogger.com/profile/07191558941418599733)

2008-10-17T20:19:00.000+07:00
I was involved in a project of the Japanese government for creating schemas. These schemas represent information interchange between the local governments and the central government.

It was completely impossible to think of English names. My team extensively used Japanese names. I do not think that these names can be translated (unless I am willing to write a paragraph for each name).

I also heard from a doctor that his project uses Japanese tag names since they cannot be translated without changing the meaning.

I also believe that naming is more difficult than composition for non-native speakers. Poor naming is a problem of my schema language proposal (RELAX Core).

MURATA Makoto
Posted by 村田 (https://www.blogger.com/profile/16552967103277070244)