Since Duncan is coming to Tokyo this weekend, we are going to have the (2nd) Mono hackers meeting in Tokyo (yes, we had the first meeting two years ago). It is planned for lunch time on the 19th, at Umegaoka (close to Shimokitazawa). We'd welcome a few more people who would like to join us (sorry, but only a few; we have not prepared for a large meeting). Please feel free to mail me (atsushi@ximian.com) if you are interested.
After hacking the label collector functionality for RELAX NG, I noticed that .NET 2.0's XmlSchemaValidator is kind of such an API (note that the MSDN documentation linked above looks quite obsolete). So I decided to implement it nearly a week ago. And now it's mostly implemented and checked into Mono's svn (I think it is one of the hackiest xsd validators ;-).
Here is an example application of XmlSchemaValidator I wrote to test my implementation (it compiles with 2.0 csc).
Some of you might know that implementing XmlSchemaValidator sounds weird, because there is no API documentation for this new face (well, at least for the VS 2005 October CTP which I reference). But the functionality is mostly obvious (at least to me). For example, ValidateElement() is startTagOpenDeriv, ValidateAttribute() is attDeriv, and ValidateEndOfAttribute() is startTagCloseDeriv (btw I really don't like those method names), as described in James Clark's derivative algorithm for RELAX NG. One thing that mystified me was what the ValidateElement(string,string,XmlSchemaInfo,string,string,string,string) overload meant, but thanks to MS developers, that was solved.
Actually XML Schema is much less useful here than the RELAX NG stuff. XmlSchemaValidator is fully stateful, thus you cannot go back to a previous state easily. The RELAX NG derivative implementation is stateless, so you can just attach those derivative instances to nodes in the editor. Oh, yes, XmlSchemaValidator could offer the same thing if it supported cloning. But I don't think that can be lightweight.
btw, if you want, you can generate an xsd from a DTD and use XmlSchemaValidator.
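To make the mapping concrete, here is a minimal sketch of the three derivative functions named above, written in Python rather than C# so that it stays self-contained. It is a heavy simplification of the algorithm in James Clark's paper and not the Commons.Xml.Relaxng code: name classes are plain strings, attribute values are ignored, and text, interleave and oneOrMore are left out. All function names and the tiny schema are assumptions made up for illustration.

```python
# Patterns are immutable tuples, so every derivative is a new value and
# earlier states stay valid (that is what makes the approach stateless).
EMPTY = ("empty",)
NOT_ALLOWED = ("notAllowed",)

def element(name, content):
    return ("element", name, content)

def attribute(name):
    return ("attribute", name)

def choice(a, b):
    if a == NOT_ALLOWED:
        return b
    if b == NOT_ALLOWED:
        return a
    return ("choice", a, b)

def group(a, b):
    if NOT_ALLOWED in (a, b):
        return NOT_ALLOWED
    if a == EMPTY:
        return b
    if b == EMPTY:
        return a
    return ("group", a, b)

def after(a, b):
    return NOT_ALLOWED if NOT_ALLOWED in (a, b) else ("after", a, b)

def nullable(p):
    if p[0] == "choice":
        return nullable(p[1]) or nullable(p[2])
    if p[0] == "group":
        return nullable(p[1]) and nullable(p[2])
    return p == EMPTY

def apply_after(f, p):
    if p[0] == "after":
        return after(p[1], f(p[2]))
    if p[0] == "choice":
        return choice(apply_after(f, p[1]), apply_after(f, p[2]))
    return NOT_ALLOWED

def start_tag_open_deriv(p, name):
    """State after reading '<name' (what the post maps to ValidateElement)."""
    if p[0] == "choice":
        return choice(start_tag_open_deriv(p[1], name),
                      start_tag_open_deriv(p[2], name))
    if p[0] == "element":
        return after(p[2], EMPTY) if p[1] == name else NOT_ALLOWED
    if p[0] == "group":
        x = apply_after(lambda q: group(q, p[2]),
                        start_tag_open_deriv(p[1], name))
        return choice(x, start_tag_open_deriv(p[2], name)) if nullable(p[1]) else x
    if p[0] == "after":
        return apply_after(lambda q: after(q, p[2]),
                           start_tag_open_deriv(p[1], name))
    return NOT_ALLOWED

def att_deriv(p, name):
    """State after one attribute (what the post maps to ValidateAttribute)."""
    if p[0] == "after":
        return after(att_deriv(p[1], name), p[2])
    if p[0] == "choice":
        return choice(att_deriv(p[1], name), att_deriv(p[2], name))
    if p[0] == "group":
        return choice(group(att_deriv(p[1], name), p[2]),
                      group(p[1], att_deriv(p[2], name)))
    if p[0] == "attribute":
        return EMPTY if p[1] == name else NOT_ALLOWED
    return NOT_ALLOWED

def start_tag_close_deriv(p):
    """State after '>' closes the start tag; pending attributes become errors."""
    if p[0] == "after":
        return after(start_tag_close_deriv(p[1]), p[2])
    if p[0] == "choice":
        return choice(start_tag_close_deriv(p[1]), start_tag_close_deriv(p[2]))
    if p[0] == "group":
        return group(start_tag_close_deriv(p[1]), start_tag_close_deriv(p[2]))
    return NOT_ALLOWED if p[0] == "attribute" else p

# Hypothetical schema: <item id="..." [lang="..."]/>
schema = element("item", group(attribute("id"), choice(attribute("lang"), EMPTY)))
opened = start_tag_open_deriv(schema, "item")   # after "<item"
with_id = att_deriv(opened, "id")               # after id="..."
closed = start_tag_close_deriv(with_id)         # after ">"
```

Because `opened` is never mutated, an editor could keep it on the element node and later ask "what if a different attribute came next?" without re-validating from the document start. Note also that once the start tag is closed, any further `att_deriv` call yields `NOT_ALLOWED`.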
19:51 (alp) eno: dude, am never getting any expected attributes
... So, I missed the point that after RelaxngValidatingReader.Read(), my validation engine, which is based on James Clark's derivative algorithm, keeps only the state "after it closed the start tag" (i.e. startTagCloseDeriv in that paper) and thus no attributes can be allowed. Sigh. So, to implement attribute auto-completion, I had to expose a state transition object with which users can try "the state transition after an attribute occurred" (i.e. attDeriv). So, now the code has become more complicated than yesterday (well, it is required complexity):
I put the code example (above), and the updated result.
So now RelaxngValidatingReader implicitly expects the validating editor not to call .Read() until it closes the start tag. Instead, each of the elements and attributes can now hold the state at the node itself. (Am not sure it really works fine; I should consider cut/paste, insertion, and so on.)
BTW, personally I don't want to expose such features and require "implementors" of RelaxngValidatingReader functionality to implement highly derivative-dependent features like this (that is bad for standardizing the API). I won't recommend learning this feature as long-lived, good-to-know stuff. I am, on the other hand, expecting System.Xml 2.0 to have such functionality for XML Schema (IF Microsoft people can provide it), but I still don't think it's worthy of standardization.
In the next stage, I will have to implement some "error recovery" stuff so that users can enter invalid nodes and the implementation can still continue with the remaining validation.
Many thanks to Alp for trying it out with his experimental UI stuff and for letting me improve this library (I also had the chance to fix bugs and to optimize the Commons.Xml.Relaxng stuff).
(5:00am JST: Updated the API and the example so that they look better.)
00:18 (alp) eno: do you have any thoughts on how i could
use a DTD to hack together xml completion?
00:19 (eno) alp: you want to develop such functionality
in your app?
00:20 (eno) mhm, actually I have no idea that supports
something like nxml-mode
00:20 (alp) yeah, perhaps for monodevelop
Actually that is sort of what I wanted. However, for DTD and XSD, the implementation is not extensible (the validation implementation is hidden in System.Xml.XmlValidatingReader). So I (kinda) implemented something like that, using my RelaxngValidatingReader:
Here I put the output of the example above. It is hacky (written mostly in 2 hours) and it does not check rejection by notAllowed. It might be improved later. Also, it uses Hashtable right now, but it does not have to be a dictionary.
I also added Emptiable() (of type bool) that determines whether an end tag is acceptable or not in the current state. Actually, to complete an end element, its name should be available, but due to the difference between a QName and an end tag name, it should be (and could be) implemented without the RELAX NG validation stuff (to support such functionality, just keep start tag names in a stack). Similarly, you should also keep track of in-scope namespace declarations to fill in the proper prefix that is bound to the namespace of the QName contained in the results.
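The bookkeeping described here needs no validator at all, and could be sketched roughly as follows. This is a minimal illustration in Python, not part of Commons.Xml.Relaxng; the class `CompletionContext` and all of its method names are hypothetical.

```python
class CompletionContext:
    """Tracks open start tags and in-scope namespace bindings (hypothetical helper)."""

    def __init__(self):
        self.open_tags = []    # start tag names exactly as written, e.g. "b:item"
        self.ns_scopes = [{}]  # one {prefix: namespace URI} dict per open element

    def start_element(self, tag_name, ns_decls=None):
        # ns_decls holds the xmlns declarations found on this start tag;
        # the empty string stands for the default namespace prefix.
        self.open_tags.append(tag_name)
        self.ns_scopes.append(dict(ns_decls or {}))

    def end_element(self):
        self.ns_scopes.pop()
        return self.open_tags.pop()

    def end_tag_candidate(self):
        """The only end tag name that can be completed right now."""
        return self.open_tags[-1] if self.open_tags else None

    def prefix_for(self, ns_uri):
        """Innermost prefix bound to ns_uri, skipping shadowed outer bindings."""
        seen = set()
        for scope in reversed(self.ns_scopes):
            for prefix, uri in scope.items():
                if prefix in seen:
                    continue  # an inner element rebound this prefix
                seen.add(prefix)
                if uri == ns_uri:
                    return prefix
        return None

    def qualify(self, ns_uri, local_name):
        """Turn an expected (namespace, local name) pair into an element tag name."""
        prefix = self.prefix_for(ns_uri)
        if prefix is None:
            return None  # no usable binding; the editor would have to declare one
        return f"{prefix}:{local_name}" if prefix else local_name


ctx = CompletionContext()
ctx.start_element("doc", {"": "urn:a", "b": "urn:b"})
ctx.start_element("b:item", {"b": "urn:other"})
```

The end tag name comes from the stack as literally typed, which is why it cannot be derived from the expanded name the validator works with. The prefix lookup handles element names only; attributes never use the default namespace, so a real editor would need a separate rule for them.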
Oh, BTW don't ask Alp about that "dream": he has many other tasks and interests ;-)
Finally, I checked in /doc support patches in mcs. I remember the first patch was written in a day, nearly 7 months ago, and that worked mostly fine.
While hacking on it, I found some problems around the /doc feature:
... and more (I cannot remember any more right now). Well, some of them are not actually problems. Some just look like bugs.
Well, actually csc must be doing a better job than my hacky cref interpretation. It seems to recursively tokenize the attribute as a type name, the same way as the source itself, while I don't.
Anyways, it is the kind of job I did only because there are some users (originally it would have been used to examine our System.Xml implementation by using NDoc in practice). I think the monodoc format is much better and I don't think the C# doc feature is good, speaking as a translator who keeps track of changes in the original documents, usually from the documents themselves, not from source code files.
Now I am so glad that I can fully go back to sys.xml hacking.
Yesterday I started to write RELAX NG grammar inference. I hope this design won't **** you.
Just voted my first 5 on this very important bug that shows W3C standard conformance breakage.
Such an XPathNavigator instance would need to be kept in memory only for stylesheets that contain document("") (this could be determined by static analysis). So the "by design" reasoning does not make sense.
Real developers could just implement a standard-conformant implementation in an easy way, instead of using casuistry on whether it is conformant or not, which just results in imposing annoyance on real users.
On the suggestion to "infer elements always globally", am getting positive feedback from the Microsoft XML guys via the feedback center.
On the other hand, am getting a negative response to the suggestion on XmlSchemaSimpleType.ParseValue(), which validates a string considering facets. But I believe that XML Schema based developers will absolutely appreciate that feature. For example, it will be mandatory for the XQP project, which must support the user-defined type constructors defined in section 5 of the W3C XQuery Functions and Operators specification. Microsoft guys might want to help your development.
It depends on you, XML developers, whether Microsoft will improve their library or not. We could provide our own advantages, but it would still be better if your advanced code ran on MS.NET too.
(FYI: You can "vote" for the suggestions ;-)
I've 90% finished XmlSchemaInference. I implemented it only because .NET 2.0 contains it.
XmlSchemaInference is very useful. For example, if you have a document like:
It creates two different definitions of "product" elements. Here is the inferred schema and the generated serializable class.
So now I wonder if I had better port the same feature to Commons.Xml.Relaxng. RELAX NG is not as sucky as XML Schema, so I might be able to provide a better XML structure inference engine. But XML structure inference itself is not so fun.
... after some thought, I decided to enter a new suggestion in the MS feedback center, which seems to be working again recently.
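To illustrate why the same element name can end up with two definitions, here is a toy sketch in Python. It is not how XmlSchemaInference works internally (that class also infers types and occurrence constraints); it only records which attributes and children appear under each element path. The sample document and both function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

def infer_local_definitions(xml_text):
    """Map each element path to the attribute and child names seen there."""
    definitions = {}

    def visit(elem, parent_path):
        path = parent_path + (elem.tag,)
        entry = definitions.setdefault(
            path, {"attributes": set(), "children": set()})
        entry["attributes"].update(elem.attrib.keys())
        for child in elem:
            entry["children"].add(child.tag)
            visit(child, path)

    visit(ET.fromstring(xml_text), ())
    return definitions

def merge_globally(definitions):
    """Alternative: one definition per element name, regardless of context."""
    merged = {}
    for path, entry in definitions.items():
        target = merged.setdefault(
            path[-1], {"attributes": set(), "children": set()})
        target["attributes"] |= entry["attributes"]
        target["children"] |= entry["children"]
    return merged

# Hypothetical sample: "product" appears in two different contexts.
SAMPLE = """<catalog>
  <product id="1"><name>Widget</name></product>
  <order><product ref="1"/></order>
</catalog>"""

local_defs = infer_local_definitions(SAMPLE)
global_defs = merge_globally(local_defs)
```

With per-path inference, `product` under `catalog` and `product` under `order` get separate definitions; merging by name gives a single looser definition, which is roughly the trade-off behind the "infer elements always globally" suggestion.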
I found that the Last Call working draft of xml:id was out. But I think xml:id will be incompatible with Canonical XML (xml-c14n). Below is an excerpt from 2.4 Document Subsets in xml-c14n W3C REC:
The processing of an element node E MUST be modified slightly when an XPath node-set is given as input and the element's parent is omitted from the node-set. The method for processing the attribute axis of an element E in the node-set is enhanced. All element nodes along E's ancestor axis are examined for nearest occurrences of attributes in the xml namespace, such as xml:lang and xml:space (whether or not they are in the node-set). From this list of attributes, remove any that are in E's attribute axis (whether or not they are in the node-set). Then, lexicographically merge this attribute list with the nodes of E's attribute axis that are in the node-set. The result of visiting the attribute axis is computed by processing the attribute nodes in this merged attribute list.
Well, I don't think xml:id is wrong here. It is xml-c14n that is based on the uncommitted premise that all xml:* attributes must be inherited (yes, xml:lang, xml:space and xml:base are). Anyways, don't worry about that incompatibility. Canonical XML is already incompatible with XML Infoset with regard to namespace information items.
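The rule quoted above can be sketched in a few lines to show what happens to xml:id. This is a simplified Python illustration, not a Canonical XML implementation: attributes are modeled as dicts keyed by (namespace URI, local name), with the empty string for no namespace, and the node-set membership checks are ignored. The function names are hypothetical.

```python
XML_NS = "http://www.w3.org/XML/1998/namespace"

def inherited_xml_attributes(ancestor_attrs, own_attrs):
    """Nearest xml:* attributes on ancestors, minus those E declares itself.

    ancestor_attrs lists one attribute dict per ancestor, outermost first,
    so nearer ancestors overwrite values from farther ones.
    """
    nearest = {}
    for attrs in ancestor_attrs:
        for (ns, local), value in attrs.items():
            if ns == XML_NS:
                nearest[(ns, local)] = value
    for key in own_attrs:
        nearest.pop(key, None)
    return nearest

def attribute_axis_for_subset(ancestor_attrs, own_attrs):
    """Merge inherited xml:* attributes with E's own, sorted by (namespace, name)."""
    merged = dict(inherited_xml_attributes(ancestor_attrs, own_attrs))
    merged.update(own_attrs)
    return sorted(merged.items())

# Hypothetical case: the omitted parent carries xml:lang and xml:id.
ancestors = [{(XML_NS, "lang"): "en", (XML_NS, "id"): "root"}]
own = {("", "class"): "x"}
result = attribute_axis_for_subset(ancestors, own)
```

Here the element at the top of the subset picks up `xml:id="root"` from a parent that is not even in the output, although an ID is meant to identify exactly one element. That is the incompatibility in question.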

Many of my friends have been saying that they feel sorry for Rupert, since he does not have any more clothes in this cold season. Today I was hanging around Shibuya (central Tokyo area) with my friends, and they were so kind as to buy a new one for him (from my budget). Now he looks younger than before.
I was escaping from the /doc stuff and looking into the xsd inference task (I cannot stand working only on that annoying task). I wrote some notes, but they are incomplete. Apparently the most difficult area is particle inference, but right now I do not have many ideas. My current idea is to support non-XmlSchema languages.