monogatari monogatari http://primates.ximian.com/~atsushi/lb/index.html Atsushi Eno atsushi@ximian.com Thu, 16 Dec 2004 01:17:32 GMT http://backend.userland.com/rss lb# 2004 / 12 / 16: (2nd) Mono meeting in Tokyo <p> Since <a href="http://www.a-chinaman.com/">Duncan</a> is coming to Tokyo this weekend, we are going to have the (2nd) Mono hackers meeting in Tokyo (yes, we <a href="http://www.go-mono.com/archive/mono-0.18">had</a> the first meeting two years ago). It is planned for lunch time on the 19th, at Umegaoka (close to Shimokitazawa). We'd welcome a few more people who would like to join us (sorry, only a few; we are not prepared for a large meeting). Please feel free to mail me (atsushi@ximian.com) if you are interested. </p> <h3>One of the reasons why XmlSchemaValidator does not rock</h3> <p> After hacking on the label collector functionality for RELAX NG, I noticed that .NET 2.0's <a href="http://msdn2.microsoft.com/library/1xxh64a7.aspx">XmlSchemaValidator</a> is that kind of API (note that the MSDN documentation linked here is badly out of date). So I decided to implement it nearly a week ago. Now it's mostly implemented and checked in to mono's svn (I think it is one of the hackiest xsd validators ;-). </p><p> <a href="http://primates.ximian.com/~atsushi/code/XmlSchemaValidatingReader.cs">Here is an example application</a> of XmlSchemaValidator I wrote to test my implementation (it compiles with the 2.0 csc). </p><p> Some of you might think that implementing XmlSchemaValidator sounds weird, because there is no API documentation for this new class (well, at least in the VS 2005 October CTP which I reference). But the functionality is mostly obvious (at least to me). 
For example, ValidateElement() is startTagOpenDeriv, ValidateAttribute() is attDeriv, and ValidateEndOfAttribute() is startTagCloseDeriv (btw I really <a href="http://lab.msdn.microsoft.com/productfeedback/viewfeedback.aspx?feedbackId=73e2fb0a-65fb-4933-93d8-6f88bc7ac5b8">don't like those method names</a>), as described in James Clark's <a href="http://www.thaiopensource.com/relaxng/derivative.html">derivative algorithm</a> for RELAX NG. One thing that mystified me was what the <code>ValidateElement(string,string,XmlSchemaInfo,string,string,string,string)</code> overload meant, but thanks to the MS developers, it was <a href="http://lab.msdn.microsoft.com/productfeedback/viewfeedback.aspx?feedbackId=7fc38ea1-c306-4fb7-8631-0e174b0ff3c6">solved</a>. </p><p> Actually XML Schema is much less useful than RELAX NG here. XmlSchemaValidator is fully stateful, thus you cannot go back to a previous state easily. The RELAX NG derivative implementation is stateless, so you can just attach those derivative instances to nodes in the editor. Oh, yes, XmlSchemaValidator could offer the same thing if it supported cloning. But I don't think cloning it can be lightweight. </p><p> btw, if you want, <a href="http://svn.myrealbox.com/websvn/filedetails.php?repname=Mono%20SVN&path=%2Ftrunk%2Fmcs%2Ftools%2Fdtd2xsd%2Fdtd2xsd.cs&rev=0&sc=0">you can generate xsd from DTD</a> and use XmlSchemaValidator. </p> http://primates.ximian.com/~atsushi/lb/all.html#2004%2f12%2f16%200%3a00%3a00%3a%2b9 Atsushi Enomoto (atsushi@ximian.com) http://primates.ximian.com/~atsushi/lb/all.html#2004%2f12%2f16%200%3a00%3a00%3a%2b9 Thu, 16 Dec 2004 00:00:00 GMT 2004 / 12 / 07: rethinking element/attribute label collector <pre> 19:51 (alp) eno: dude, am never getting any expected attributes </pre> <p> ... 
So, I missed the point that after RelaxngValidatingReader.Read(), my validation engine, which is based on James Clark's <a href="http://www.thaiopensource.com/relaxng/derivative.html">derivative algorithm</a>, keeps only the state "after it closed the start tag" (i.e. startTagCloseDeriv in that paper), and thus no attributes are allowed at that point. Sigh. So, to implement attribute auto-completion, I had to expose a state transition object with which users can try "the state transition after an attribute occurred" (i.e. attDeriv). So now the code has become more complicated than yesterday (well, it is required complexity): </p>
<xmp class="code-csharp">
// Validate relaxng.rng against the RELAX NG grammar stored in the same file.
XmlTextReader xtr = new XmlTextReader ("relaxng.rng");
RelaxngPattern p = RelaxngPattern.Read (
	new XmlTextReader ("relaxng.rng"));
RelaxngValidatingReader rvr = new RelaxngValidatingReader (xtr, p);
TextWriter Out = Console.Out;

for (; !rvr.EOF; rvr.Read ()) {
	// State of the derivative engine at the current node.
	object state = rvr.GetCurrentState ();
	Out.WriteLine ("Current node: {0} ({1}) -> {2}",
		rvr.Name, rvr.NodeType,
		rvr.Emptiable (state) ? "Emptiable" : "not Emptiable");
	Out.WriteLine (" - expected elements -");
	foreach (XmlQualifiedName qn in rvr.GetElementLabels (state)) {
		Out.WriteLine (" " + qn);
		// Try the transition "start tag opened" (startTagOpenDeriv)
		// to see which attributes that element would accept.
		object astate = rvr.AfterOpenStartTag (
			state, qn.Name, qn.Namespace);
		Out.WriteLine (" - expected attributes -");
		foreach (XmlQualifiedName aqn in rvr.GetAttributeLabels (astate))
			Out.WriteLine (" " + aqn);
	}
}
</xmp>
<p> I put up <a href="http://primates.ximian.com/~atsushi/code/getlabelsresult.txt">the code example (above)</a> and <a href="http://primates.ximian.com/~atsushi/code/getlabelsresult.txt">the updated result</a>. </p> <p> So now RelaxngValidatingReader implicitly expects the validating editor not to call .Read() until the start tag is closed. Instead, each element and attribute can now hold the state at the node itself. (I am not sure it really works well; I should consider cut/paste, insertion, and so on.) 
</p> <p> BTW, personally I don't want to expose such features and require "implementors" of RelaxngValidatingReader functionality to implement highly derivative-dependent features like this (that is bad for standardizing the API). I wouldn't recommend learning this feature as long-lived, good-to-know stuff. I am, on the other hand, expecting System.Xml 2.0 to have such functionality for XML Schema (IF Microsoft people can provide it), but I still don't think it's worthy of standardization. </p> <p> In the next stage, I will have to implement some "error recovery" stuff so that users can enter invalid nodes and the implementation can still continue with the remaining validation. </p> <p> Many thanks to Alp for trying it out with his experimental UI stuff and for letting me improve this library (I also got the chance to fix bugs and to optimize the Commons.Xml.Relaxng stuff). </p> http://primates.ximian.com/~atsushi/lb/all.html#2004%2f12%2f07%200%3a00%3a00%3a%2b9 Atsushi Enomoto (atsushi@ximian.com) http://primates.ximian.com/~atsushi/lb/all.html#2004%2f12%2f07%200%3a00%3a00%3a%2b9 Tue, 07 Dec 2004 00:00:00 GMT 2004 / 12 / 06: expecting the next element/attribute names w/ RelaxngValidatingReader <p>(5:00am JST: Updated the API and the example so that they look better.)</p>
<pre>
00:18 (alp) eno: do you have any thoughts on how i could use a DTD to hack together xml completion?
00:19 (eno) alp: you want to develop such functionality in your app?
00:20 (eno) mhm, actually I have no idea that supports something like nxml-mode
00:20 (alp) yeah, perhaps for monodevelop
</pre>
<p> Actually that is sort of what I wanted. 
However, for DTD and XSD, the implementation is not extensible (the validation implementation is hidden in System.Xml.XmlValidatingReader). So I (kinda) implemented something like that, using my RelaxngValidatingReader: </p>
<xmp class="code-csharp">
XmlTextReader xtr = new XmlTextReader ("relaxng.rng");
RelaxngPattern p = RelaxngPattern.Read (
	new XmlTextReader ("relaxng.rng"));
RelaxngValidatingReader rvr = new RelaxngValidatingReader (xtr, p);

// Walk the document and print what the grammar allows next at each node.
for (rvr.MoveToContent (); !rvr.EOF; rvr.Read ()) {
	Console.WriteLine ("Name: {0}, NodeType: {1} -> {2}",
		rvr.Name, rvr.NodeType,
		rvr.Emptiable () ? "Emptiable" : "not Emptiable");
	Console.WriteLine (" - expected attributes -");
	foreach (XmlQualifiedName qn in rvr.ExpectedAttributes)
		Console.WriteLine ("{0} in {1}", qn.Name, qn.Namespace);
	Console.WriteLine (" - expected elements -");
	foreach (XmlQualifiedName qn in rvr.ExpectedElements)
		Console.WriteLine ("{0} in {1}", qn.Name, qn.Namespace);
}
</xmp>
<p> <a href="http://primates.ximian.com/~atsushi/getlabels-result.txt">Here</a> I put the output of the example above. It is hacky (written mostly in 2 hours) and it does not check rejection by notAllowed. It might be improved later. Also, it uses Hashtable right now, but it does not have to be a dictionary. </p> <p> I also added Emptiable() (returning bool), which determines whether an end tag is acceptable in the current state. Actually, to complete an end tag, its name has to be available, but because a QName differs from the end tag name as written, that should be (and can be) implemented without the RELAX NG validation stuff (to support it, just keep start tag names in a stack). Similarly, you should also keep track of in-scope namespace declarations to fill in the proper prefix bound to the namespace of each QName contained in the results. 
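</p> <p> The two bookkeeping tasks above can be sketched roughly as follows. This is only an illustration: OpenElementTracker and all of its members are hypothetical names, not part of Commons.Xml.Relaxng, and it assumes the editor reports start tags, namespace declarations and end tags as they are typed. </p>
<xmp class="code-csharp">
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical editor-side helper; not part of Commons.Xml.Relaxng.
public class OpenElementTracker
{
	class Frame
	{
		public string RawName; // start tag name as written, e.g. "x:foo"
		// prefix -> namespace URI declared on this start tag ("" = default)
		public Dictionary<string, string> Declarations =
			new Dictionary<string, string> ();
	}

	readonly Stack<Frame> frames = new Stack<Frame> ();

	public void StartElement (string rawName)
	{
		Frame f = new Frame ();
		f.RawName = rawName;
		frames.Push (f);
	}

	// Records xmlns="uri" (prefix "") or xmlns:prefix="uri" on the
	// innermost open element; call it after StartElement().
	public void DeclareNamespace (string prefix, string uri)
	{
		frames.Peek ().Declarations [prefix] = uri;
	}

	public void EndElement ()
	{
		frames.Pop ();
	}

	// The only end tag name that can be completed, or null outside the root.
	public string ExpectedEndTag {
		get { return frames.Count > 0 ? frames.Peek ().RawName : null; }
	}

	// Maps an expected element QName to the name to type, or returns
	// null when no in-scope prefix is bound to its namespace.
	public string ToRawName (XmlQualifiedName qname)
	{
		List<string> shadowed = new List<string> ();
		foreach (Frame f in frames) { // Stack<T> enumerates innermost first
			foreach (KeyValuePair<string, string> d in f.Declarations) {
				if (shadowed.Contains (d.Key))
					continue; // redeclared by an inner element
				if (d.Value == qname.Namespace)
					return d.Key.Length == 0 ? qname.Name : d.Key + ":" + qname.Name;
			}
			shadowed.AddRange (f.Declarations.Keys);
		}
		// No declaration matched: an unprefixed name works only for a
		// no-namespace QName while no default namespace is in effect.
		bool plain = qname.Namespace.Length == 0 && !shadowed.Contains ("");
		return plain ? qname.Name : null;
	}

	public static void Main ()
	{
		OpenElementTracker t = new OpenElementTracker ();
		t.StartElement ("grammar");
		t.DeclareNamespace ("", "http://relaxng.org/ns/structure/1.0");
		XmlQualifiedName qn = new XmlQualifiedName (
			"start", "http://relaxng.org/ns/structure/1.0");
		Console.WriteLine (t.ToRawName (qn));   // prints "start"
		Console.WriteLine (t.ExpectedEndTag);   // prints "grammar"
	}
}
</xmp>
<p> Such a tracker would sit next to the validating reader: the reader supplies the expected QNames, and the tracker turns them into the text to insert. Attributes would need a separate rule, since an unprefixed attribute is never in the default namespace. 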
</p> <p> Oh, BTW don't ask Alp about that "dream": he has many other tasks and interests ;-) </p> http://primates.ximian.com/~atsushi/lb/all.html#2004%2f12%2f06%200%3a00%3a00%3a%2b9 Atsushi Enomoto (atsushi@ximian.com) http://primates.ximian.com/~atsushi/lb/all.html#2004%2f12%2f06%200%3a00%3a00%3a%2b9 Mon, 06 Dec 2004 00:00:00 GMT 2004 / 12 / 03: mcs now supports /doc <p> Finally, I checked in the /doc support patches to mcs. I remember the first patch was written in a day, nearly 7 months ago, and that it worked mostly fine. </p> <p> While hacking on it, I found some problems around the /doc feature: </p> <ul> <li>There is no assurance that documentation lines come in the expected order when we are using partial types. (We can control it by ordering the file names passed to csc.exe, but can you, especially when you use vs.net?)</li> <li>There is no check whether a documented prefix ('T:', 'F:', 'P:', 'M:') is correct.</li> <li>T:namespace_name is incorrectly allowed.</li> <li>There is no invalid comment check on (and between) attribute tokens.</li> <li>Parsing of comment lines looks hacky. Those '///' lines seem to be handled individually and thus laid out line by line, but when we have split markup like "/// &lt;see\ncref='F:Foo'" it connects those two lines, which means that the markup might be parsed (i.e. checked for well-formedness) per line.</li> <li>"cref" attributes are pretty nasty. There is no normalization for members in the type itself (e.g. if you have TestType.FooField, there will be cref="F:TestType.FooField" and cref="F:FooField" in the resulting document).</li> </ul> <p> ... and more (I cannot remember the rest right now). Well, some of them are not actually problems. Some just look like bugs. </p> <p> Well, actually csc must be doing a better job than my hacky cref interpretation. It seems to recursively tokenize the attribute as a type name, just like the source itself, while I don't. 
</p> <p> Anyways, it is the kind of job I did only because there are some users (originally it was meant to be used to examine our System.Xml implementation by running NDoc in practice). I think the monodoc format is much better, and I don't think the C# doc feature is good; I say this as a translator who keeps track of changes in the original documents, usually from the documents themselves, not from source code files. </p> <p> Now I am so glad that I can fully go back to sys.xml hacking. </p> http://primates.ximian.com/~atsushi/lb/all.html#2004%2f12%2f03%200%3a00%3a00%3a%2b9 Atsushi Enomoto (atsushi@ximian.com) http://primates.ximian.com/~atsushi/lb/all.html#2004%2f12%2f03%200%3a00%3a00%3a%2b9 Fri, 03 Dec 2004 00:00:00 GMT 2004 / 11 / 28: RelaxngInference <p> Yesterday I started to write <a href="http://primates.ximian.com/~atsushi/code/RelaxngInference.cs">RELAX NG grammar inference</a>. I hope this <a href="http://primates.ximian.com/~atsushi/code/RelaxngInferenceDesign.txt">design</a> won't **** you. </p> http://primates.ximian.com/~atsushi/lb/all.html#2004%2f11%2f28%200%3a00%3a00%3a%2b9 Atsushi Enomoto (atsushi@ximian.com) http://primates.ximian.com/~atsushi/lb/all.html#2004%2f11%2f28%200%3a00%3a00%3a%2b9 Sun, 28 Nov 2004 00:00:00 GMT 2004 / 11 / 24: document("") <p> Just voted my first 5 on <a href="http://lab.msdn.microsoft.com/productfeedback/viewfeedback.aspx?feedbackid=609308ef-db5a-475a-9813-cc4e2ac74399">this very important bug</a> that shows a breakage of W3C standard conformance. </p> <p> Such an XPathNavigator instance could be kept in memory <strong>only for</strong> stylesheets that contain <del>document("")</del><ins>document()</ins> (this could be determined by static analysis). So the "by design" reasoning does not make sense. </p> <p> Real developers could simply write a standard-conformant implementation the easy way, instead of resorting to casuistry about whether it is conformant or not, which just ends up annoying real users. 
</p> http://primates.ximian.com/~atsushi/lb/all.html#2004%2f11%2f24%200%3a00%3a00%3a%2b9 Atsushi Enomoto (atsushi@ximian.com) http://primates.ximian.com/~atsushi/lb/all.html#2004%2f11%2f24%200%3a00%3a00%3a%2b9 Wed, 24 Nov 2004 00:00:00 GMT 2004 / 11 / 19: Let's make System.Xml 2.0 not suck <p> On the <a href="http://lab.msdn.microsoft.com/ProductFeedback/viewfeedback.aspx?feedbackid=4a8cca98-9b5a-416e-98c7-906575a6fc6d">suggestion on "infer elements always globally"</a>, I am getting a positive feeling from the Microsoft XML guys via the feedback center. </p> <p> On the other hand, I am getting a negative response to the suggestion on <a href="http://lab.msdn.microsoft.com/productfeedback/viewfeedback.aspx?feedbackId=4a5e952a-b4b0-47ef-842f-9161222bebfc">XmlSchemaSimpleType.ParseValue()</a>, which validates a string taking facets into account. But I believe that developers working with XML Schema would definitely appreciate that feature. For example, it will be mandatory for the <a href="http://sourceforge.net/projects/xqp">XQP project</a>, which must support the user-defined type constructors defined in section 5 of the <a href="http://www.w3.org/TR/2004/WD-xpath-functions-20041029/">W3C XQuery Functions and Operators</a> specification. Microsoft guys might want to help your development. </p> <p> It depends on you, XML developers, whether Microsoft will improve their library or not. We could provide our own advantages, but it would still be better if your advanced code ran on MS.NET too. 
</p> <p> (FYI: You can "vote" for the suggestions ;-) </p> http://primates.ximian.com/~atsushi/lb/all.html#2004%2f11%2f19%200%3a00%3a00%3a%2b9 Atsushi Enomoto (atsushi@ximian.com) http://primates.ximian.com/~atsushi/lb/all.html#2004%2f11%2f19%200%3a00%3a00%3a%2b9 Fri, 19 Nov 2004 00:00:00 GMT 2004 / 11 / 17: Useful codeblock
<xmp class="code-csharp">
using QName = System.Xml.XmlQualifiedName;
using Form = System.Xml.Schema.XmlSchemaForm;
using Use = System.Xml.Schema.XmlSchemaUse;
using SOMList = System.Xml.Schema.XmlSchemaObjectCollection;
using SOMObject = System.Xml.Schema.XmlSchemaObject;
using Element = System.Xml.Schema.XmlSchemaElement;
using Attr = System.Xml.Schema.XmlSchemaAttribute;
using AttrGroup = System.Xml.Schema.XmlSchemaAttributeGroup;
using AttrGroupRef = System.Xml.Schema.XmlSchemaAttributeGroupRef;
using SimpleType = System.Xml.Schema.XmlSchemaSimpleType;
using ComplexType = System.Xml.Schema.XmlSchemaComplexType;
using SimpleModel = System.Xml.Schema.XmlSchemaSimpleContent;
using SimpleExt = System.Xml.Schema.XmlSchemaSimpleContentExtension;
using SimpleRst = System.Xml.Schema.XmlSchemaSimpleContentRestriction;
using ComplexModel = System.Xml.Schema.XmlSchemaComplexContent;
using ComplexExt = System.Xml.Schema.XmlSchemaComplexContentExtension;
using ComplexRst = System.Xml.Schema.XmlSchemaComplexContentRestriction;
using SimpleTypeRst = System.Xml.Schema.XmlSchemaSimpleTypeRestriction;
using SimpleList = System.Xml.Schema.XmlSchemaSimpleTypeList;
using SimpleUnion = System.Xml.Schema.XmlSchemaSimpleTypeUnion;
using SchemaFacet = System.Xml.Schema.XmlSchemaFacet;
using LengthFacet = System.Xml.Schema.XmlSchemaLengthFacet;
using MinLengthFacet = System.Xml.Schema.XmlSchemaMinLengthFacet;
using Particle = System.Xml.Schema.XmlSchemaParticle;
using Sequence = System.Xml.Schema.XmlSchemaSequence;
using Choice = System.Xml.Schema.XmlSchemaChoice;
</xmp>
<p> I've 90% finished XmlSchemaInference. I implemented it only because .NET 2.0 contains it. 
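</p> <p> For reference, here is a minimal sketch of how that class is driven. It assumes the XmlSchemaInference.InferSchema(XmlReader) method and the XmlSchemaSet type as they exist in the released System.Xml.Schema (the CTP-era API may differ); the class name InferenceSketch and the inline sample document are made up for illustration. </p>
<xmp class="code-csharp">
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Illustration only; InferenceSketch is a made-up name.
public class InferenceSketch
{
	public static void Main ()
	{
		// A tiny instance document to infer a schema from.
		string xml = "<products><product name='foo' />"
			+ "<product name='bar' /></products>";

		XmlReader reader = XmlReader.Create (new StringReader (xml));
		XmlSchemaInference inference = new XmlSchemaInference ();
		// InferSchema() reads the whole document and returns a schema set.
		XmlSchemaSet schemas = inference.InferSchema (reader);
		reader.Close ();

		// Dump every inferred schema to standard output.
		foreach (XmlSchema schema in schemas.Schemas ())
			schema.Write (Console.Out);
	}
}
</xmp>
<p> What exactly comes out depends on the inference options, so no particular output is shown here. 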
</p><p> XmlSchemaInference is very useful. For example, if you have a document like: </p>
<xmp class="code-xml">
<products>
  <category>
    <category>
      <product name="foo" />
      <product name="bar" />
      <product name="baz" />
    </category>
    <product name="hoge" />
    <product name="fuga" />
  </category>
</products>
</xmp>
<p> It creates two different definitions of "product" elements. Here is the <a href="http://primates.ximian.com/~atsushi/schema-inference-is-cool/complicated.xsd">inferred schema</a> and the <a href="http://primates.ximian.com/~atsushi/schema-inference-is-cool/complicated.cs">generated serializable class</a>. </p> <p> So now I wonder if I had better port the same feature to Commons.Xml.Relaxng. RELAX NG is not as sucky as XML Schema, so I might be able to provide a better XML structure inference engine. But XML structure inference itself is not so fun. </p> <p> ... after some thought, I decided to enter <a href="http://lab.msdn.microsoft.com/ProductFeedback/viewfeedback.aspx?feedbackid=4a8cca98-9b5a-416e-98c7-906575a6fc6d">a new suggestion</a> in the MS feedback center, which seems to be working again recently. </p> http://primates.ximian.com/~atsushi/lb/all.html#2004%2f11%2f17%200%3a00%3a00%3a%2b9 Atsushi Enomoto (atsushi@ximian.com) http://primates.ximian.com/~atsushi/lb/all.html#2004%2f11%2f17%200%3a00%3a00%3a%2b9 Wed, 17 Nov 2004 00:00:00 GMT 2004 / 11 / 13: xml:id and canonical XML <p> I found that the Last Call working draft of xml:id was out. But I think xml:id will be incompatible with Canonical XML (xml-c14n). Below is an excerpt from <a href="http://www.w3.org/TR/2001/REC-xml-c14n-20010315#DocSubsets">2.4 Document Subsets</a> in the xml-c14n W3C REC: </p> <blockquote> The processing of an element node E MUST be modified slightly when an XPath node-set is given as input and the element's parent is omitted from the node-set. The method for processing the attribute axis of an element E in the node-set is enhanced. 
All element nodes along E's ancestor axis are examined for nearest occurrences of attributes in the xml namespace, such as xml:lang and xml:space (whether or not they are in the node-set). From this list of attributes, remove any that are in E's attribute axis (whether or not they are in the node-set). Then, lexicographically merge this attribute list with the nodes of E's attribute axis that are in the node-set. The result of visiting the attribute axis is computed by processing the attribute nodes in this merged attribute list. </blockquote> <p> Well, I don't think xml:id is wrong here. It is xml-c14n that relies on a premise nobody ever committed to, namely that <strong>all xml:* attributes must be inherited</strong> (yes, xml:lang, xml:space and xml:base were). Anyways, don't worry about that incompatibility. Canonical XML is already incompatible with XML Infoset with regard to namespace information items. </p> http://primates.ximian.com/~atsushi/lb/all.html#2004%2f11%2f13%200%3a00%3a00%3a%2b9 Atsushi Enomoto (atsushi@ximian.com) http://primates.ximian.com/~atsushi/lb/all.html#2004%2f11%2f13%200%3a00%3a00%3a%2b9 Sat, 13 Nov 2004 00:00:00 GMT 2004 / 11 / 11: A change of seasons <div style="float: left; margin: 1em"> <p><img src="http://primates.ximian.com/~atsushi/green-green.jpg" /></p> </div> <p> Many of my friends have been saying that they feel sorry for Rupert because he has no other clothes for this cold season. Today I was hanging around Shibuya (central Tokyo area) with my friends, and they were kind enough to buy a new one for him (from my budget). Now he looks younger than before. </p> <h3>XmlSchemaInference</h3> <p> I was escaping from the /doc stuff and looking into the xsd inference task (I cannot stand working only on that annoying task). I wrote some <a href="http://primates.ximian.com/~atsushi/schemainference/XmlSchemaInferenceDesign.txt">notes</a>, but they are incomplete. Apparently the most difficult area is particle inference, but right now I don't have many ideas. 
My current idea is to support a <a href="http://www.relaxng.org/">non-XmlSchema language</a>. </p> http://primates.ximian.com/~atsushi/lb/all.html#2004%2f11%2f11%200%3a00%3a00%3a%2b9 Atsushi Enomoto (atsushi@ximian.com) http://primates.ximian.com/~atsushi/lb/all.html#2004%2f11%2f11%200%3a00%3a00%3a%2b9 Thu, 11 Nov 2004 00:00:00 GMT