monogatari http://primates.ximian.com/~atsushi/blog/ stories by Atsushi Eno en-us atsushi@ximian.com Copyright 2004 Tue, 01 Jun 2004 21:59:09 +0000 Tue, 08 Jun 2004 04:11:52 +0000 http://www.movabletype.org/?v=2.51 atsushi@ximian.com 60 the dance of fools http://primates.ximian.com/~atsushi/blog/archives/000460.html June 01, 2004 460@http://primates.ximian.com/~atsushi/blog/ <p>Since last week I have been somewhat sick and couldn't hack much :( Moreover, my house is under repair, so it is not a good place to hack right now (noise, and closed windows with no air conditioner in my room...). I have mostly been out these days.</p> <p><h3>major DataRow performance fix</h3></p> <p>My DataSet hacking has been helped by julia, who has been trying very practical example code with megabytes of data. Yes, it was not good for tracing bug locations, but very good for testing performance ;-)</p> <p>After the DataSet fixes, I believed that I had improved DataSet.ReadXml() performance (to compare the results, I ran my "detachable" ReadXml() implementation under MS.NET). So when I saw the ReadXml() performance results after some showstopper ReadXmlSchema() fixes, I could not believe what was happening. Our ReadXml() was more than 1000x slower than MS.NET(!). So I started profiling, and noticed that DataRow.ctor() was 1000x slower. </p> <p>I fixed it immediately, but unfortunately I did not notice the problem before our beta2 release (unless there turns out to be a critical packaging problem and we have to re-collect sources from cvs :-). So if you want to see what will happen in Mono 1.0 after beta2, install <a href="http://primates.ximian.com/~atsushi/System.Data.dll">this assembly</a>. 
I might upload a patch for this fix here after the beta2 release.</p> <p><h3>anglia test collection</h3></p> <p>Recently (well, in fact nearly two weeks ago ;) Jeff Rafter <a href="http://relaxng.org/pipermail/relaxng-user/2004-May/000514.html">announced</a> his new "anglia" RELAX NG compact syntax test collection (he is implementing his own compact syntax parser). It is cool and was very useful to me: I could fix a really problematic part of my parser. Bugs still remain, but I could easily add a simple standalone test for anglia. Much appreciated.</p> http://primates.ximian.com/~atsushi/blog/archives/000460.html Tue, 01 Jun 2004 21:59:09 +0000 improvements in ADO.NET and XML land http://primates.ximian.com/~atsushi/blog/archives/000441.html May 17, 2004 441@http://primates.ximian.com/~atsushi/blog/ <p>After nearly one month of hacking, the ADO.NET and XML pieces should be better than before.</p> <p>ReadXml() now handles schema inference and data reading in separate stages, so it no longer loads the entire document into an XmlDocument when the read mode is set to ignore the schema (XmlReadMode.IgnoreSchema).</p> <p>InferXmlSchema() is implemented. Yes, its core had already been implemented in ReadXml(), but it is somewhat improved. It now distinguishes DataSet elements from DataTable elements. Simple-type element columns can be ignored, or even removed, when a conflicting DataTable element (that is, one whose name is the same as the column's mapped name and which has DataTable structure) appears.</p> <p>ReadXmlSchema() should now identify more precisely whether an element definition is for a DataSet or a DataTable, and relationship handling should be improved.</p> <p>XmlDataDocument no longer calls GetElementsByTagName() so often, so two column elements that have the same name but belong to different table row elements are now treated correctly. 
At construction time, nodes are no longer loaded into another XmlDocument.</p> <p>Since most of the features above are independent of the basic DataSet stuff, they can be run and tested under MS.NET. For example, XmlDataReader.cs, which implements ReadXml(), can be used under MS.NET to fill a DataSet faster than MS's ReadXml(). XmlSchemaDataImporter.cs, which implements ReadXmlSchema(), handles "fixed" values in attributes more correctly than MS's ReadXmlSchema().</p> <p>Since I have been updating everything asynchronously, the DataSet and XML features were quite broken during these two or three weeks, but now it is (or should be) time for them to become stable. So if you are using the DataSet and XML features, please try them and tell me whether they work or not.</p> <p>BTW I love the corcompare tools, which are very helpful for checking how API-complete my TypedDataSetGenerator implementation is. I put the API comparison <a href="http://primates.ximian.com/~atsushi/typedds-api-diff/diff.html">here</a> (the page is not complete enough to handle script events).</p> http://primates.ximian.com/~atsushi/blog/archives/000441.html Mon, 17 May 2004 21:16:54 +0000 Cannot DataSet be read from streaming XmlReader? http://primates.ximian.com/~atsushi/blog/archives/000424.html April 29, 2004 424@http://primates.ximian.com/~atsushi/blog/ <p>DataSet.ReadXml() is still hot (and here "hot" does not mean anything good, especially with the beta1 release coming up).</p> <p>In a comment on my blog post, I was asked whether we can provide an XmlDocument-less DataSet.ReadXml(). I have been developing <a href="http://primates.ximian.com/~atsushi/blog/archives/XmlDataInferenceLoader.cs">another implementation</a>, but I had no definite answer to that question. However, I found one problematic design decision in MS.NET.</p> <p>MS.NET infers data columns in a certain order. 
It infers:</p> <p><ol><br /> <li>A Hidden primary key column, if the table element contains another table element</li><br /> <li>Element columns, each of which is either a Hidden reference column or a simple-type element. They come in document order (i.e. they are not grouped by their ColumnMapping types).</li><br /> <li>Finally, Attribute columns</li><br /> </ol></p> <p><a href="http://primates.ximian.com/~atsushi/blog/archives/test26.xml">This instance</a> demonstrates the inference order above.</p> <p>Then there is a problem, because</p> <p><ol><br /> <li>Attributes can be read only while the XmlReader is positioned on the start element</li><br /> <li>We cannot fill a DataRow for a detached column</li><br /> <li>We cannot change column order (at least not without removing the existing column content of every row)</li><br /> </ol></p> <p>So if this column order is intentional "design", then Microsoft's ReadXml() implementation is bound to DOM.</p> <p>Then what should we do? Two ideas occurred to me: A) ignore such column order (or reorder columns internally), or B) separate the inference engine from the data loading engine. My current implementation is A), but I think B) would be better since it would be less complicated. Moreover, there are many System.Data hackers, so I don't want to impose changes on the basic classes beyond the XML support stuff. However, I have no idea whether we have enough time to develop B)... 
I have many other things that need fixing, such as ReadXmlSchema(), TypedDataSetGenerator, XmlDataDocument, SqlTypes, etc.</p> http://primates.ximian.com/~atsushi/blog/archives/000424.html Thu, 29 Apr 2004 09:55:15 +0000 "defining the game" in Japanese http://primates.ximian.com/~atsushi/blog/archives/000421.html April 26, 2004 421@http://primates.ximian.com/~atsushi/blog/ <p>You can read Miguel's "defining the game" <a href="http://primates.ximian.com/~atsushi/defining-the-game.html">in Japanese</a>.</p> <p>Today Gert Driesen of the NDoc team told me that he managed to run NDoc (NDocConsole.exe) on Mono and could create MSDN-style docs without any problem. I am glad to hear that. Many thanks to him for his helpful advice on getting NAnt running (to build NDoc; he is also on the NAnt team), and for many nice reports on bugzilla.</p> <p>I'm still on the ADO.NET path. Today I checked the first TypedDataSetGenerator implementation into cvs. It is still incomplete, but with that code we can now generate typed DataSet classes, the last large missing part of xsd.exe. I hope it will be improved in a couple of days.</p> http://primates.ximian.com/~atsushi/blog/archives/000421.html Mon, 26 Apr 2004 19:18:25 +0000 the last samurai http://primates.ximian.com/~atsushi/blog/archives/000417.html April 22, 2004 417@http://primates.ximian.com/~atsushi/blog/ <p>OK, time to be happy with Rupert.</p> <p><img src="http://primates.ximian.com/~atsushi/blog/archives/the_last_samurai.jpg" /></p> http://primates.ximian.com/~atsushi/blog/archives/000417.html Thu, 22 Apr 2004 06:24:28 +0000 the dark side of the moon http://primates.ximian.com/~atsushi/blog/archives/000416.html April 22, 2004 416@http://primates.ximian.com/~atsushi/blog/ <p>Just a note: I found that <a href="http://osnews.com/story.php?news_id=6795">the recent article</a> about Mono on OSNews mentions XmlDocument.GetElementById(). 
There is already a <a href="http://www.go-mono.com/xml-classes.html">description</a> of it (explaining why it is incomplete). GetElementById() is one of the troublesome parts of the DOM specification (and I don't like it), and I don't feel I should dig into it. If you still think that method is "the most important" as the author says, don't hesitate to use it... I think it is as stable as Microsoft.NET's.</p> <p>I have been frustrated by a proposal from the Japanese Agency for Cultural Affairs to change copyright law in Japan. In the draft, they asked for an exclusive import right for CDs. They had explained that "the law does not mean imported CDs are prohibited in Japan."<br /> But one Japanese attorney pointed out that with that law any foreign label could "prohibit" specific imports of CDs, and thus Amazon might be regarded as "illegal" in Japan (because of the "no-discount resale system" in Japan, many people prefer to buy CDs from Amazon; that system has long been criticized by almost everyone except the publishing companies).<br /> After this was pointed out, the agency changed its explanation to "The labels promised us not to prohibit imports." However, it is now said that many copies of an "excerpted comment from RIAA" (I am not sure whether it is genuine) are circulating on the net, saying that:</p> <p><blockquote>We thus strongly caution against the adoption of discriminatory legislation, and call upon the Government to provide rights to control importation with regard to all repertoire.</blockquote></p> <p>The law is likely to pass the Diet, although I haven't seen anyone, including academics, who agrees with the draft, except for the agency and the associations concerned. And sad to say, the iron triangle is very strong here.</p> <p>I also found a potentially sad thing: MS ADO.NET uses XmlDocument in DataSet.ReadXml(). I wonder whether that approach is really required to implement that method... 
loading the entire document into memory is a big waste of memory.</p> http://primates.ximian.com/~atsushi/blog/archives/000416.html Thu, 22 Apr 2004 06:20:28 +0000 How is the mapping from xsd to DataSet done? http://primates.ximian.com/~atsushi/blog/archives/000409.html April 16, 2004 409@http://primates.ximian.com/~atsushi/blog/ <p> I am trying to rewrite DataSet.ReadXmlSchema(). There was great prior art, XmlSchemaMapper by Ville Palo, but when he wrote that class there was no complete XML Schema stack, so he had a hard time implementing it without the post-schema-compilation infoset. So I thought it was time to improve it and simplify the class. Actually, I'm writing another class. </p><p> I began by analyzing the behavior of ReadXmlSchema(). Actually, Microsoft has written <a href="http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/_generating_dataset_relational_structure_from_xsd.asp">Generating DataSet Relational Structure from XML Schema (XSD)</a>, detailed documentation about DataSet and XML interoperability, so you may well be content with that doc. Here is another attempt to describe "how XML schemas are consumed by ReadXmlSchema()". I will implement the class based on this analysis. Comments are welcome. </p> <strong>Targetable Schema Components</strong> <p> Only global elements that hold a complex type are converted into a table. The components of the element's type are subsequently converted as well, BUT there is an exception: for "DataSet elements", the type is just ignored (see "DataSet element definition" below). </p><p> Unused complex types are never converted. </p><p> Global simple types and global attributes are never converted; they cannot become a table. Local complex types are also converted into a table. </p><p> Local elements are converted into either a table or a column in the "context DataTable". 
</p> <strong>Name Convention (incomplete)</strong> <p> Since local complex types are anonymous, we have to give each component a name. And since complex types and elements can share names with each other, we have to maintain a table mapping names to components. The names must also be used correctly in DataRelation definitions. </p> <strong>DataSet element definition</strong> <p> A "DataSet element" is an element that has the attribute msdata:IsDataSet (where the prefix "msdata" is bound to urn:schemas-microsoft-com:xml-msdata). </p><p> Only the first global element that matches the condition above is regarded as the DataSet element (by deliberate design, or just a bug?), instead of this being treated as an error. </p><p> All global elements are treated as alternatives within the DataSet element. </p><p> For local elements, msdata:IsDataSet is just ignored. </p> <strong>Importing Complex Types</strong> <p> When an xs:element is going to be mapped, its complex type (remember that only complex-typed elements are targetable) is expanded into DataColumns. </p><p> DataColumn has a property ColumnMapping (of type MappingType) that shows whether the column came from an attribute or an element. </p><p> [Question: What about MappingType.SimpleContent? How is it used?] </p><p> Additionally, for particle elements, it might also create another DataTable (and for such particle elements in the context DataTable, it creates an index to the new table). </p><p> For group-based particles (XmlSchemaGroupBase: sequence, choice, all), each component in those groups is mapped to a column. Even if you import "choice" or "all" components, DataSet.WriteXmlSchema() will output them just as a "sequence". </p> <strong>Identity Constraints and DataRelations</strong> <p> Only constraints on the "DataSet element" are considered; all other constraint definitions are ignored. Note that it is DataSet that has the property Relations (of type DataRelationCollection). 
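To make the mapping rules above a little more concrete, here is a small C# sketch against the standard System.Data API (DataSet.ReadXmlSchema(), DataColumn.ColumnMapping, DataTable.Constraints). The schema and every name in it (Shop, Customer, Id, Name, region, CustomerKey, SchemaMappingSketch) are hypothetical, made up for this example. If the rules hold, Shop should become the DataSet itself rather than a table, Customer should become a table whose columns report MappingType.Element or MappingType.Attribute, and the xs:unique declared on the DataSet element should turn into a UniqueConstraint on that table. It is only a probe for comparing behavior, not the importer class described in this post.

```csharp
using System;
using System.Data;
using System.IO;

// Sketch: feed a small, hypothetical schema to DataSet.ReadXmlSchema()
// and print the relational structure it produces.
public static class SchemaMappingSketch
{
    // Hypothetical schema. "Shop" carries msdata:IsDataSet, so it should become
    // the DataSet itself; the complex-typed "Customer" element should become a
    // table. The xs:unique is declared on the DataSet element, with a selector
    // whose last step names the table and a field that names a column.
    const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'
           xmlns:msdata='urn:schemas-microsoft-com:xml-msdata'>
  <xs:element name='Shop' msdata:IsDataSet='true'>
    <xs:complexType>
      <xs:choice minOccurs='0' maxOccurs='unbounded'>
        <xs:element name='Customer'>
          <xs:complexType>
            <xs:sequence>
              <xs:element name='Id' type='xs:int' />
              <xs:element name='Name' type='xs:string' />
            </xs:sequence>
            <xs:attribute name='region' type='xs:string' />
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
    <xs:unique name='CustomerKey'>
      <xs:selector xpath='.//Customer' />
      <xs:field xpath='Id' />
    </xs:unique>
  </xs:element>
</xs:schema>";

    public static DataSet Load()
    {
        DataSet ds = new DataSet();
        ds.ReadXmlSchema(new StringReader(Xsd));
        return ds;
    }

    public static void Main()
    {
        DataSet ds = Load();
        Console.WriteLine("DataSet: " + ds.DataSetName);
        foreach (DataTable table in ds.Tables)
        {
            Console.WriteLine("Table: " + table.TableName);
            // ColumnMapping tells whether a column came from an element or an attribute
            foreach (DataColumn col in table.Columns)
                Console.WriteLine("  column " + col.ColumnName + " (" + col.ColumnMapping + ")");
            foreach (Constraint c in table.Constraints)
                Console.WriteLine("  constraint " + c.ConstraintName + " (" + c.GetType().Name + ")");
        }
    }
}
```

Running it under both MS.NET and Mono and comparing the printed lines is one quick way to spot where two ReadXmlSchema() implementations disagree.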
</p><p> xs:key and xs:unique are handled the same way (both will be serialized as xs:unique). </p><p> The XPath expressions in the constraints are strictly limited; each is expected to be simple enough to be mapped as follows: </p><ul> <li><p> selector to "any_valid_XPath/is/OK/blah", where "blah" is one of the DataTable names. It looks like only the last QName step is significant and any leading XPath steps are OK (even if the mapped node does not exist). </p> </li><li><p>field to a QName that is mapped to a DataColumn in the DataTable (even ./QName is not allowed) </p></li></ul> http://primates.ximian.com/~atsushi/blog/archives/000409.html Fri, 16 Apr 2004 18:42:58 +0000 Nothing more than scribble http://primates.ximian.com/~atsushi/blog/archives/000393.html April 04, 2004 393@http://primates.ximian.com/~atsushi/blog/ <p>Todd made cool gettext support for MD:</p> <p><img src="http://primates.ximian.com/~atsushi/md-jp.png" /></p> <p>(Of course, it was not him who created this incomplete <a href="http://primates.ximian.com/~atsushi/ja_JP.po">ja_JP.po</a> file :-)</p> <p>--------</p> <p>Well, after another round of gettext hacking by tberman, I could get <a href="http://primates.ximian.com/~atsushi/md-jp2.png">another shot</a>, with an updated .po file.</p> <p>Note that, basically, translation should not be done against heavily ongoing work; I was playing with it for only 1 to 1.5 hours. But that also means that when releases are made, it won't require much time ;-)</p> http://primates.ximian.com/~atsushi/blog/archives/000393.html Sun, 04 Apr 2004 12:58:05 +0000 RELAX NG Compact Syntax support http://primates.ximian.com/~atsushi/blog/archives/000375.html March 17, 2004 375@http://primates.ximian.com/~atsushi/blog/ <p>As already <a href="http://pages.infinit.net/ctech/20040316-0936.html">posted</a>, Sebastien and I have been working on the System.Security.Cryptography.Xml (i.e. XML Signature) implementation, and now it gives reasonably good results. 
However, since I found out that he is <strong>Crazy</strong>, I decided to escape his madness and went another way. </p><p> And then I finished the first <a href="http://www.oasis-open.org/committees/relax-ng/compact-20021121.html">RELAX NG Compact Syntax</a> (RNC) parser implementation for Mono. It is now in cvs. The related files are in the mcs/class/Commons.Xml.Relaxng.Rnc/ directory. </p><p> So what is RELAX NG Compact Syntax? It lets you write a RELAX NG grammar in a very concise way. For example, the 130 lines of RELAX NG markup that define "pattern" (you can check how it is specified in <a href="http://www.oasis-open.org/committees/relax-ng/spec-20011203.html">Appendix A of the spec</a>) can be written as these 11 lines: </p> <pre>
pattern =
  element element { (nameQName | nameClass), (common & pattern+) }
  | element attribute { (nameQName | nameClass), (common & pattern?) }
  | element group|interleave|choice|optional
            |zeroOrMore|oneOrMore|list|mixed { common & pattern+ }
  | element ref|parentRef { nameNCName, common }
  | element empty|notAllowed|text { common }
  | element data { type, param*, (common & exceptPattern?) }
  | element value { commonAttributes, type?, xsd:string }
  | element externalRef { href, common }
  | element grammar { common & grammarContent* }
</pre> <p>Doesn't it look fascinating? ;-) We already support the primitive XML Schema datatypes, so many XML Schema users won't have a SoBig problem migrating to the RELAX NG world. (Basically I like RELAX NG. To be precise, I don't like XML Schema. I could say a lot of bad things about that spec :p) </p><p> OK, I want to flame the spec ;-) but now is not the time. Let's go back to the new parser. 
The usage is simple: </p> <xmp class="code-csharp">
using System.IO;
using System.Text;
using System.Xml;
using Commons.Xml.Relaxng;
using Commons.Xml.Relaxng.Rnc; // assumed namespace of RncParser (from the directory name)

RncParser parser = new RncParser (new NameTable ());
TextReader source = new StreamReader ("relaxng.rnc", Encoding.UTF8);
RelaxngGrammar grammar = parser.Parse (source);
// validate anygrammar.rng against the compact-syntax grammar while reading it
XmlReader reader = new RelaxngValidatingReader (
	new XmlTextReader ("anygrammar.rng"), grammar);
</xmp> <p>Only an XmlNameTable and a TextReader are required (well, currently the name table is not fully used ;-). Oh, please note that it has just been written, so it probably still contains bugs. </p><p> To implement this jay-based parser, I had to ignore parts of the spec's formal description (for example, some of the formal syntax confuses Element and Elements, which causes quite a few problems, especially for weakly-typed lexer/tokenizer code ;-). I will write about them another time. </p> http://primates.ximian.com/~atsushi/blog/archives/000375.html Wed, 17 Mar 2004 20:54:59 +0000 RelaxngValidatingReader improvements http://primates.ximian.com/~atsushi/blog/archives/000352.html February 27, 2004 352@http://primates.ximian.com/~atsushi/blog/ <p>Recently I committed the <a href="http://www.relaxng.org/">RELAX NG</a> validating reader stuff. It had been 9 months since my last commits to those classes, even though this is what I had really wanted to do ;-)</p> <p>The usage is very easy:</p> <p><code><br /> XmlReader r = new RelaxngValidatingReader (<br /> new XmlTextReader ("sample.xml"),<br /> new XmlTextReader ("sample.rng"));<br /> </code></p> <p>Or you can specify a RelaxngPattern instead of an XmlReader:</p> <p><code><br /> // Wow, relaxng.rng is really self-describing, unlike XMLSchema.xsd ;-)<br /> RelaxngPattern p = RelaxngPattern.Read (<br /> new XmlTextReader ("relaxng.rng"));<br /> XmlReader r = new RelaxngValidatingReader (<br /> new XmlTextReader ("relaxng.rng"), p);<br /> </code></p> <p>[The code is in mcs/class/Commons.Xml.Relaxng]</p> <p>The first priority task was to rename the public classes and fix member signatures (mainly access modifiers). 
I don't want to expose extraneous public methods/fields (that was very bad design, I think). Well, if any of you had been using "RngPattern", it has now become "RelaxngPattern" (and all RngXXX classes have become RelaxngXXX as well).</p> <p>I have been using (basically) James Clark's <a href="http://www.thaiopensource.com/relaxng/derivative.html">derivative</a> algorithm, which is implemented by the classes in the Commons.Xml.Relaxng.Derivative namespace. Though those classes are public, they are not meant to be used right now, and in fact I changed them radically. </p> <p>It is still not as stable as I want, but I think it has become more stable. I added standalone tests that use James Clark's test suite. I could reduce the nearly 120 grammar compilation failures (out of 373 cases) to fewer than 40, and a large number of instance validation errors to nearly 20.</p> <p>I also added datatype support. By default it supports the XML Schema datatypes ("http://www.w3.org/2001/XMLSchema-datatypes") as well as the built-in datatypes of the default datatype library (i.e. "string" and "token"). To support them, I had to change my derivative validation design.</p> <p>You don't have to change a single line of your code to get XML Schema datatype support. Just specify the XML Schema datatypes URI in your grammar (as relaxng.rng does) and use the types.</p> <p>Datatype support is provided by these classes:<br /> <ul><br /> <li>RelaxngDatatype: represents an actual datatype</li><br /> <li>RelaxngDatatypeProvider: provides a way to get a RelaxngDatatype from a QName and parameters</li><br /> </ul></p> <p>If you want to implement your own datatype, you can do it by extending RelaxngDatatype (especially by implementing Parse(string text, XmlReader context)), and by extending RelaxngDatatypeProvider so that GetDatatype (string name, string ns, RelaxngParamList parameters) returns the new datatypes. 
Well, there is already a similar <a href="http://sourceforge.net/projects/relaxng">datatype project</a> by Kohsuke Kawaguchi, but I went another way: my RelaxngValidatingReader is not based on a separate validation context (mine is simply an XmlReader).</p> <p>There are many more things I want to add, but not this time.</p> http://primates.ximian.com/~atsushi/blog/archives/000352.html Fri, 27 Feb 2004 02:11:42 +0000 it's my turn to drive http://primates.ximian.com/~atsushi/blog/archives/000349.html February 26, 2004 349@http://primates.ximian.com/~atsushi/blog/ <p>This is my first blog entry here.</p> <p>First, I should introduce myself. I am Atsushi Enomoto, living in Tokyo. I always write my name as Atsushi Eno (since many Japanese friends call me eno). You will usually find me logged in to #mono during my working hours (as 'eno').</p> <p>I am working on the XML stuff, except for XmlSerializer (Lluis rules). When I joined the project, XmlReader, XmlWriter, DOM and XPath were already implemented; there were great pioneers. And a few months later, Ben implemented most of the managed XSLT engine. So in fact I have not done that many things. I can say I implemented the XML Schema stuff (well, the schema reader and writer had already been implemented as well), but that is a specification I basically don't like... :p</p> http://primates.ximian.com/~atsushi/blog/archives/000349.html Thu, 26 Feb 2004 20:16:02 +0000