April 29, 2004

Can't DataSet be read from a streaming XmlReader?

DataSet.ReadXml() is still hot, and here "hot" does not mean anything good, especially with the beta1 release coming up.

In a comment on my blog post, I was asked whether we can provide an XmlDocument-less DataSet.ReadXml(). I have been developing another implementation, but I had no definite answer to that question. However, I found one problematic design in MS.NET.

MS.NET infers data columns in a certain order. It infers:


  1. A Hidden primary key column, if the table element contains another table element

  2. Element columns, each of which is either a Hidden reference column or a simple type element. They are in order of appearance (i.e. not ordered by their ColumnMapping types).
  3. Finally, Attribute columns

This instance confirms my inference above.

That causes a problem, because:
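The ordering can be illustrated with a small sketch. It is written in Python only for illustration and is not Mono or MS.NET code; the `is_table` rule, the `<table>_Id` key name and the omission of Hidden reference columns are simplifying assumptions of mine.

```python
import xml.etree.ElementTree as ET

def is_table(elem):
    # Assumption of this sketch: an element counts as a "table"
    # when it has child elements or attributes.
    return len(elem) > 0 or bool(elem.attrib)

def infer_columns(table_elem):
    """Return (name, mapping) pairs in the order described above."""
    columns = []
    # 1. Hidden primary key, only if a nested table element exists.
    #    The "<table>_Id" name is an assumption for illustration.
    if any(is_table(child) for child in table_elem):
        columns.append((table_elem.tag + "_Id", "Hidden"))
    # 2. Simple-type element columns, in document order.
    seen = set()
    for child in table_elem:
        if not is_table(child) and child.tag not in seen:
            seen.add(child.tag)
            columns.append((child.tag, "Element"))
    # 3. Attribute columns come last, although a reader meets them first.
    for name in table_elem.attrib:
        columns.append((name, "Attribute"))
    return columns
```

For `<item id="1"><name>a</name><detail x="1"/><price>2</price></item>`, the `id` attribute ends up as the last column even though it is the first thing a forward-only reader sees.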


  1. Attributes can be read only while the XmlReader is positioned on the start element

  2. We cannot fill a data row for a detached column

  3. We cannot change the column order (at least not without removing the existing column content for each row)

So if the column order is by "design", then Microsoft's ReadXml() implementation is bound to DOM.

Then what should we do? Two ideas occurred to me: A) ignore such column orders (or add column order replacement internally), or B) separate the inference engine from the data loading engine. My current implementation is A), but I think B) would be better since it won't be complicated. Moreover, there are many System.Data hackers, so I don't want to touch the basic classes except for the xml support stuff. However, I have no idea if we have enough time to develop B)... I have many things that should be fixed, such as ReadXmlSchema(), TypedDataSetGenerator, XmlDataDocument, SqlTypes etc.
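A rough sketch of idea B), again in Python for illustration only. It assumes flat tables, that the input can be read twice (a forward-only reader would need buffering), and that element and attribute names do not collide; `infer_schema` and `load_rows` are hypothetical names.

```python
import io
import xml.etree.ElementTree as ET

def infer_schema(source, table_tag):
    """Pass 1 (inference): collect column names only, no row data."""
    element_cols, attribute_cols = [], []
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != table_tag:
            continue
        for child in elem:
            if child.tag not in element_cols:
                element_cols.append(child.tag)
        for name in elem.attrib:
            if name not in attribute_cols:
                attribute_cols.append(name)
        elem.clear()  # release the content of the processed row
    # Element columns first, attribute columns last.
    return element_cols + attribute_cols

def load_rows(source, table_tag, columns):
    """Pass 2 (loading): fill rows against the already fixed column list."""
    rows = []
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != table_tag:
            continue
        row = dict.fromkeys(columns)
        for child in elem:
            row[child.tag] = child.text
        row.update(elem.attrib)
        rows.append([row[c] for c in columns])
        elem.clear()
    return rows

xml = (b'<root><item id="1"><name>a</name></item>'
       b'<item id="2"><name>b</name><price>3</price></item></root>')
columns = infer_schema(io.BytesIO(xml), "item")
rows = load_rows(io.BytesIO(xml), "item", columns)
```

Because the column list is complete before any row is filled, the loader never has to reorder columns or fill a row for a detached column.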

Posted by atsushi at 09:55 AM | Comments (1)

April 26, 2004

"defining the game" in Japanese

You can read Miguel's "defining the game" in Japanese.

Today Gert Driesen of the NDoc team told me that he managed to run NDoc (NDocConsole.exe) on Mono and could create MSDN docs without any problem. I am glad to hear that. Many thanks to him for his helpful advice on getting NAnt to run (to build NDoc; he is also on the NAnt team), and for many nice reports on bugzilla.

Now I'm still on the ADO.NET way. Today I checked the first TypedDataSetGenerator implementation into cvs. It is still incomplete, but with that code we can now generate typed dataset classes, the last large missing part of xsd.exe. It will be improved in a couple of days, I hope.

Posted by atsushi at 07:18 PM | Comments (0)

April 22, 2004

the last samurai

OK, time to be happy with Rupert.

Posted by atsushi at 06:24 AM | Comments (0)

the dark side of the moon

Just a note: I found that the recent article about mono on osnews mentioned XmlDocument.GetElementById(). There has been a description of it (why it is incomplete). GetElementById() is one of the troublesome parts of the DOM specification (and I don't like it), and I don't feel I should dig into it. If you still think that method is "the most important" as the author says, you don't have to hesitate to use it... I think it is as stable as Microsoft.NET.

I have been frustrated by a proposal from the Japanese Agency for Cultural Affairs to change copyright law. In the draft, they required an exclusive import entitlement for CDs. They had explained that "the law does not mean imported CDs are prohibited in Japan."
But one Japanese attorney pointed out that with that law any foreign label can "prohibit" specific imports of CDs, and thus Amazon might be regarded as "illegal" in Japan (since there is a "no-discount resale system" in Japan, many people prefer Amazon for buying CDs. That system has long been criticized by most people except publishing companies).
After this was pointed out, the agency changed their explanation to "The labels promised us not to prohibit imports." However, recently it is said that there are many copies of an "excerpted comment from RIAA" (not sure if it is true) on the net, saying that:

We thus strongly caution against the adoption of discriminatory legislation, and call upon the Government to provide rights to control importation with regard to all repertoire.

The law is likely to pass the Congress, although I haven't seen anyone, including academics, who agrees with that draft, except for the agency and the concerned associations. And sad to say, the iron triangle is so strong here.

I also found a potentially sad thing: MS ADO.NET uses XmlDocument in DataSet.ReadXml(). I wonder if that is really required to implement the method... it is a big waste of memory to load the entire document into memory.

Posted by atsushi at 06:20 AM | Comments (3)

April 16, 2004

How is the mapping from xsd to DataSet done?

I am trying to rewrite DataSet.ReadXmlSchema(). There was great prior art named XmlSchemaMapper by Ville Palo, but at the time he wrote that class there was no complete XML Schema stack, so he had a hard time implementing it without Post Schema Compilation Information. So I thought it was time to improve and simplify the class. Actually, I'm creating another class.

I began with an analysis of ReadXmlSchema() behavior. Actually, Microsoft has written Generating DataSet Relational Structure from XML Schema (XSD), a detailed document about DataSet and XML interoperability, so you may be content with that doc. Here is another attempt to describe how XML schemas are consumed by ReadXmlSchema(). I will implement the class based on this analysis. Comments are welcome.

Targetable Schema Components

Only global elements that hold a complex type are converted into a table. The components of the element's type are subsequently converted into a table, BUT there is an exception: for "DataSet elements", the type is just ignored (see "DataSet element definition" below).

Unused complex types will never be converted.

Global simple types and global attributes are never converted. They cannot be a table. Local complex types are also converted into a table.

Local elements are converted into either a table or a column in the "context DataTable".

Name Convention (incomplete)

Since local complex types are anonymous, we have to give a name to each component. Because of that, and since complex types and elements can have the same name as each other, we have to manage a table of mappings from a name to a component. The names must also be used correctly in DataRelation definitions.

DataSet element definition

A "DataSet element" is an element that has an attribute msdata:IsDataSet (where the prefix "msdata" is bound to urn:schemas-microsoft-com:xml-msdata).

Only the first global element that matches the condition above is regarded as the DataSet element (by necessary design or just a bug?); additional matches are not handled as an error.

All global elements are considered as alternatives in the DataSet element.

For local elements, msdata:IsDataSet is just ignored.
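The lookup rule above can be sketched as follows (Python, for illustration only). `find_dataset_element` is a hypothetical helper, and treating the attribute as set only when its value is "true" is my assumption.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
MSDATA = "urn:schemas-microsoft-com:xml-msdata"

def find_dataset_element(schema_root):
    """Return the first global xs:element flagged with msdata:IsDataSet."""
    # findall() with a plain tag name looks at direct children only,
    # so local (nested) element declarations are never examined.
    for elem in schema_root.findall(f"{{{XS}}}element"):
        # Assumption: the flag counts only when its value is "true".
        if elem.get(f"{{{MSDATA}}}IsDataSet") == "true":
            return elem  # later matches are silently ignored
    return None

schema = ET.fromstring(f"""
<xs:schema xmlns:xs="{XS}" xmlns:msdata="{MSDATA}">
  <xs:element name="Plain">
    <xs:complexType><xs:sequence>
      <xs:element name="Local" msdata:IsDataSet="true" />
    </xs:sequence></xs:complexType>
  </xs:element>
  <xs:element name="First" msdata:IsDataSet="true" />
  <xs:element name="Second" msdata:IsDataSet="true" />
</xs:schema>
""")
dataset_element = find_dataset_element(schema)
```

Here the nested "Local" declaration is skipped and "Second" is ignored, so the result is the "First" element.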

Importing Complex Types

When an xs:element is going to be mapped, its complex type (remember that only complex-typed elements are targetable) is expanded into DataColumns.

DataColumn has a property ColumnMapping (of type MappingType) that shows whether the column came from an attribute or an element.

[Question: How about MappingType.SimpleContent? How is it used?]

Additionally, for particle elements, it might also create another DataTable (but for the particle elements in the context DataTable, it will create an index to the new table).

For group base particles (XmlSchemaGroupBase; sequence, choice, all), each component in those groups is mapped to a column. Even if you import "choice" or "all" components, DataSet.WriteXmlSchema() will output them just as a "sequence".

Identity Constraints and DataRelations

Only constraints on the "DataSet element" are considered. All other constraint definitions are ignored. Note that it is DataSet that has the property Relations (of type DataRelationCollection).

xs:key and xs:unique are handled the same way (so both will be serialized as xs:unique).

The XPath expressions in the constraints are strictly limited; each of them is expected to be mappable as follows:

  • selector: "any_valid_XPath/is/OK/blah", where "blah" is one of the DataTable names. It looks like only the last QName section is significant and any leading XPath steps are OK (even if the mapped node does not exist).

  • field: a QName that is mapped to a DataColumn in the DataTable (even ./QName is not allowed)
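A minimal sketch of those two restrictions (Python, for illustration only). Both helper names are hypothetical, the QName pattern is a simplified approximation of the XML rules, and dropping the namespace prefix from the selector's last step is my assumption.

```python
import re

# Simplified QName pattern: optional "prefix:" followed by a local name.
QNAME = re.compile(r"(?:[A-Za-z_][\w.\-]*:)?[A-Za-z_][\w.\-]*")

def table_name_from_selector(xpath):
    """Only the last step of the selector matters for the mapping."""
    last_step = xpath.split("/")[-1]
    # Assumption: the table name is the local part of the last QName.
    return last_step.split(":")[-1]

def is_valid_field(xpath):
    """A field must be a bare QName; even "./QName" is rejected."""
    return QNAME.fullmatch(xpath) is not None
```

With these helpers, `table_name_from_selector("any_valid_XPath/is/OK/blah")` gives "blah", and `is_valid_field("./OrderId")` is False while `is_valid_field("OrderId")` is True.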

Posted by atsushi at 06:42 PM | Comments (1)

April 04, 2004

Nothing more than scribble

Todd made cool gettext support for MD:

(Of course, it is not him who created this incomplete ja_JP.po file :-)

--------

Well, after another round of gettext hacking by tberman, I could get another shot, with an updated .po file.

Note that basically translation should not be done against heavily ongoing work; I played with it for only 1 to 1.5 hours. But it also means that when a release is made, it won't take much time ;-)

Posted by atsushi at 12:58 PM | Comments (2)