May 5, 2006
NVDL standardization is complete
A new implementation of NVDL was recently released.
* http://xmlguru.cz/2006/05/jnvdl-early-access
* http://sourceforge.net/projects/jnvdl/
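It may be unglamorous, but I think the outlook for NVDL is bright. I believe only NVDL can rescue the W3C's HTML-related specifications, which are growing too large to move,
and ODF, whose schema has become just as huge.
The first idea for NVDL already existed while XML namespaces were
being designed. It was refined through RELAX Namespace, MNS, and NRL,
and was completed thanks to James Clark's enormous contributions.
For the record, I quote the email that started it all.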
----
From: MURATA Makoto
Date: Wed, 04 Mar 1998 00:06:11 +0900
Message-Id: <199803031506.AA00388@murata.apsdc.ksp.fujixerox.co.jp>
To: w3c-xml-sig@w3.org
[This note shows my interpretation of the basic
motivation for the namespace extension. Some parts of
this memo might be outside the scope of the namespace
working draft we are going to make, but I believe
that it is important to have an overall picture.]
!21. Introduction
Though XML was designed for rather traditional
documents, XML is now recognized as the information
architecture of the WWW. A number of applications on
the WWW (e.g., metadata-based search engines,
synchronized multi-media viewers, and evaluation of
mathematical expressions) use XML to represent their
information. These applications are expected to
collectively make the WWW a powerful and ubiquitous
computing environment.
For a variety of WWW applications to grow in a
virus-like manner, we have to minimize their
interdependencies. That is, it should be possible to
develop a WWW application without fully understanding
other WWW applications. Ideally, the development of
a WWW application should require no knowledge of
other WWW applications. For example, it would be
nice if developers of metadata-based search engines
do not have to learn multi-media synchronization at
all.
On the other hand, we would like WWW applications to
interwork with each other. For example, a WWW
browser invokes a mathematical expression viewer to
display mathematical expressions within HTML
documents. A metadata-based search engine handles
documents that may contain multi-media
synchronization information. It is an interesting
challenge to provide interworking of WWW applications
while minimizing their interdependencies at the same
time.
To me, this challenge is the basic motivation for the
XML namespace extension. For WWW applications to
interwork with each other, different information
pieces for different WWW applications have to
be combined to form an information amalgam. This
information amalgam is shared and utilized by these
WWW applications.
XML should be able to capture such information
amalgams. Since different WWW applications use the
same name differently, we need a mechanism for
resolving name collision. Thus, namespace.
Name collision is not the only issue, however. I
believe that we should further make XML
non-monolithic. First, a WWW application that has only
partial knowledge should be good enough for handling
information amalgams partially. Second,
the result of combining information pieces should
always be valid; that is, validity given by different
WWW applications should collectively ensure validity
of the entire information amalgam. As I see it, both
XML and SGML have been monolithic (*1).
!22. What do I mean by "monolithic"?
In SGML, a DTD author has to understand everything about
complex documents. Although you can download some DTD
modules (e.g., the SGML Open Exchange Table), you have to
know internal details of such DTD modules. For example, you
have to know the top-level element type so that you can
refer to that element type in your content model. You
also have to know other element type names to avoid name
collision. You even have to provide declarations of
parameter entities referenced in the downloaded modules. As
a consequence, DTD authors and DTD authoring tools have to
examine the entire DTD.
The same thing applies to authors of SGML documents. SGML
document editors expose everything to authors. Thus,
authors have to understand the entire DTD; they have to
understand all element types, attributes, and general
entities.
Formatting of SGML documents also requires understanding of
the entire DTD. You cannot write a stylesheet without
understanding some of the element types. (Yes, I am
exaggerating. Some editors and formatters directly support
widely-used table modules, for example. However, I do not
think that the SGML or XML language provides any mechanisms
for encouraging such software tools.)
!23. What do I mean by "non-monolithic"?
You might think I am merely proposing information hiding and
modular programming. More than that, I think. I am rather
proposing autonomous agents.
In the WWW, nobody can understand everything about documents
and data. No software tools can handle everything. Even if
you and your software understand everything currently
available, you and your software tool have to interwork with
those who do not and those software tools which do not.
Therefore, the information architecture for the WWW should
guarantee that partial understanding is good enough for
successfully (but partially) handling complex information.
For example:
* Search engines for a particular RDF schema should be able to handle any XML document if it has embedded RDF metadata of that schema. They do not have to understand the top-level document structure. Even if documents contain other information such as mathematical expressions and multi-media synchronization, search engines should not have any problems.
* XML browsers that cannot handle RDF and MathML should be able to display XML documents even when they contain RDF data and MathML expressions.
It should be possible to create fragment schemas. A
fragment schema is a description of permissible
document fragments. Fragment schemas free DTD
authors from the burden of understanding everything.
For example:
* MathML is such a fragment schema.
* An RDF schema for embedded metadata is also such a fragment schema.
It should be possible to validate fragment documents
against fragment schemas.
For example:
* It should be possible to validate all mathematical expressions against MathML. The entire DTD should not be required.
* It should be possible to validate RDF metadata against the RDF schema. The entire DTD should not be required.
* It should be possible to validate top-level document structures (that do not contain fragments for mathematical expressions and metadata) against the top-level DTD.
It should be possible to combine fragment schemas to form a
total schema.
For example:
* It should be possible to combine MathML, an RDF schema, and the top-level DTD to form a total schema.
Ideally, the result of fragment validation (see the namespace
note) should be identical to the result of validation against
the total schema.
For example:
* A document validates against the total schema created from MathML, an RDF schema, and the top-level DTD if and only if:
# mathematical expressions validate against MathML,
# RDF metadata validates against the RDF schema, and
# the top-level document structure validates against the top-level DTD.
!24. Syntax suggestions
Now, I give syntax suggestions for the non-monolithic
approach.
I believe that there is no conflict between colonization and
my suggestions. I also believe that my proposal achieves two
of the requirements in the note "Web Architecture:
Extensible Languages", namely:
* There must be a way of indicating when a given content model may be extended by new schemas.
* There must be a way, in a new schema, of specifying that a given new content model is designed as an extension to the existing content model of an existing schema.
!34.1 Fragment schemas and total schemas
A fragment schema is very similar to an external DTD
subset. However, (1) all names of a fragment schema
belong to a single namespace, (2) a fragment schema
has one element type identified as the fragment root
type, and (3) content models can refer to
substitution variables. A substitution variable is
merely a parameter entity that is not declared in
this fragment schema. Later, the replacement text of
this parameter entity will be defined as a reference
to the fragment root type of another fragment schema.
Substitution variables also belong to the namespace
of this fragment schema.
We now define the product of two (fragment or
composite) schemas. Consider two schemas, say S1 and
S2. Suppose that S1 contains a substitution
variable, say p1. Then, we can compose a new schema
by combining S1 and S2 at p1; we only have to define
the replacement text for p1 as the fragment root type
of S2. This new schema is called the product of S1
and S2 at p1. A schema thus constructed is said to
be composite.
Typically, the namespace(s) of S1 and the namespace(s) of S2
are disjoint. The namespaces of the product are the union
of those of S1 and those of S2.
By repeatedly creating products of schemas, we can construct
a total schema.
The use of substitution variables might look ad-hoc, but it is
not. It is directly based on the forest regular language
theory. The product of forest regular languages is a forest
regular language.
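The product construction above can be sketched in Python. This is only an illustration under simplifying assumptions, not syntax from the original note: a schema is modeled as a dictionary with a `root` name and a `children` map from each element type to the set of names its content may contain, and a substitution variable `p1` is written as the string `%p1`. The names `product`, `DOC`, `MATH`, and `TOTAL` are hypothetical.

```python
def product(s1, s2, variable):
    """Return the product of schemas s1 and s2 at a substitution variable.

    Assumption: content models are plain sets of allowed names, which is far
    weaker than real DTD content models but enough to show the composition.
    """
    marker = "%" + variable
    if not any(marker in allowed for allowed in s1["children"].values()):
        raise ValueError(f"{variable!r} is not a substitution variable of s1")
    children = {}
    for name, allowed in s1["children"].items():
        # Define the replacement text of the variable as the root type of s2.
        children[name] = {s2["root"] if item == marker else item for item in allowed}
    # The names of s1 and s2 are assumed to be disjoint (different namespaces).
    children.update(s2["children"])
    return {"root": s1["root"], "children": children}


# Hypothetical fragment schemas: a top-level document schema and a math schema.
DOC = {"root": "doc:doc", "children": {"doc:doc": {"doc:p"}, "doc:p": {"%p1"}}}
MATH = {"root": "m:math", "children": {"m:math": {"m:mi"}, "m:mi": set()}}

# The composite schema: the product of DOC and MATH at p1.
TOTAL = product(DOC, MATH, "p1")
```

Because the result has the same shape as its inputs, `product` can be applied repeatedly to build a total schema from several fragment schemas.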
!34.2 Documents and fragments
Within a single document, we allow elements of
different namespaces. However, this document
is a collection of loosely coupled fragments, each of which
belongs to a single namespace.
An element is said to be a fragment root if this
element and its parent element belong to different
namespaces. As a special case, the root of the
document is also a fragment root.
A fragment root must explicitly refer to a
fragment schema and a substitution variable. This
substitution variable does not belong to the namespace
of this fragment root, but rather belongs to the
namespace of its parent element.
Now, we decompose a single document into document
fragments. For each fragment root (except the
document root), we make its parent element refer
to the corresponding substitution variable. In other
words, we first detach the fragment root from its
parent element; we then make the parent element refer
to the substitution variable attached to this fragment
root. By repeatedly doing so, we obtain a collection
of document fragments.
All elements in a document fragment belong to a single
namespace. We can always reconstruct the original
document by replacing each substitution variable with
the corresponding fragment root. In this sense, this
collection of document fragments is equivalent to the
original document.
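The decomposition described above can be sketched with Python's standard `xml.etree.ElementTree` module. This is an illustration, not part of the original note: the note does not define a concrete syntax for substitution variables, so the sketch assumes a hypothetical placeholder element `SUBST` with a generated `name` attribute (`p1`, `p2`, ...), and the sample document and its namespace URIs are invented.

```python
import xml.etree.ElementTree as ET

PLACEHOLDER = "SUBST"  # hypothetical stand-in for a substitution variable


def namespace_of(element):
    """Return the namespace URI of an element, or '' if it has none."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def decompose(root):
    """Split a document into single-namespace fragments.

    Returns (top, fragments): `top` is the fragment of the document root and
    `fragments` maps each generated variable name to the detached fragment.
    """
    fragments = {}

    def build(element):
        clone = ET.Element(element.tag, dict(element.attrib))
        clone.text, clone.tail = element.text, element.tail
        for child in element:
            if namespace_of(child) == namespace_of(element):
                clone.append(build(child))
            else:
                # The child is a fragment root: detach it and leave a
                # placeholder in the parent that names the variable.
                name = f"p{len(fragments) + 1}"
                fragments[name] = None  # reserve the name before recursing
                placeholder = ET.SubElement(clone, PLACEHOLDER, {"name": name})
                placeholder.tail = child.tail
                fragment = build(child)
                fragment.tail = None
                fragments[name] = fragment
        return clone

    return build(root), fragments


SAMPLE = (
    '<doc xmlns="urn:example:doc" xmlns:m="urn:example:math">'
    "<p>See <m:math><m:mi>x</m:mi></m:math>.</p>"
    "<m:math><m:mn>1</m:mn></m:math>"
    "</doc>"
)
top, fragments = decompose(ET.fromstring(SAMPLE))
```

Replacing each placeholder in `top` with the fragment stored under the same name would rebuild the original tree, which is the sense in which the collection of fragments is equivalent to the document.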
!34.3 Fragment validation
Now, we are ready to introduce fragment validation. By
repeatedly performing fragment validation, we can examine the
validity of the entire document.
For each fragment within a document, we validate it
against a fragment schema that is referenced by the
root of this fragment. This validation is very
similar to traditional validation. The only
difference is handling of substitution variables. A
substitution variable in a fragment schema matches a
substitution variable in a fragment if and only if
they are of the same name.
Obviously, a document is valid against a total schema
if and only if every fragment within this document
is valid against the corresponding fragment schema.
Fragment validation is also inspired by the theory of
forest regular languages. A forest is accepted by the
product of two forest regular languages if and only if this
forest can be decomposed into the product of two fragments such
that they are accepted by the two forest regular languages
respectively.
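Fragment validation can be sketched along the same lines. The following Python is a toy illustration under stated assumptions, not the mechanism of the original note: a fragment schema is a dictionary with a `root` name and a `children` map of allowed names, a substitution variable named `math` appears in a schema as `%math`, and it appears in a fragment as a hypothetical `SUBST` element whose `name` attribute carries the variable name. Local names are used without namespaces because each fragment belongs to a single namespace.

```python
import xml.etree.ElementTree as ET

PLACEHOLDER = "SUBST"  # hypothetical stand-in for a substitution variable


def validate_fragment(element, schema, is_root=True):
    """Check one fragment against one fragment schema (toy content models)."""
    if is_root and element.tag != schema["root"]:
        return False
    allowed = schema["children"].get(element.tag)
    if allowed is None:
        return False  # element type not declared in this fragment schema
    for child in element:
        if child.tag == PLACEHOLDER:
            # A variable in the fragment matches a variable in the schema
            # if and only if they have the same name.
            if "%" + child.get("name", "") not in allowed:
                return False
        elif child.tag not in allowed or not validate_fragment(child, schema, False):
            return False
    return True


def validate_document(fragments, schemas):
    """A document is valid iff every fragment is valid against its schema.

    `fragments` is a list of (schema_key, fragment_root) pairs.
    """
    return all(validate_fragment(root, schemas[key]) for key, root in fragments)


# Hypothetical fragment schemas and fragments.
SCHEMAS = {
    "doc": {"root": "doc", "children": {"doc": {"p"}, "p": {"%math"}}},
    "math": {"root": "math", "children": {"math": {"mi"}, "mi": set()}},
}
doc_fragment = ET.fromstring('<doc><p><SUBST name="math"/></p></doc>')
math_fragment = ET.fromstring("<math><mi/></math>")
is_valid = validate_document([("doc", doc_fragment), ("math", math_fragment)], SCHEMAS)
```

Each fragment is checked using only its own schema, so a tool that knows just the `math` schema could still validate the math fragment on its own.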
!25. Conclusion
I have presented my view on XML as the information
architecture of the WWW. To provide interworking of
WWW applications while minimizing their
interdependencies, I believe that we have to make XML
non-monolithic. Partial knowledge should be good
enough for partially handling information amalgams;
fragment validation collectively ensures validity of
entire documents. I have made some syntax
suggestions, which are directly inspired by the
forest regular language theory.
Certainly, this note is controversial. Some parts of this
proposal are outside the scope of the namespace extension
and should probably be left to the schema extension.
However, I hope that fundamental requirements (rather than
concrete mechanism) for namespaces are thoroughly considered
and that the XML namespace extension is designed on the basis
of good understanding of such fundamental requirements.
(*1) Yes, SGML has a mechanism called SUBDOC. Is it already
good enough for the XML namespace? We have to discuss
this. However, I think that it is insufficient. Tables
that refer to the top-level document components cannot
be captured by SUBDOC.
[Wed, 04 Mar 1998 00:01:27 +0900]
Makoto
Fuji Xerox Information Systems