XML: A Primer

You have probably heard a great deal about eXtensible Markup Language (XML) over the
past few years. XML is on its way to becoming the de facto language for communications
between devices, Web browsers, computers, servers, and applications. In time, any two
applications will be able to exchange information without ever having been designed to talk
to each other.
In many ways, XML is just another file format—one more way to store information.
However, XML as a file format is just the beginning. XML promises to liberate information
from proprietary file formats and make it possible for information to move among multiple
programs on different types of computers without the battery of conversion programs and the
loss of information that such moves currently entail. XML promises to dramatically increase both
the efficiency and flexibility of the ways in which you handle information. In doing so, XML
will have an impact on the way in which you use computers; it will change the way you
look at applications.
Fundamentally, XML makes it easy to store information in a hierarchical format, providing a
consistent, easy-to-parse syntax and a set of tools for building rules describing the structure
used to contain information. The XML format can represent both simple and complex information,
and allows developers to create their own vocabularies for describing that information.
XML documents can describe both themselves and their content.
The XML Design Specs
When you think of an “application,” you tend to think of a Web application or a desktop
application like Word or Excel. However, the creators of XML were a little less nearsighted
and had many different kinds of applications in mind. For this reason, the creators of XML
established the following design commandments for the XML specification:
1. XML shall be straightforwardly usable over the Internet.
This does not mean that XML should only be used over the Internet, but rather
that it should be lightweight and easily usable over the Internet.
2. XML shall support a wide variety of applications.
The idea here is that XML should not be application specific. It can be used over
the Internet or in a traditional client/server application. There is no specific technology
behind XML, so any technology should be able to use it.
3. It shall be easy to write programs that process XML documents.
Many technologies come and go because they never gain wide acceptance, and a major
barrier to wide acceptance is a high level of difficulty or complexity.
The designers of XML wanted to ensure that it would gain rapid acceptance by
making it easy for programmers to write XML parsers.
4. XML documents should be human-legible and reasonably clear.
Because XML is text-based and follows a strict but simple formatting methodology,
it is extremely easy for a human to get a true sense of what a document means.
XML is designed to describe the structure of its contents.
5. XML documents shall be easy to create.
XML documents can be created in a simple text editor. Now that’s easy!
There are other XML guidelines, but since this is only an introduction to XML, these will
do for now. The important thing to remember is that XML is simply a file format that
two or more entities can use to exchange information.
XML documents are hierarchical: they have a single root element, which may contain
other elements, which may in turn contain other elements, and so on. Documents typically
look like a tree structure, with branches growing out from the root and finally terminating
at some point with content. Elements are often described as having parent and child relationships,
in which the parent contains the child element.
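The parent and child relationships described above can be made visible with a short Python sketch that uses the standard library's xml.etree.ElementTree module. The sample document and its element names (band, name, member, first, last) are assumptions chosen for illustration; the helper name outline is hypothetical as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are illustrative only.
SAMPLE = (
    "<band>"
    "<name>Hootie And The Blowfish</name>"
    "<member><first>Darius</first><last>Rucker</last></member>"
    "</band>"
)


def outline(element, depth=0):
    """Return one indented line per element, parents before their children."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating an Element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(SAMPLE)
print("\n".join(outline(root)))
```

Each extra level of indentation in the printed outline corresponds to one step from a parent element down to a child element.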
The Structure of XML Documents
XML documents must be properly structured and follow strict syntax rules in order to work
correctly. If a document is lacking in either of these areas, the document can’t be parsed.
There are two types of structures in every XML document: logical and physical. The logical
structure is the framework for the document and the physical structure is the actual data.
An XML document may consist of three logical parts: a prolog (optional), a document element,
and an epilog (optional). The prolog is used to instruct the parser how to interpret
the document element. The purpose of the epilog is to provide information pertaining to the
preceding data. Listing 6-1 shows the basic structure of an XML document.
Listing 6-1 Basic structure of an XML document

Hootie And The Blowfish

Darius
Rucker

Dean
Felber

Mark
Bryan

Jim
Sonefeld

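As a rough sketch of the three-part layout (prolog, document element, epilog), the following Python snippet parses a small document built from the band data in Listing 6-1 and separates its top-level nodes. The element names (band, name, members, member, first, last), the two comments, and the helper split_parts are assumptions made for illustration, not the markup of the original listing.

```python
from xml.dom import minidom, Node

# Hypothetical markup around the data from Listing 6-1.
SAMPLE = """<?xml version="1.0"?>
<!-- prolog: a comment placed before the document element -->
<band>
  <name>Hootie And The Blowfish</name>
  <members>
    <member><first>Darius</first><last>Rucker</last></member>
    <member><first>Dean</first><last>Felber</last></member>
    <member><first>Mark</first><last>Bryan</last></member>
    <member><first>Jim</first><last>Sonefeld</last></member>
  </members>
</band>
<!-- epilog: a comment placed after the document element -->
"""


def split_parts(document):
    """Split top-level nodes into (before root, root, after root).

    minidom does not expose the XML declaration as a node, so the
    first list only holds comments and similar prolog content.
    """
    nodes = list(document.childNodes)
    position = nodes.index(document.documentElement)
    return nodes[:position], nodes[position], nodes[position + 1:]


document = minidom.parseString(SAMPLE)
prolog, root, epilog = split_parts(document)

surnames = [
    member.getElementsByTagName("last")[0].firstChild.data
    for member in root.getElementsByTagName("member")
]
print(root.tagName, surnames)
print(len(prolog), "prolog node(s),", len(epilog), "epilog node(s)")
```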
The prolog is made up of two parts: the XML declaration and an optional document type
declaration, which contains or points to a Document Type Definition (DTD). The XML
declaration identifies the document as XML and tells the parser which version of the XML
specification it complies with. Although the prolog, and thereby the XML declaration, is
optional, we recommend that you include the declaration in all your XML documents.
Here is an example of a simple XML declaration:
<?xml version="1.0"?>
The XML declaration can contain more than just the version attribute: it may also carry
the optional encoding and standalone attributes.
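To see how a parser reports these attributes, here is a minimal sketch using the XmlDeclHandler callback of Python's xml.parsers.expat module. The helper name read_declaration and the sample note element are hypothetical.

```python
from xml.parsers import expat


def read_declaration(data):
    """Return the version, encoding and standalone values of the XML declaration."""
    found = {}

    def on_declaration(version, encoding, standalone):
        # expat reports standalone as 1 ("yes"), 0 ("no") or -1 (omitted).
        found.update(version=version, encoding=encoding, standalone=standalone)

    parser = expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(data, True)  # True marks this as the final chunk of input
    return found


full = read_declaration(
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><note/>'
)
minimal = read_declaration(b'<?xml version="1.0"?><note/>')
print(full)
print(minimal)
```

When the declaration carries only the version, expat passes None for the encoding and -1 for standalone, which reflects that both attributes are optional.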
The document type declaration establishes the grammar rules for the document, or it
points to a document where these rules can be found. The document type declaration is
optional but, if included, must appear after the XML declaration and before the document element.
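A small sketch of where the document type declaration sits, again in Python with xml.dom.minidom. The internal DTD shown here (a band element containing a single name element) is an assumption made for illustration. Note that minidom only reads the declaration; it does not validate the document against it.

```python
from xml.dom import minidom, Node

# Hypothetical document: XML declaration, then DOCTYPE, then the document element.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE band [
  <!ELEMENT band (name)>
  <!ELEMENT name (#PCDATA)>
]>
<band><name>Hootie And The Blowfish</name></band>
"""

document = minidom.parseString(SAMPLE)
doctype = document.doctype

# The name given in the DOCTYPE matches the name of the document element.
print(doctype.name, document.documentElement.tagName)

# Top-level node order: the document type node comes before the element.
order = [node.nodeType for node in document.childNodes]
print(order == [Node.DOCUMENT_TYPE_NODE, Node.ELEMENT_NODE])
```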
XML documents can also reference a Schema rather than a DTD. Schemas perform essentially
the same function as DTDs, but can describe more complex data types and are actually
XML documents themselves. When possible, we recommend using a Schema rather than a
DTD, as Schemas are quickly becoming the de facto standard for describing XML documents.
An XML document is referred to as well formed when it conforms to all XML
syntax rules. A valid XML document follows the structural rules defined in a
Document Type Definition or Schema.
All the data in an XML document is contained within the document element (in Listing 6-1,
the outermost element). You can’t have more than one document element in the same document,
but the document element can contain as many child elements as necessary.
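The well-formedness rules, including the single document element rule, can be checked with a few lines of Python. This is a minimal sketch: the helper name is_well_formed and the sample strings are hypothetical, and xml.etree.ElementTree checks syntax only, so it says nothing about validity against a DTD or Schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


checks = {
    "nested correctly": is_well_formed("<band><name>Hootie</name></band>"),
    "tags overlap": is_well_formed("<band><name>Hootie</band></name>"),
    "two document elements": is_well_formed("<band/><band/>"),
}
for label, result in checks.items():
    print(label, "->", result)
```

Only the first sample parses; the other two are rejected because of overlapping tags and a second document element, respectively.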
