Saturday 6 November 2010

Why does the first line of an XML document matter?


An XML declaration takes the following form (square brackets mark the optional parts):

<?xml version="..." [encoding="..."] [standalone="yes|no"]?>

The three key bits of the declaration are what some call pseudo-attributes, because they look syntactically similar to attributes. If present, the encoding declaration must follow the version, and, if present, the standalone declaration must be the last pseudo-attribute.
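To see the three pseudo-attributes as a parser reports them, here is a minimal sketch using Python's standard xml.parsers.expat module. The helper name read_declaration and the sample documents are only illustrations, not part of any XML API. Expat reports the standalone value as 1 for "yes", 0 for "no", and -1 when it is omitted.

```python
from xml.parsers import expat


def read_declaration(document: bytes):
    """Return (version, encoding, standalone) as reported by expat.

    standalone is 1 for "yes", 0 for "no", and -1 when it is omitted.
    Returns None if the document has no XML declaration at all.
    """
    seen = []

    def on_declaration(version, encoding, standalone):
        seen.append((version, encoding, standalone))

    parser = expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(document, True)  # True marks this as the final chunk
    return seen[0] if seen else None


print(read_declaration(b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><note/>'))
print(read_declaration(b'<?xml version="1.0"?><note/>'))

# Putting standalone before encoding violates the required order,
# so the parser rejects the declaration as not well-formed.
try:
    read_declaration(b'<?xml version="1.0" standalone="yes" encoding="UTF-8"?><note/>')
except expat.ExpatError as error:
    print("rejected:", error)
```

The last call shows the ordering rule in practice: the parser does not treat the pseudo-attributes as an unordered set the way it treats real attributes.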


Declaring the XML version is especially important now that XML 1.1 has been approved as a W3C Recommendation. XML 1.1 changes the definition of well-formedness in small but definite ways. One nice change is that XML 1.1 makes the XML declaration mandatory. The recommendation states:
"XML 1.1 documents MUST begin with an XML declaration which specifies the version of XML being used."
By definition, any XML document without a declaration is an XML 1.0 document. However, you should never leave the version unstated, especially since it is also very important to specify the encoding.


The foundation of XML is Unicode. Every character in an XML document is a Unicode character. If you were to remember only one fact about XML, this would be the one to choose. It's even more important than, say, the fact that all non-empty elements must have an opening and closing tag. Since a Unicode character is an abstraction, there must be a mechanism for actually representing these characters in a form that can be processed by computers. This form is called an encoding.

The encoding of the document is only a convenience for transmitting the document, but you should understand clearly that the substance of the XML content is still strictly Unicode. It's the parser's job to translate from the encoding to Unicode.
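A small sketch can make this concrete. It assumes Python's standard xml.etree.ElementTree module, and the helper names encoded_document and parsed_text are made up for the illustration. The same one-element document is serialized in three encodings, which gives three different byte sequences, yet the parser hands back the same Unicode text each time.

```python
import xml.etree.ElementTree as ET

TEXT = "caf\u00e9"  # the last character is U+00E9 (e with acute accent)


def encoded_document(encoding: str) -> bytes:
    """Serialize the same tiny document in the given encoding."""
    source = f'<?xml version="1.0" encoding="{encoding}"?><p>{TEXT}</p>'
    return source.encode(encoding)


def parsed_text(document: bytes) -> str:
    """Let the parser decode the bytes and return the element text."""
    return ET.fromstring(document).text


for name in ("UTF-8", "UTF-16", "ISO-8859-1"):
    raw = encoded_document(name)
    # The byte length varies with the encoding; the decoded text does not.
    print(name, len(raw), "bytes, same text:", parsed_text(raw) == TEXT)
```

Note that the encoding declaration has to match the bytes actually written. Here both come from the same encoding argument, so they cannot drift apart.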

The most common encodings are UTF-8 and UTF-16, which transmit Unicode characters as a sequence of 8-bit and 16-bit values, respectively. These are also the two encodings that must be supported by parsers. If you do not specify an encoding, an XML processor must assume UTF-8 or UTF-16 depending on the presence or absence of a special byte sequence (called the Byte Order Mark or BOM) at the very beginning of the file being parsed.
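The BOM check described above can be sketched roughly as follows. This is a deliberate simplification: a real XML processor also examines the first bytes of the declaration itself and handles further encoding families. The function name guess_encoding is hypothetical; the BOM constants come from Python's standard codecs module.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Guess the encoding of a document that has no encoding declaration."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "UTF-16"
    return "UTF-8"  # no BOM found: fall back to the default


# Python's "utf-16" codec writes a BOM in front of the encoded text.
print(guess_encoding("<note/>".encode("utf-16")))
print(guess_encoding(b"<note/>"))
```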

Standalone Declaration

The standalone declaration indicates whether a document relies on information from an external source, such as an external document type definition (DTD), for its content. If the standalone declaration has a value of "yes", for example, <?xml version="1.0" standalone="yes"?>, the document asserts that no external markup declarations affect its content. A validating parser will then report an error if the content does depend on them, for instance through default attribute values or entities that are declared only in an external DTD. Leaving out the standalone declaration produces the same result as including a standalone declaration of "no": the XML parser will accept external resources, if there are any, without reporting an error.
