Structured Documents

Metadata and Metamodeling

Markup languages bring several advantages to document authoring and publishing. They are easy to learn and edit, and simple enough that applications for parsing and processing documents can be written quickly. They break a document into logical pieces, each of which can be extracted unambiguously. This enables more precise searches over documentation, because users can specify the type of information they are looking for in terms of the language's elements. For example, a query for titles would be run only against the content between <title> and </title> tags.
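As a rough illustration of such an element-scoped query, the following Python sketch collects only the text inside <title> elements. The sample document, its element names other than <title>, and the helper name find_titles are assumptions made for this example; they do not come from any particular system.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are illustrative only.
SAMPLE = """<library>
  <book><title>Structured Documents</title><author>A. Writer</author></book>
  <book><title>Metadata Basics</title><author>B. Editor</author></book>
</library>"""


def find_titles(xml_text):
    """Return the text of every <title> element, in document order."""
    root = ET.fromstring(xml_text)
    # iter("title") visits only <title> elements, so author names and
    # other content never enter the result.
    return [element.text for element in root.iter("title")]


titles = find_titles(SAMPLE)
```

Because the search is restricted to one element type, a word that appears only in an <author> element cannot produce a false match in a title query.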

HTML, however, has several weaknesses in this respect. First, it allows authors to omit closing tags. Second, the HTML specification does not bind some tags to a parent tag, so they can be used almost anywhere. Third, browsers and other HTML rendering software generally tolerate such errors instead of reporting them. Together, these allow HTML authors to produce ill-structured documents, which results in a loss of search granularity.
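The tolerance described above can be observed with a lenient parser. The sketch below uses Python's standard html.parser module, which accepts markup with omitted closing tags without raising an error. The class TagAudit, the function unclosed_tags, and the sample markup are hypothetical names chosen for this illustration, and the counting logic is deliberately simple.

```python
from html.parser import HTMLParser


class TagAudit(HTMLParser):
    """Record every start tag and end tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.starts = []
        self.ends = []

    def handle_starttag(self, tag, attrs):
        self.starts.append(tag)

    def handle_endtag(self, tag):
        self.ends.append(tag)


def unclosed_tags(html_text):
    """Return start tags that have no corresponding end tag.

    Simplification: void elements such as <br> would also be listed,
    because they never have an end tag.
    """
    audit = TagAudit()
    audit.feed(html_text)  # no exception, even for sloppy markup
    audit.close()
    remaining = list(audit.ends)
    missing = []
    for tag in audit.starts:
        if tag in remaining:
            remaining.remove(tag)
        else:
            missing.append(tag)
    return missing


# Hypothetical markup with omitted </li> and </p> tags.
SLOPPY = "<ul><li>one<li>two</ul><p>para"
missing = unclosed_tags(SLOPPY)
```

The parser processes the whole input silently, so it is left to the application to notice that the boundaries of each list item and paragraph were never stated explicitly.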

The XML specification is designed to address these issues. It is strict about syntax and structure, which simplifies writing parsers. Developers do not need to write a different, complex parser for each XML language they adopt; many freely available parsers can easily be integrated into a development project. [REF?]
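A minimal sketch of this strictness, using the XML parser from Python's standard library as one example of a freely available parser: a document with a missing closing tag is rejected outright. The helper name is_well_formed and the two sample strings are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised for syntax violations such as a missing closing tag.
        return False


# Hypothetical inputs: the second one omits </title>.
GOOD = "<doc><title>Report</title></doc>"
BAD = "<doc><title>Report</doc>"

good_ok = is_well_formed(GOOD)
bad_ok = is_well_formed(BAD)
```

The same generic parser works for any XML vocabulary, since well-formedness rules do not depend on the element names in use.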