Metadata and Metamodeling

XML as a Modeling Language

Textual data is a major source of information, and an important part of our intellectual development depends on it. One way of acquiring knowledge is through reading: the more effective the presentation of reading material, the more successful the results will be. To improve the comprehension of reading material, scientists, authors, publishers, and many others have developed a great number of techniques. Typefaces such as Times, Bookman, and Arial, and styles such as bold, italic, and underlined, are all examples of such innovations.

The idea behind styles and typefaces is to distinguish each piece of information from the others. Highlighting text with styles helps the human eye distinguish information; however, when no style can be imposed on the text, as when a document is typed on a typewriter or a teletypewriter, other means are needed. Markup languages are one answer to this need.

A markup language is a set of specifications that describes how the logical components of a document are separated from one another by words, abbreviations, or other character combinations whose syntax is defined in a separate document. The parts identified in this way can be extracted easily by either human beings or computer applications. As a specification language for markup languages, the Standard Generalized Markup Language (SGML) [65, 47] was adopted as an international standard (ISO 8879) in 1986.
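To make the definition concrete, the following minimal sketch shows how marked-up components can be extracted by a computer application. It is an illustration under stated assumptions, not part of any specification: the memo and its tag names (memo, to, from, subject, body) are invented, and the markup uses XML-style tags, a notation introduced later in this section, only because Python's standard library can parse it directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical memo; the tag names are invented for this illustration.
MEMO = """
<memo>
  <to>All staff</to>
  <from>Library services</from>
  <subject>Reading room hours</subject>
  <body>The reading room will close at 18:00 on Fridays.</body>
</memo>
"""


def extract_components(document):
    """Map each tag directly under the root to the text it encloses."""
    root = ET.fromstring(document.strip())
    return {child.tag: (child.text or "").strip() for child in root}


components = extract_components(MEMO)
print(components["subject"])
```

Because each logical component is delimited by a named tag, the application can ask for the subject by name instead of guessing from its position or appearance.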

SGML was initially developed to bring about a consensus among text-processing applications so that electronic documents with markup could be exchanged and the semantics given to that markup understood correctly. Even though an unlimited number of markup languages can be designed in SGML, development and processing costs are very high. Around the mid-1990s, desktop publishing tools with SGML support cost about four times as much as those without it [53]. SGML served its purpose within a small community until 1991, when one of its applications, the Hypertext Markup Language (HTML), was developed by Tim Berners-Lee [60].

HTML was first used at CERN, the European Organization for Nuclear Research, in technical documents, where it provided access to internal documents through a document server, the first Web server. The document server processed document requests and responded with the document content. This idea, later supplemented with a graphical interface and user interactivity, brought about the explosive growth of Internet usage and the World Wide Web (WWW).

HTML initially defined only the document structure. However, presenting documents in graphical environments led application developers to add features to browsers and document editors. To meet this demand, specification authors had to add new HTML elements to the specification, and these elements addressed presentation problems more than document structure. As a result, extracting data from HTML pages became very difficult, if not impossible.
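The extraction problem can be sketched with a small, hypothetical page fragment in which a title, a price, and a shipping note are all marked with the presentational element b. The page content and the BoldCollector class are assumptions made for this illustration; only Python's standard html.parser module is used.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: three unrelated facts share the same markup.
PAGE = """
<p><b>Introduction to SGML</b></p>
<p>Price: <b>45.00</b></p>
<p><b>Ships within two days</b></p>
"""


class BoldCollector(HTMLParser):
    """Collect the text found inside <b> elements."""

    def __init__(self):
        super().__init__()
        self.in_bold = False
        self.bold_texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self.in_bold = True

    def handle_endtag(self, tag):
        if tag == "b":
            self.in_bold = False

    def handle_data(self, data):
        # Keep only non-empty text that appears while inside <b>.
        if self.in_bold and data.strip():
            self.bold_texts.append(data.strip())


collector = BoldCollector()
collector.feed(PAGE)
collector.close()
print(collector.bold_texts)
```

The collector returns all three strings with nothing to tell them apart: the markup records that each one is bold, not that one is a title and another a price. An application has to fall back on position or surrounding words, which breaks as soon as the page layout changes.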

The complexity of SGML processing and the limited capabilities of HTML called for further work on markup languages. The SGML working group at the World Wide Web Consortium (W3C) eventually produced a solution that inherits the power and extensibility of SGML while remaining simple and cost-effective to develop and use. The new specification was called the Extensible Markup Language (XML) [112, 53].

XML combines the advantages of both of its predecessors. Its simplicity in design, use, and development has favored its adoption, mostly in Web and information technologies.

   Structured Documents
   Freedom in Language Design
   Language for Specifications
   Data Presentation
   Data Persistence
   Document and Object Modeling