Metadata and Metamodeling

Metadata Statements and RDF

In Dublin Core and similar models, attribute-value pairs are the major components. An attribute names a feature or characteristic of an object, such as color, height, usage, or name, and the value states what the object has for that attribute, which distinguishes it from other objects described by the same attribute. In this representation, objects or resources are neither defined nor identified; attribute-value pairs are assumed to describe the object or resource that contains them. Figures 2.6 and 2.7 depict two different HTML documents about whose identities we have no information. They are just populated with metadata elements. The Resource Description Framework (RDF) [113] is a specification that captures objects, their attributes, and their values in the form of RDF documents.

RDF documents are containers for RDF statements that build associations between resources. An RDF statement is composed of a subject, a predicate, and an object, as illustrated in figure 2.8(a), and can be read in natural language as the following sentence: the subject has the predicate with the value object. It is this triple that gives RDF its power and popularity. Using these triples, any model that can be represented as a directed labeled graph can be expressed in RDF as well. Figure 2.8(b) depicts a sample tree model, while figure 2.8(c) shows an instantiation of the same model with real values. The object of one RDF statement can be the subject of another, as is the case for Fred in the example. Fred is a student of a Class; however, he is also the owner (subject) of the email property. Class can also have other predicates such as Room, Capacity, and Teacher.
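The subject-predicate-object structure can be sketched in a few lines of Python. This is only an illustration, not an RDF library: the `Triple` type, the helper functions, and the concrete values (room name, email address) are assumptions made up for the example, loosely following the Class and Fred model described above.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: the subject has the predicate with the value obj."""
    subject: str
    predicate: str
    obj: str


# Hypothetical statements; the values are illustrative, not taken from figure 2.8.
triples = [
    Triple("Class", "student", "Fred"),
    Triple("Class", "room", "Room 101"),
    Triple("Fred", "email", "fred@example.org"),
]


def describe(node, statements):
    """Return the (predicate, object) pairs of all statements whose subject is node."""
    return [(t.predicate, t.obj) for t in statements if t.subject == node]


def objects_that_are_subjects(statements):
    """Return the nodes that appear as an object in one statement and a subject in another."""
    subjects = {t.subject for t in statements}
    return sorted({t.obj for t in statements if t.obj in subjects})


print(describe("Fred", triples))
print(objects_that_are_subjects(triples))
```

Because every statement is an edge from a subject node to an object node labeled with the predicate, the list of triples is itself a directed labeled graph, and a node such as Fred links two statements simply by appearing in both roles.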

In addition to using triples, or statements, to represent relationships, another strength of RDF lies in its definition of resources with unique identifiers. Each resource is identified by a Uniform Resource Identifier (URI) [15], a globally accepted identification scheme. Since a URI represents a resource, several RDF statements may use the same URI to populate metadata for the resource that the URI represents. Figure 2.9(a) demonstrates this association. The resource, in this case a personal Web page, is identified by a URI. The RDF statement defines its creator as Ora Lassila. However, the person who created the home page may have other attributes (or properties) besides his or her name, such as an email address or salary information. The model in the example could be extended further by identifying the object as another resource and assigning it a new URI as well. A real object is then represented in the model, and new metadata can be populated about it. Figures 2.9(b) and 2.9(c) demonstrate two different forms of representation, the first with an anonymous resource at the center that has no URI associated with it, and the second with a URI. The latter allows the employee object at the center to be referred to in other RDF documents.
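The difference between an anonymous resource and a resource with a URI can be illustrated with a short sketch. The URIs, the email address, the `_:b1` label convention for anonymous nodes, and the function names below are all assumptions chosen for the example; they are not taken from figure 2.9.

```python
import itertools

_blank_ids = itertools.count(1)


def blank_node():
    """Create a document-local label for an anonymous resource (no URI)."""
    return f"_:b{next(_blank_ids)}"


# Assumed, illustrative URIs.
PAGE = "http://example.org/home/lassila"
PERSON = "http://example.org/staff/lassila"


def creator_statements(page, creator_node):
    """Describe the page's creator as a resource with its own properties."""
    return [
        (page, "creator", creator_node),
        (creator_node, "name", "Ora Lassila"),
        (creator_node, "email", "lassila@example.org"),
    ]


def is_referable(node):
    """Only a node identified by a URI can be referred to from other documents."""
    return not node.startswith("_:")


anonymous = creator_statements(PAGE, blank_node())   # anonymous node at the center
identified = creator_statements(PAGE, PERSON)        # same model, node has a URI

print(is_referable(anonymous[0][2]), is_referable(identified[0][2]))
```

Both variants carry the same three statements; only the second gives the person at the center an identifier that statements in other documents could reuse.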

RDF statements are a more advanced way of populating resources with metadata than the approach shown in figure 2.7, an XHTML page carrying a DC metadata element, <dc:creator>Ora Lassila</dc:creator>. Uniform Resource Locators (URLs) [17] are a specialized subset of URIs. If, instead of adding <Meta> tags to Web pages, we create RDF statements based on the URLs of those pages, the metadata definitions of these resources will be semantically identical. In other words, embedding metadata into a Web page with a fixed URL has the same meaning as creating a separate document and associating the metadata with the URL inside that separate document. Differences arise in the form and accessibility of the metadata.

XML is the recommended language for serializing RDF statements, in the form of verbose documents. The advantages of XML, as discussed previously, allow RDF documents to be widely and easily deployed, processed, and exchanged between metadata management systems. Figure 2.10(a) is a sample XML document in which the RDF statement in figure 2.9(a) is serialized. XML documents can also serve as containers for RDF statements: several RDF statements can be listed within the same XML document, where resources are described based on their URIs, as in figure 2.10(b). Such collections of RDF statements are becoming commonly available from many organizations that serve as information repositories, such as libraries, news agencies, or on-line stores with large product collections. Repositories spread out over the Internet form a global distributed metadata network, which needs better methods of indexing and accessing the information available for human and machine use.
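As a rough sketch of such a serialization, the following Python code writes a few statements as RDF/XML using the standard library's `xml.etree.ElementTree`. It assumes that every property comes from the Dublin Core element namespace and every value is a plain literal; the resource URIs, the titles, and the `serialize` helper are hypothetical and do not reproduce figure 2.10.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Use the conventional prefixes instead of auto-generated ones (ns0, ns1).
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def serialize(statements):
    """Serialize (subject_uri, dc_property, literal) tuples as one RDF/XML document."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    descriptions = {}
    for subject, prop, value in statements:
        desc = descriptions.get(subject)
        if desc is None:
            # One rdf:Description element per resource, keyed by its URI.
            desc = ET.SubElement(
                root,
                f"{{{RDF_NS}}}Description",
                {f"{{{RDF_NS}}}about": subject},
            )
            descriptions[subject] = desc
        ET.SubElement(desc, f"{{{DC_NS}}}{prop}").text = value
    return ET.tostring(root, encoding="unicode")


# Assumed, illustrative resources and values.
xml_text = serialize([
    ("http://example.org/home/lassila", "creator", "Ora Lassila"),
    ("http://example.org/home/lassila", "title", "Home page"),
    ("http://example.org/index.html", "creator", "A. Author"),
])
print(xml_text)
```

Statements about the same URI are grouped under a single description element, which mirrors how one XML document can act as a container for many statements about many resources.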

As RDF statements build relationships between resources on the Web, a need emerges to standardize the types of relationships between resources and the properties of resources. With semantics that are well defined and accepted by larger communities, one can automate intelligent document searches that evaluate documents’ metadata and their relationships to other documents. The RDF Schema (RDFS) [24] specification serves this purpose with language structures for resource and property classifications, class hierarchy definitions, literals, labeling, and so on. In other words, RDF Schema allows developers to define their own languages for RDF and to validate serialized RDF documents.
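The idea of a class hierarchy can be sketched as follows. This is a deliberately simplified model, not an RDFS implementation: the class names are invented, each class is assumed to have at most one direct superclass (RDFS itself allows several), and the helper names are hypothetical.

```python
# Hypothetical hierarchy: each class maps to its single direct superclass.
SUBCLASS_OF = {
    "Student": "Person",
    "Teacher": "Person",
    "Person": "Resource",
}


def superclasses(cls, hierarchy):
    """Return all superclasses of cls, nearest first, following the chain transitively."""
    chain = []
    seen = {cls}
    while cls in hierarchy:
        cls = hierarchy[cls]
        if cls in seen:  # guard against an accidental cycle in the hierarchy
            break
        seen.add(cls)
        chain.append(cls)
    return chain


def is_instance_of(declared_type, target, hierarchy):
    """A resource of declared_type also belongs to every superclass of that type."""
    return declared_type == target or target in superclasses(declared_type, hierarchy)


print(superclasses("Student", SUBCLASS_OF))
print(is_instance_of("Student", "Person", SUBCLASS_OF))
```

With such a hierarchy in place, a search for resources of type Person can also return resources declared only as Student or Teacher, which is the kind of inference that shared, well-defined semantics make possible.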

The Semantic Web

The Semantic Web [16] is an effort to make information access more efficient for people through intelligent search mechanisms. Content developers are expected to generate metadata for the content on their sites, following the guidelines and standards that the World Wide Web Consortium (W3C) releases. Together, the studies and standards in this direction make up this new concept of the Web. These standards will help
  • language developers to develop more expressive languages to represent semantics of document relationships and domains of document classes,
  • information technologists to model information structures in accordance with the world standards,
  • content developers to classify their data so they are reachable by the right set of users, and
  • users to initiate more intelligent searches using this additional information about Web content.

The RDF and RDFS specifications form the basis for expressive semantic languages. The DARPA Agent Markup Language (DAML) [54] is an extension to XML and RDF that aims at developing a language and tools to facilitate the concept of the Semantic Web. DAML combined with the Ontology Inference Layer (OIL) [58] yields DAML+OIL [57], a semantic markup language that serves as the schema or ontology language for Web resources. This means that relationships between resources are defined with a more powerful language than RDF, and resources are grouped by ontologies so that anyone who searches for information on the Internet can perform more refined searches. For example, a search for a term like element in the realm of chemistry would return only resources that use the word element in its chemistry sense, rather than in its physics or information technology sense.
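The effect of grouping resources by ontology on a search can be illustrated with a toy example. The resource list, its URIs, the `domain` field standing in for an ontology, and the `search` function are all assumptions made for this sketch; a real DAML+OIL system would derive the grouping from ontology definitions rather than from a hand-written field.

```python
# Hypothetical resources, each annotated with a term and the domain (ontology) it belongs to.
RESOURCES = [
    {"uri": "http://example.org/periodic-table", "term": "element", "domain": "chemistry"},
    {"uri": "http://example.org/xml-tutorial", "term": "element", "domain": "information technology"},
    {"uri": "http://example.org/circuits", "term": "element", "domain": "physics"},
]


def search(term, domain, resources):
    """Return the URIs of resources that use term within the given domain only."""
    return [
        r["uri"]
        for r in resources
        if r["term"] == term and r["domain"] == domain
    ]


print(search("element", "chemistry", RESOURCES))
```

A plain keyword search for element would match all three resources; restricting the match to one domain is what narrows the result to the chemistry resource.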