Metadata Repositories


In the previous sections, we described several well-known methodologies for defining and populating metadata for the Web. The data structures that new metadata specifications define (e.g., tree structures in XML, or RDF statements forming relationship graphs) require special attention to data analysis, modeling, and management. Traditional data repository systems may still be used; however, new services and extended features are often required.

The nature of information on the Internet and on the intranets of large enterprises compels information system developers to devise distributed data management solutions for describing large amounts of online information with metadata. As a result, distributed metadata solutions are built as multi-tier architectures enriched with distinct services and supported by a wide array of data stores over networks. The key features shared by these metadata systems appear to be [76]:

flexible metamodels that developers or users can integrate with their own models,
universal resource and metadata naming standards,
mechanisms to handle multiple metadata sources,
tools (i.e. user or programming interfaces) that help generate and modify metadata manually or automatically,
data persistence with search and retrieval abilities, and
security management.

Figure 2.12 depicts a common architecture for metadata and content repository systems. The architecture is divided into five tiers of abstraction: clients, applications, services, data retrieval and data stores.

Clients. Client types for such repositories vary from users with different device capabilities to other application servers. Applications detect device types and adjust content rendering accordingly. From the architecture's point of view, other metadata repository systems are also clients that can directly access the system's services.

Applications. Applications sit on top of the repository services. For example, a content management system (CMS) can encompass the interfaces and services needed to build a comprehensive application for managing distributed, enterprise-wide content. Similarly, an electronic notebook can serve as a data and metadata repository application for the scientific record. With specialized intermediary applications, repository services also allow a repository to be seamlessly integrated into a network of other repositories and applications, for example by advertising itself as a node in a peer-to-peer network or by subscribing to a publish/subscribe messaging system.

Services. In the services tier, a modeling service allows administrators, or in some cases users, to design metadata models dynamically and integrate them into the system instantly. Many repository systems do not support dynamic modeling; however, it is becoming an important service as the use of XML, with its flexibility in modeling, and the demand for automated metadata authoring interfaces increase rapidly [14].

A discovery service in a repository system locates a resource or a metadata record within the data stores by using a unique ID. The ID is either a URI or a key in another naming format. Discovery operations are performed either locally or over the network by accessing other repositories.
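The lookup order described above can be sketched as follows. This is a minimal illustration, not the design of any particular system: the class name `DiscoveryService`, the `Resolver` type, and the example IDs are all hypothetical, and remote repositories are modeled as plain callables rather than real network clients.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical resolver type: takes a unique ID (a URI or another key)
# and returns a metadata record, or None if the ID is unknown.
Resolver = Callable[[str], Optional[dict]]


class DiscoveryService:
    """Illustrative discovery service: local lookup first, then other repositories."""

    def __init__(self, local_store: Dict[str, dict],
                 remotes: Optional[List[Resolver]] = None) -> None:
        self.local_store = local_store
        self.remotes = remotes or []

    def locate(self, uid: str) -> Optional[dict]:
        # 1. Try the local data store.
        record = self.local_store.get(uid)
        if record is not None:
            return record
        # 2. Ask each remote repository in turn (stand-ins for network calls).
        for resolve in self.remotes:
            record = resolve(uid)
            if record is not None:
                return record
        return None


# Example data (assumed for illustration only).
local = {"urn:example:doc-1": {"title": "Local report"}}
remote_store = {"http://example.org/doc-2": {"title": "Remote report"}}
service = DiscoveryService(local, remotes=[remote_store.get])
```

Here `service.locate("urn:example:doc-1")` is answered locally, while `service.locate("http://example.org/doc-2")` falls through to the remote resolver.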

An authoring service is the interface through which human clients generate or alter metadata about any content. Existing metadata is retrieved through the discovery service, and the authoring service interacts with users through an appropriate interface, which may have been created previously by the modeling service.

A rendering service can serve both metadata and content by transforming data, if necessary, from one format into a format suited to users’ device capabilities or into the data formats that client applications require.

A search service performs user queries over metadata and/or content. Search requests can also be forwarded to other repositories.
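A search that is also forwarded to other repositories might look like the sketch below. The `Repository` class, its `peers` attribute, and the title-substring matching are assumptions made for illustration; a real system would use a proper query language and network transport.

```python
from typing import Dict, List, Sequence, Tuple


class Repository:
    """Illustrative repository that can forward a query to peer repositories."""

    def __init__(self, name: str, records: Dict[str, dict],
                 peers: Sequence["Repository"] = ()) -> None:
        self.name = name
        self.records = records
        self.peers = list(peers)

    def search(self, term: str, forward: bool = True) -> List[Tuple[str, str]]:
        # Match the term against record titles (case-insensitive substring).
        needle = term.lower()
        hits = [(self.name, uid) for uid, rec in self.records.items()
                if needle in rec.get("title", "").lower()]
        # Forward the query one hop only, so peers do not forward it again.
        if forward:
            for peer in self.peers:
                hits.extend(peer.search(term, forward=False))
        return hits


# Example data (assumed for illustration only).
repo_b = Repository("B", {"b1": {"title": "XML metadata survey"}})
repo_a = Repository("A", {"a1": {"title": "Metadata models"},
                          "a2": {"title": "Unrelated notes"}}, peers=[repo_b])
results = repo_a.search("metadata")
```

Limiting forwarding to a single hop is a simplification that avoids query loops between repositories that list each other as peers.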

A security service controls client access to resources and services. Security restrictions can be hierarchical at the resource level, as in traditional file systems, or more fine-grained, down to individual document fragments where metadata is generated collaboratively.
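As a rough sketch of hierarchical restrictions that reach down to document fragments, the example below resolves a permission by walking from a fragment to its document and then up the resource path. The `AccessControl` class, the `path#fragment` addressing, and the "closest rule wins" policy are assumptions made for illustration, not a description of a specific repository.

```python
from typing import Dict, Set


def _parent(path: str) -> str:
    """Return the enclosing scope: a fragment's document, or a path's parent."""
    if "#" in path:
        return path.split("#", 1)[0]
    head = path.rsplit("/", 1)[0]
    return head or "/"


class AccessControl:
    """Illustrative hierarchical access control with fragment-level rules."""

    def __init__(self) -> None:
        # Maps a resource path (optionally with a #fragment) to per-user permissions.
        self._rules: Dict[str, Dict[str, Set[str]]] = {}

    def grant(self, path: str, user: str, *permissions: str) -> None:
        self._rules.setdefault(path, {})[user] = set(permissions)

    def is_allowed(self, path: str, user: str, permission: str) -> bool:
        current = path
        while True:
            rule = self._rules.get(current, {}).get(user)
            if rule is not None:
                # The most specific rule for this user decides.
                return permission in rule
            if current == "/":
                return False  # No rule found anywhere up to the root.
            current = _parent(current)


# Example rules (assumed for illustration only).
acl = AccessControl()
acl.grant("/reports", "alice", "read", "write")
acl.grant("/reports", "bob", "read")
acl.grant("/reports/q1.xml#abstract", "bob", "write")
```

With these rules, `bob` inherits read access to everything under `/reports` but may write only the `abstract` fragment of `q1.xml`.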

A versioning system is essential for metadata repository systems, mainly because of frequent changes in specifications and models, and to support extensibility. Since many metadata standards are still under development, new versions of specifications are released within short periods of time (e.g., the IMS Project [64] has released five versions of its Content Packaging Specification within three years) compared to the long lifespans of stable enterprise systems.

Data Retrieval. Metadata and resources are retrieved using different data transfer protocols, depending on content types and design constraints. Structural metadata can be modeled in an XML schema language and mapped to programming language structures for further processing in memory. This method is called XML data binding [51, 103, 18], and it is preferred for its simplicity in design and implementation, because metadata definitions are no different from programming language object models, as mentioned in Section 2.3.3.
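The idea of data binding can be illustrated with a small hand-written mapping between an XML record and an in-memory object. The element names (`record`, `title`, `keyword`) and the `Record` class are hypothetical; data-binding tools such as those cited typically generate the equivalent classes from a schema rather than relying on hand-written code like this.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Record:
    """In-memory object bound to a hypothetical <record> element."""
    identifier: str
    title: str
    keywords: List[str] = field(default_factory=list)

    @classmethod
    def from_xml(cls, text: str) -> "Record":
        # Unmarshal: XML text -> object.
        root = ET.fromstring(text)
        return cls(
            identifier=root.get("id", ""),
            title=root.findtext("title", default=""),
            keywords=[k.text or "" for k in root.findall("keyword")],
        )

    def to_xml(self) -> str:
        # Marshal: object -> XML text.
        root = ET.Element("record", {"id": self.identifier})
        ET.SubElement(root, "title").text = self.title
        for keyword in self.keywords:
            ET.SubElement(root, "keyword").text = keyword
        return ET.tostring(root, encoding="unicode")


# Sample document (assumed for illustration only).
SAMPLE = """<record id="urn:example:42">
  <title>Grid computing notes</title>
  <keyword>metadata</keyword>
  <keyword>XML</keyword>
</record>"""

record = Record.from_xml(SAMPLE)
```

Once bound, the metadata is manipulated as an ordinary object (`record.title`, `record.keywords`) and written back with `record.to_xml()`.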

Document-centric metadata models, e.g., RDF and DAML, are also commonly used owing to the popularity of RDF and RDF-based metadata systems, and many RDF parser implementations have been attempted [96].
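To make the statement-based model concrete, the sketch below stores RDF-style statements as (subject, predicate, object) triples and matches them by pattern. It is a toy in-memory structure, not an RDF parser; the `Statement` and `Graph` names and the example resource are assumptions, while the predicate URIs use the Dublin Core element namespace.

```python
from typing import List, NamedTuple, Optional

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace


class Statement(NamedTuple):
    subject: str
    predicate: str
    obj: str


class Graph:
    """Toy in-memory collection of RDF-style statements."""

    def __init__(self) -> None:
        self._statements: List[Statement] = []

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._statements.append(Statement(subject, predicate, obj))

    def match(self, subject: Optional[str] = None,
              predicate: Optional[str] = None,
              obj: Optional[str] = None) -> List[Statement]:
        # None acts as a wildcard for that position of the triple.
        return [s for s in self._statements
                if (subject is None or s.subject == subject)
                and (predicate is None or s.predicate == predicate)
                and (obj is None or s.obj == obj)]


# Example statements (assumed for illustration only).
doc = "http://example.org/report"
graph = Graph()
graph.add(doc, DC + "title", "Annual report")
graph.add(doc, DC + "creator", "A. Author")
about_doc = graph.match(subject=doc)
```

Each statement is an edge from subject to object labeled by the predicate, which is how a set of statements forms a relationship graph.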

Content retrieval services are basic document accessing and processing services where, if requested, resources are fetched, cached, and filtered.
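The fetch, cache, and filter steps can be sketched as below. The `ContentRetriever` class is hypothetical, and the fetch function is injected so the example runs without a network; in practice it would wrap whatever transfer protocol the content type calls for.

```python
import re
from typing import Callable, Dict, Optional


class ContentRetriever:
    """Illustrative content retrieval: fetch on demand, cache, optionally filter."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, str] = {}
        self.fetch_count = 0  # Number of real fetches, for demonstration.

    def get(self, uri: str,
            content_filter: Optional[Callable[[str], str]] = None) -> str:
        # Fetch only on a cache miss.
        if uri not in self._cache:
            self._cache[uri] = self._fetch(uri)
            self.fetch_count += 1
        content = self._cache[uri]
        # Apply the filter to the cached copy without altering the cache.
        return content_filter(content) if content_filter else content


def fake_fetch(uri: str) -> str:
    """Stand-in for a network fetch (assumed for illustration only)."""
    return f"<p>content of {uri}</p>"


def strip_tags(text: str) -> str:
    """Very rough markup filter; not a real HTML sanitizer."""
    return re.sub(r"<[^>]+>", "", text)


retriever = ContentRetriever(fake_fetch)
raw = retriever.get("http://example.org/a")
filtered = retriever.get("http://example.org/a", content_filter=strip_tags)
```

The second call is served from the cache, so the resource is fetched only once even though it is returned in two different forms.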

Data Stores. A metadata repository may or may not be integrated with the content that its metadata describes. The same architecture can serve both kinds of system, since metadata is also treated as data. Data stores can be local, centralized and shared by many repositories, or distributed over networks.

In the following sections, we describe current research projects on metadata management and repository systems, and their design choices for metadata models, naming and discovery services, authoring and rendering of metadata, and data stores and queries.

  Related Work
  Metadata Models
  Naming and Discovery Services
  Authoring, Generation, and Rendering of Metadata
  Metadata Persistence
  Metadata Queries