The database market is facing a radical shift in emphasis with the introduction of XML and the requirement to maintain persistent XML content. XML is becoming all-pervasive, and although much of the early work with XML focused on the transactional nature and possibilities of XML (the idea that XML would become the EDI-killer), there is now recognition that the possibilities go much further than originally imagined.
The amount of XML content that needs to be persisted is growing and systems need to be put in place that can handle the documents at a higher level of granularity than is provided by relational systems. Typically the storage of persistent XML documents has fallen within the domain of content management systems, but the market is extending to take into account a different level of granularity that can be obtained from a single XML document.
Databases are collections of data, and the distinction between the individual data elements and the collection and storage facilities for multiple data elements is an important one. Data elements are created in one of two forms, and their use or purpose also takes one of two forms.
Data elements can be created as single entities that exist in and of themselves; they can also be created by decomposing documents that contain individual data elements. Similarly, their purpose can be as a single entity or as part of a collection of inter-related elements that form an informational document.
Data elements can be re-ordered and re-purposed to provide different information. The data elements that exist within, for example, an XML document can have a life and a purpose outside of that specific document. This purpose can only be realised if the physical implementation that is used to store the documents allows it to take place, or does not constrain that extended use.
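As an illustration of this point, the following minimal Python sketch decomposes a small XML document into individual data elements and then re-purposes them for a use the original document was not designed for. The document, its element names, and the functions `decompose` and `stock_view` are hypothetical and are not taken from any product discussed in this Report.

```python
import xml.etree.ElementTree as ET

# Hypothetical purchase order; element and attribute names are illustrative only.
ORDER_XML = """
<order id="1001">
  <customer>Acme Ltd</customer>
  <item sku="A-1" qty="2">Widget</item>
  <item sku="B-7" qty="1">Gasket</item>
</order>
"""


def decompose(xml_text):
    """Flatten a document into (path, attributes, text) data elements."""
    root = ET.fromstring(xml_text.strip())
    elements = []

    def walk(node, parent_path):
        path = f"{parent_path}/{node.tag}"
        text = (node.text or "").strip()
        elements.append((path, dict(node.attrib), text))
        for child in node:
            walk(child, path)

    walk(root, "")
    return elements


def stock_view(elements):
    """Re-purpose the same data elements as a SKU-to-quantity mapping."""
    return {
        attrs["sku"]: int(attrs["qty"])
        for path, attrs, _text in elements
        if path == "/order/item"
    }


elements = decompose(ORDER_XML)
print(stock_view(elements))
```

The order document itself is unchanged; only the storage form (a flat list of addressable elements) makes the second use possible.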
That is, the implemented infrastructure has to go beyond simply having the ability to store and retrieve a document. It also has to allow manipulation of the individual data elements. This is where two technologies coincide. The relational database model is strongly focused on the collection and storage of individual data elements, and on forming logical relationships between them based on the original model for the database. XML structures and, by extension, native-XML databases are modelled on the documents that they store. The relational model has a single implemented schema, while the XML model has multiple schemas.
The true business requirement is to remove the distinction between the two; to create an infrastructure that handles both relational and hierarchical structures. Furthermore, there has to be a connection between the two types. It is not as simple as having relational models and hierarchical models; there have to be different models from a technical point of view because of the physical limitations of the internal systems (it is inherently difficult to implement a hierarchical structure in a relational model and maintain consistency). These physical limitations must not be allowed to extend to the use of the data. Data is not a single-purpose entity; it can have multiple purposes, but only if this fact is recognised and the cross-database barriers are broken down.
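The difficulty of holding a hierarchy in a relational model can be sketched as follows. This is a minimal illustration only, assuming SQLite (through Python's standard `sqlite3` module) and a hypothetical `node` table that uses the common adjacency-list pattern; it is not the approach of any specific product. The hierarchy has to be rebuilt with a recursive query, and consistency depends on a self-referencing foreign key that must be enforced explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE node (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES node(id),  -- NULL marks the root
        tag       TEXT NOT NULL
    )
""")
# Hypothetical document tree: order -> (customer, item -> sku)
conn.executemany(
    "INSERT INTO node (id, parent_id, tag) VALUES (?, ?, ?)",
    [(1, None, "order"), (2, 1, "customer"), (3, 1, "item"), (4, 3, "sku")],
)
conn.commit()


def paths(connection):
    """Rebuild each node's full path by walking parent links recursively."""
    rows = connection.execute("""
        WITH RECURSIVE tree(id, path) AS (
            SELECT id, '/' || tag FROM node WHERE parent_id IS NULL
            UNION ALL
            SELECT n.id, tree.path || '/' || n.tag
            FROM node AS n JOIN tree ON n.parent_id = tree.id
        )
        SELECT path FROM tree ORDER BY id
    """)
    return [row[0] for row in rows]


print(paths(conn))
```

A native hierarchical store would hold the path implicitly; here every read of the structure pays for a recursive join, and an insert that names a non-existent parent is rejected only because the foreign key is enforced.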
This Report covers three main areas:
Further analysis is also included on the markets to help the decision-making process (Section 5).
The Report is biased towards the use of the relational model for two main reasons:
The omission of the Object-Oriented (OO) model is a clear indication that, at the enterprise level, the ability to store persisted XML documents either in native-XML databases or in extended RDBMSs has overtaken the OO model. The class structures as defined in the OO model can be handled within XML structures (there are multiple cross-over points between the two), and the only remaining need for OODBMSs is as a technology element for storing XML structures. The actual workings of the OODBMS model are discussed in Section 3.4.3.
In many respects, the OO model has been abstracted to a higher level, where it more naturally resides. The indication of how this is implemented appears in Section 4.1.3, which considers both metadata management in general, and also looks at some of the specifics of the Meta-Object Facility (MOF) from the Object Management Group (OMG).
This Section provides a key indication as to the way that data and information have to be managed in a more integrated environment, showing the distinctions that have to exist between database management and Information Asset Management (IAM).
Section 4.1.1 of this Report also discusses IAM in terms of how the dichotomy between data management and database management can be resolved. Although data is physically tied to the implemented database system (and to a certain extent limited by that implementation), it should not also be conceptually tied to, and limited by, that implementation.
Content management also has a part to play within IAM, and although it is not a specific element within this Report, it is worth mentioning how it forms part of the jigsaw puzzle. The basic choice facing businesses in the area of IAM is which implementations are needed at the database level (using the term in its strict sense of a collection of data). The four choices are relational databases, native-XML databases, content management systems, and data warehouses and data marts.
All four have specific purposes, and therefore all four are relevant to specific business uses. Relational databases should be implemented where there is a strong and ongoing relationship between data elements. It should be noted at this point that any relational implementation for a non-embedded database should have a large degree of XML capability. The relational model is designed to represent parts of the real world, and for this purpose it is still the best choice.
Native-XML databases should be implemented where the business requirement is to store specific business documents that have a well-formed XML structure for search and retrieval purposes. That is, where the need is for persisted XML documents.
Content management can be used to store structured or non-structured content that does not fit within either of the other two models. Typically, new media-type storage fits into this category.
Data warehouses and data marts fit best into the strategic BI arena, and are more a matter of purpose than of structure.
Even though the different physical implementations can be clearly defined (with some cross-over as to the choice between relational and native-XML) this should not extend to the use of data, information, or content. This is a key message, and one that organisations have to come to terms with. The ability to re-purpose data will form the competitive differentiator for all businesses within the next 20 years.
From a technical standpoint, Section 3 covers specific areas of technology in greater or lesser depth. Although many of the features that are contained within DBMSs are common to all the major products, there is a degree of diversity.
Section 3.1.1 looks in some depth at the issue of database replication. The reason why this has been covered so extensively is the impact it has on data replication and how that can be managed (covered in Section 4.1.5). Again, this demonstrates that, although there is a need to treat the two areas separately, the constraints within any given system may make that division more complex than it need be.
The second most important element of database functionality, back-up and recovery, is also extensively covered. Section 3.1.2 looks at strategies for backing up data (and more importantly, recovering that data) and also the different elements that need to be considered when creating a recovery strategy.
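The emphasis on recovery rather than back-up alone can be illustrated with a small sketch. It assumes SQLite and the `Connection.backup` method of Python's standard `sqlite3` module (available from Python 3.7); the `customer` table and the function names `take_backup` and `verify_recovery` are hypothetical. The point shown is that a back-up is only counted as good once it has been restored and checked.

```python
import sqlite3


def take_backup(source):
    """Copy a live database into a separate in-memory back-up database."""
    target = sqlite3.connect(":memory:")
    source.backup(target)
    return target


def verify_recovery(backup_db, expected_rows):
    """Restore the back-up into a fresh database and check its contents."""
    restored = sqlite3.connect(":memory:")
    backup_db.backup(restored)
    rows = restored.execute(
        "SELECT id, name FROM customer ORDER BY id"
    ).fetchall()
    return rows == expected_rows


# Hypothetical live database with two rows.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
live.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Acme"), (2, "Bolt")])
live.commit()

snapshot = take_backup(live)

# Simulate data loss on the live system after the back-up was taken.
live.execute("DELETE FROM customer")
live.commit()

recoverable = verify_recovery(snapshot, [(1, "Acme"), (2, "Bolt")])
print(recoverable)
```

In a real strategy the back-up would be written to separate storage and the restore test run on a schedule; the in-memory databases here simply keep the sketch self-contained.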
Having mentioned the second most important element, it is now time to consider the most important: security, which is covered in Section 3.1.4. The implementation of security within a database product can be problematic. Security is an enterprise issue, and as such has to extend beyond any single database or application implementation. This does not absolve the providers from having security aspects within their product offerings.
Although security has to be implemented at the highest level, covering the infrastructure and peripheral elements, this can only be achieved by having open standards within specific offerings that can be brought together into the security framework. Passing control of security to another part of the infrastructure is not the answer.
Database security has to follow the standard format of Authentication, Authorisation, and Auditing. Any failure in any of the three parts is considered a total failure. There is another level of security that needs to be addressed, which is covered in this Section of the Report; this is data encryption. Data encryption is vital in terms of security, but the I/O impact that blanket encryption can have is so enormous as to make having a sensible encryption strategy a prime requisite.
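The rule that a failure in any one of the three parts is a total failure can be sketched as a fail-closed access check. This is a minimal illustration under stated assumptions: the user store, grants, salt, and function names are all hypothetical, and in-memory Python structures stand in for what a real DBMS would provide internally.

```python
import hashlib
import hmac

SALT = b"demo-salt"  # hypothetical fixed salt; real systems use a per-user random salt


def _hash(password):
    """Derive a password hash with PBKDF2 (standard library)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), SALT, 100_000)


# Hypothetical in-memory stores standing in for the DBMS catalogue.
USERS = {"alice": _hash("s3cret")}
GRANTS = {("alice", "orders", "read")}
AUDIT_LOG = []


def authenticate(user, password):
    """Authentication: is the caller who they claim to be?"""
    expected = USERS.get(user)
    return expected is not None and hmac.compare_digest(expected, _hash(password))


def authorise(user, table, action):
    """Authorisation: may this user perform this action on this table?"""
    return (user, table, action) in GRANTS


def audit(log, user, table, action, allowed):
    """Auditing: record the attempt; report False if it cannot be recorded."""
    try:
        log.append((user, table, action, allowed))
        return True
    except AttributeError:
        return False


def access(user, password, table, action, log=AUDIT_LOG):
    """Grant access only if all three parts succeed (fail closed)."""
    allowed = authenticate(user, password) and authorise(user, table, action)
    recorded = audit(log, user, table, action, allowed)
    return allowed and recorded
```

For example, `access("alice", "s3cret", "orders", "read")` is granted, whereas the same call with a wrong password, an ungranted action, or an unavailable audit log (`log=None`) is refused.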
Although database technology is considered mature (especially in the relational field), new techniques are still being introduced. In the area of indexing (Section 3.3.4) a detailed review of a technique known as Adaptive Addressing from a company called CopperEye is included. This new indexing technique has both an open solution and a specific solution for Oracle, and promises to set new standards in this area.
Overall, this Report can be used to look at both the physical and the logical elements of databases, their uses, their strengths, and their weaknesses. The Report can also be used as a jumping-off point for understanding the need to incorporate database management with data management, and bring the whole data structure under single control.
The full Report can be ordered from Butler Group.