Information Retrieval in DBMS

Nowadays, a Database Management System (DBMS) is a major component of most information systems. The relational DBMS is currently by far the most widely accepted type. Database Management Systems supporting a hierarchical or network data model are often seen as obsolete, although many large corporate database systems are still based on these two models.

Well-known Database Management Systems like ORACLE, INGRES, INFORMIX and DB2 were originally designed for the management of databases with structured data. Structured data is data whose meaning depends on its relative position on a data carrier. This contrasts with less structured data such as text, images, video, animation, audio and graphics. (We avoid the term unstructured because it is misleading: texts can be structured into paragraphs and chapters, certain objects can be distinguished within a pictorial image, and music and speech also have their own structures.) In many cases it is not cost-effective (or even possible) to try to structure all data.
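As a small illustration of data whose meaning depends on position, the sketch below parses a fixed-width record. The layout (a 6-character identifier, a 20-character name, an 8-character date) and the function name parse_record are assumptions made up for this example; they do not come from any of the systems named above.

```python
def parse_record(line: str) -> dict:
    """Interpret a fixed-width record purely by character position.

    Hypothetical layout, assumed for this example only:
      columns  0-5   numeric identifier
      columns  6-25  name, padded with spaces
      columns 26-33  date as YYYYMMDD
    """
    return {
        "id": int(line[0:6]),
        "name": line[6:26].rstrip(),
        "date": line[26:34],
    }


# Build a sample record that follows the assumed layout.
sample = "000042" + "Jansen".ljust(20) + "19930115"
record = parse_record(sample)
```

The same characters at another offset would mean something else, which is what makes such data easy for a DBMS to handle and why free text does not fit this mould.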

For some years now, Database Management Systems have been developing in the direction of Multimedia Database Management Systems (MDBMSs) by implementing storage and retrieval of binary large objects. SYBASE is an early example of this. Binary large objects may contain any type of data.
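A minimal sketch of binary large object storage, using Python's built-in sqlite3 module as a stand-in for the systems mentioned above. The table name media, its columns and the function name are assumptions chosen for this example.

```python
import sqlite3


def store_and_fetch_blob(payload: bytes) -> bytes:
    """Store a binary large object in an in-memory SQLite table and read it back."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema: the BLOB column accepts any byte sequence.
        conn.execute(
            "CREATE TABLE media (id INTEGER PRIMARY KEY, kind TEXT, content BLOB)"
        )
        cur = conn.execute(
            "INSERT INTO media (kind, content) VALUES (?, ?)", ("image", payload)
        )
        row = conn.execute(
            "SELECT content FROM media WHERE id = ?", (cur.lastrowid,)
        ).fetchone()
        return bytes(row[0])
    finally:
        conn.close()


# Arbitrary example bytes; the database does not interpret them.
fetched = store_and_fetch_blob(b"\x89PNG\x00\x01\x02")
```

The database treats the content as an opaque byte sequence: it can store and return the object, but it cannot by itself search inside it.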

Still, most Database Management Systems fall short in supporting advanced information retrieval facilities such as full-text indexing, inexact query arguments, use of a thesaurus, pattern recognition, ranking and clustering, set manipulation, and search profiles (Hoogeveen, Van der Meer & Sol, 1992). Support for these facilities is still the domain of Information Storage and Retrieval Systems (IRSs). Examples of IRSs are BRS/Search from BRS Information Technologies, Stairs from IBM, and TOPIC from Verity.
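To make some of these facilities concrete, here is a minimal sketch of full-text indexing with inexact query arguments, thesaurus expansion and a simple ranking. It is not how any of the products named above work. The function names, the sample documents and the scoring rule (summing term occurrences) are assumptions for this example. Inexact matching uses difflib.get_close_matches from the Python standard library.

```python
import difflib
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Full-text index: map each term to a Counter of {doc_id: occurrences}."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] += 1
    return index


def search(index, query, thesaurus=None, cutoff=0.8):
    """Return doc ids ranked by total occurrences of the matched terms."""
    thesaurus = thesaurus or {}
    vocabulary = list(index)
    scores = Counter()
    for word in tokenize(query):
        # Inexact query argument: accept indexed terms that closely resemble the word.
        terms = set(difflib.get_close_matches(word, vocabulary, n=3, cutoff=cutoff))
        # Thesaurus: add related terms that actually occur in the index.
        terms.update(t for t in thesaurus.get(word, []) if t in index)
        for term in terms:
            for doc_id, count in index[term].items():
                scores[doc_id] += count
    # Ranking: highest score first, ties broken by document id.
    return [doc_id for doc_id, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


# Hypothetical sample documents.
docs = {
    1: "Fingerprint found on the car",
    2: "The automobile was stolen; the automobile was recovered later",
    3: "Weather report",
}
index = build_index(docs)
ranked = search(index, "car", thesaurus={"car": ["automobile"]})
fuzzy = search(index, "fingerprnt")
```

In this sketch a query for "car" also reaches the document that only mentions "automobile", and the misspelled "fingerprnt" still finds the fingerprint document.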

 

We distinguish four main types of IRSs:

Reference Systems, e.g. automated catalogues, are used to keep track of external sources of information, whether digitised or analogue. These information systems may contain references to books, documents, articles, audio and video tapes, etc.

Document Image Systems or Optical Filing Systems are Information storage and Retrieval Systems in which every page of a document is scanned and stored as a bitmap in binary files. Document pages can be retrieved by keyword searches.

Full Text Systems, in which the complete text of documents is indexed and can be searched.

Multimedia Document Systems, which manage multimedia documents (Bos & Van Wijk, 1993).

Document Image Systems and Optical Filing Systems sometimes offer hypertext facilities. Multimedia Document Systems may offer hypermedia facilities. The ‘hyper’ prefix indicates that non-sequential access to chunks of information, by tracing hyperlinks within a document, is supported (Frei & Schäuble, 1991). Within this context, we will not treat Hypertext/Hypermedia Systems, which specialise in non-sequential access, as a separate non-IRS class, although this can be disputed.

It is important to note that most Information Storage and Retrieval Systems lack the power of DBMSs in handling structured data.

 

Information workers benefit from the integrated use of multiple data types. The integrated use of numerical, textual, graphical, audio and video data proves its value, e.g. in public multimedia information services (Hoogeveen & Andersson, 1993). This is also the case for police officers, such as criminal analysts and detectives of the Dutch police, who need more complete support in their work with text documents, photographs, fingerprints, wiretapped conversations, graphics, video tapes, etc.

The popular term multimedia is used here to indicate that multiple data types are processed coherently. The joint ISO/IEC (1992) working group Multimedia and Hypermedia information coding Expert Group (MHEG) describes this as the property of handling several representation media. A representation medium is equated with a type of data.

More complete support of multimedia information work includes natural use of multiple data types and quick and simple access to information, whatever its format. Thus, in situations in which multiple data types are processed coherently (often multimedia work situations), the need for both Database Management System and Information Storage and Retrieval System facilities is obvious.
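The combination argued for here can be sketched as a single query that mixes both kinds of facility: structured predicates are evaluated by the DBMS, and a free-text condition is evaluated in the style of an IRS. The sketch uses Python's built-in sqlite3 module. The items table, its columns, the sample rows and the function names are assumptions for this example, and the text match is a deliberately simple whole-word test.

```python
import re
import sqlite3


def make_demo_db():
    """Create an in-memory database with a few hypothetical case items."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, media_type TEXT, year INTEGER, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO items (id, media_type, year, body) VALUES (?, ?, ?, ?)",
        [
            (1, "text", 1991, "Witness statement about a red car"),
            (2, "text", 1993, "Transcript mentions a red car near the harbour"),
            (3, "audio", 1993, "Wiretap transcript: red car"),
            (4, "text", 1993, "Report on fingerprints"),
        ],
    )
    return conn


def combined_search(conn, keyword, media_type, since_year):
    """Filter on structured columns in SQL, then on free text by whole-word match."""
    # DBMS side: structured predicates on typed columns.
    rows = conn.execute(
        "SELECT id, body FROM items WHERE media_type = ? AND year >= ? ORDER BY id",
        (media_type, since_year),
    ).fetchall()
    # IRS side: term match on the less structured text.
    wanted = keyword.lower()
    return [
        item_id
        for item_id, body in rows
        if wanted in re.findall(r"[a-z0-9]+", body.lower())
    ]


conn = make_demo_db()
hits = combined_search(conn, "car", "text", 1992)
```

Neither facility alone answers this query: the structured filter cannot look inside the text, and the text match knows nothing about media type or year.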