2.2. Document image retrieval and management
Image database systems store documents as images (or pictures), not as text.
The pictures are stored on disk as bit-mapped images, meaning that they are
made up of a series of dots (monochrome, greyscale or colour). Data is captured
by way of an image scanner, which may be the same hardware as is used to
capture text through OCR software. Unlike free text retrieval systems, image
databases store and retrieve these scanned images as images - they are not
first converted to text. This approach offers two main advantages. Firstly, it
is faster and simpler to capture paper-based data as images than through OCR software,
which tends to run relatively slowly and requires user intervention to correct
mistakes. Secondly, the captured data is a literal copy of the original (that
is, it is similar to a photocopy of the document). Unlike text, images can
include handwritten notes, diagrams and so forth. A more detailed comparison
is contained in Chapter 5.
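
Because a bit-mapped image is simply a grid of dots, its uncompressed size
follows directly from the page dimensions, the scanning resolution and the
number of bits stored per dot. The sketch below works through that arithmetic.
The A4 page size, the 300 dots-per-inch resolution, the bit depths of 1, 8 and
24, and the padding of each row to a whole byte are all assumptions made for
illustration; none of them comes from the text above.

```python
import math

# Assumed page size (A4, in inches); illustrative only.
A4_WIDTH_IN = 8.27
A4_HEIGHT_IN = 11.69


def bitmap_size_bytes(width_in, height_in, dpi, bits_per_dot):
    """Return the uncompressed size in bytes of a scanned page."""
    width_dots = round(width_in * dpi)
    height_dots = round(height_in * dpi)
    # Assumption: each row of dots is padded up to a whole number of bytes.
    bytes_per_row = math.ceil(width_dots * bits_per_dot / 8)
    return bytes_per_row * height_dots


# Assumed bit depths for the three kinds of dot mentioned in the text.
for label, depth in [("monochrome", 1), ("greyscale", 8), ("colour", 24)]:
    size = bitmap_size_bytes(A4_WIDTH_IN, A4_HEIGHT_IN, 300, depth)
    print(f"{label:10s} {depth:2d} bits/dot: {size / 1_000_000:.1f} MB")
```

Under these assumptions a monochrome A4 page comes to roughly 1.1 MB before
any compression, and a 24-bit colour page to roughly 26 MB.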
The principal disadvantage of storing documents as images is that there is no
inherent (or automatically available) search mechanism. This is usually
overcome by superimposing a conventional (manually created) indexing system,
typically by way of a standard flat file database, such as that used for
document control databases. Images also tend to take up more disk
space than their textual counterparts. This can be partly overcome by using
either software or hardware compression techniques.
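
As an illustration of why compression helps, the sketch below applies
run-length encoding to a single row of monochrome dots. The text does not name
a particular compression technique, so run-length encoding is used here only
as a simple, well-known example; the function names and the sample scan line
are hypothetical.

```python
def rle_encode(row):
    """Encode a sequence of dots (0 or 1) as (value, run_length) pairs."""
    runs = []
    for dot in row:
        if runs and runs[-1][0] == dot:
            runs[-1][1] += 1       # extend the current run
        else:
            runs.append([dot, 1])  # start a new run
    return [(value, length) for value, length in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original row of dots."""
    row = []
    for value, length in runs:
        row.extend([value] * length)
    return row


# Hypothetical 32-dot scan line: mostly white (0) with a short black (1) stroke.
scan_line = [0] * 12 + [1] * 5 + [0] * 15
encoded = rle_encode(scan_line)
assert rle_decode(encoded) == scan_line  # the encoding is lossless
print(encoded)  # [(0, 12), (1, 5), (0, 15)]
```

Scanned text pages tend to contain long runs of white, which is why schemes
of this kind can shrink them considerably. A row in which the dots alternate
frequently would gain little or nothing.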
Graphics formats
The file formats used to store images or pictures are usually called
'graphics file formats', of which there are at least several hundred.
The reason for this is partly historical, but also reflects the needs
of different hardware and software vendors to maximise the performance
of their products. Some formats are specific to particular types of
hardware (e.g. ADEX .img/.rle, Autologic .gm, and Targa .tga formats);
others to generic classes of machines (e.g. ZSoft PCX