5.3. Full text retrieval of exhibits and other documents


There is relatively little use of full text retrieval of documents other than
transcripts and witnesses' statements. Most other documents will still only be
available to the prosecution or defence in paper form, and will require to be
converted to computerised form by optical character recognition (OCR)
equipment before they can be searched with text retrieval software. Full text
retrieval of documents which are potential exhibits is used rarely, for the
following reasons (Livermore (1992)):

o  Documents often contain significant information in handwriting or other
   non-textual forms, which cannot be captured by OCR;

o  Some errors are inevitable even when good quality documents are scanned,
   and error-correction is an expensive and time-consuming task, with no
   guarantee of a perfect result;

o  To convert a document by OCR takes significantly longer than to scan an
   image of it, and therefore involves higher costs.
   
These reasons indicate why image capture of documents, which does not have
these problems to the same degree, is increasingly regarded as more
appropriate than full text capture, particularly when the numbers of documents
in complex prosecutions are taken into account. However, as Livermore says, it
may be sensible to OCR a very small percentage of the overall documents, the
most important ones, in addition to capturing them as images. The great
advantage that OCR text capture has over image capture, of course, is that it
is possible to search it for occurrences of words.
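
As an illustration of what searching OCR text for occurrences of words
involves, the following is a minimal sketch only, not a description of any
particular text retrieval product. It assumes the OCR output is already
available as plain text keyed by an exhibit reference; the references, the
sample text and the function names build_index and search are all
hypothetical.

```python
import re
from collections import defaultdict


def build_index(ocr_texts):
    """Map each lower-cased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in ocr_texts.items():
        # Split the text into simple word tokens (letters, digits, apostrophes).
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, word):
    """Return the sorted ids of documents in which the word occurs."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical OCR output for two exhibits (illustrative data only).
ocr_texts = {
    "EX-001": "Invoice issued to Acme Holdings for consultancy services.",
    "EX-007": "Letter confirming the transfer of funds to Acme Holdings.",
}

index = build_index(ocr_texts)
print(search(index, "Acme"))
```

A search of this kind matches only the exact word as recognised, so an
uncorrected OCR error in a document will cause it to be missed. This is one
reason why the cost of error-correction noted above matters.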

Witnesses' statements are the exception because they share many of the same
features as transcripts. Firstly, they are likely to have been created on
word processing software, thereby eliminating the need to use OCR to convert
a paper document to computerised text. Secondly, they are discursive and not
easily summarised in a document control database.


