of someone with some expertise, and consequent expense.
In summary, although the value of free-text retrieval is
contentious, its effectiveness is likely to be dependent upon sensible
tasks, realistic expectations and good software.


Capture of text for free-text retrieval


Text is often available for free-text retrieval purposes because it was
created in computerised form. Transcript, witness statements and some
documents which are potential exhibits may be so available. Where text is not
so available, paper copies may be converted into computerised text by the use
of a 'scanner', a piece of computer hardware, and optical character
recognition (OCR) software. The scanner takes an image of the document (as if
it were taking a photocopy), and passes that image to the OCR software, which
'reads' the image and creates a text file of the words it recognises in the
image. In recent years OCR software which is capable of a high degree of
accuracy has become available at affordable cost.


'Marking up' text for text retrieval


Depending on the retrieval program to be used, the 'raw' text may need to have
'markers' inserted within it, indicating where particular features of the text
start and stop. For example, there may need to be regular indications in the
text as to where a new document starts, or where the evidence of a new witness
starts, or where a document has been tendered as an exhibit. Marking up can
often be automated to a large extent, particularly when the creation of the
text is done under controlled conditions which allow it to be given a very
uniform structure (as is possible with transcript: see Chapter 4). The
availability of powerful retrieval functions will often depend upon such
marking up.
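
By way of illustration only, automated marking up of transcript might be
sketched as follows. The marker format (`<<WITNESS: ...>>`), the assumed
transcript convention (a new witness introduced by a line giving the name in
capitals followed by ", sworn" or ", affirmed") and the function names are all
hypothetical; an actual retrieval program would define its own markers and
rules.

```python
import re

# Hypothetical marker format; a real retrieval program defines its own.
MARKER_PREFIX = "<<WITNESS: "
MARKER_SUFFIX = ">>"

# Assumed convention: a witness is introduced by a line such as
# "JANE SMITH, sworn" or "JOHN BROWN, affirmed".
WITNESS_LINE = re.compile(r"^([A-Z][A-Z .'-]+), (?:sworn|affirmed)\b")


def mark_up(raw_text):
    """Insert a marker line before each line that introduces a new witness."""
    marked = []
    for line in raw_text.splitlines():
        match = WITNESS_LINE.match(line)
        if match:
            name = match.group(1).strip()
            marked.append(MARKER_PREFIX + name + MARKER_SUFFIX)
        marked.append(line)
    return "\n".join(marked)


def search_witness(marked_text, witness, term):
    """Return lines containing `term` within the named witness's evidence."""
    hits = []
    current = None
    for line in marked_text.splitlines():
        if line.startswith(MARKER_PREFIX) and line.endswith(MARKER_SUFFIX):
            # A marker line changes the current witness and is not searched.
            current = line[len(MARKER_PREFIX):-len(MARKER_SUFFIX)]
            continue
        if current == witness and term.lower() in line.lower():
            hits.append(line)
    return hits


# Usage: mark up a short invented passage, then search one witness only.
sample = "\n".join([
    "JANE SMITH, sworn",
    "Q. Did you see the invoice?",
    "A. Yes, I saw the invoice on Monday.",
    "JOHN BROWN, affirmed",
    "Q. Did you sign the invoice?",
    "A. No.",
])
marked = mark_up(sample)
smith_hits = search_witness(marked, "JANE SMITH", "invoice")
```

The sketch shows why retrieval functions can depend on marking up: a search
confined to the evidence of one witness is only possible because the markers
record where each witness's evidence starts.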


