Johnson, Brad --- "Prophecy with Numbers: Prospective Punishment for Predictable Human Behaviour?" [2005] UTSLawRw 6; (2005) 7 University of Technology Sydney Law Review 117

Prophecy with Numbers: Prospective Punishment for Predictable Human Behaviour?

Brad Johnson*

A discussion about the way in which people reason about events in the physical world will prove to be beneficial before considering the provisions of the Dangerous Prisoners (Sexual Offenders) Act 2003 and is necessary in order to identify and deconstruct the process by which psychiatrists evaluate dangerousness. In the physical world it is both possible and useful to identify relationships or connections between events where one event or constellation of events immediately precedes the occurrence of another. In such circumstances two possibilities may arise:

(i) it may be observed that the presence of one event or constellation of events always precedes the occurrence of another; or
(ii) it may be observed that the presence of one event or constellation of events sometimes precedes the occurrence of another.

The difference between the two possibilities, which lies in the use of the words “always” and “sometimes”,[1] can be illustrated by considering contrasting examples. If the element potassium is combined with water a violent reaction will result, producing potassium hydroxide and hydrogen gas, the latter igniting spontaneously at room temperature. A chemist would not expect the reaction to produce any exceptions, but would expect it to obey a well observed rule that the combination of potassium and water is always followed by the production of potassium hydroxide. Such an expectation reveals an underlying and unprovable belief that future observations will conform to past observations so that certainty in the past will endure in the future. By contrast some associations between events admit random exceptions with the consequence that the past cannot serve as a perfect guide to the future. Habitual smoking is an event, repetitive in nature, which sometimes precedes the development of lung cancer. Some, rather than all, habitual smokers develop lung cancer and as a result for a given population of smokers it is impossible to distinguish between those who will develop the disease and those who will not.

Expectations or Uncertainty

Our attitude towards, and beliefs about, future events in the physical world depend upon our memory of past experiences. Without any past experiences or memory of those experiences it would be difficult to develop any expectations about events in the physical world. For example, in the absence of any past experiences it would be difficult to predict that the setting sun will return; that the incoming high tide will recede or that the snow of winter will melt in the summer. For the observer without any past experiences, the physical world offers many objects never before seen in the context of a dynamic environment of change. Amongst the many things that can be observed, the bright object of the sun changes its position with the passage of time and continues to do so until it sets behind the horizon. For someone who has never before seen a sunrise there is no past experience or memory of a past experience that would suggest that the setting sun will ever return. If it returns, then the observer will recognise the object based on memory of the experiences from the previous day and notice that it behaves in a similar manner as it sets again. After observing two sunsets a pattern may not be obvious. How many sunsets and sunrises however would a person need to observe before developing the expectation that the sun will always rise and set? Although only an arbitrary answer can be offered to such a question, most people believe that the sun will rise and set tomorrow given their experiences and the experiences of countless generations of past observers who have watched the sun behave without exception. The expectations of people with respect to the rising and setting sun may be contrasted with the uncertainty of the weather. Not many people would expect rain every day or every second day or third day since the number of days between rain and sunshine according to our experiences is not periodic, but variable and therefore uncertain. Given the presence of something that always occurs, it is common for people to develop expectations. The presence of something that sometimes occurs will usually give rise to uncertainty.

Which category does human behaviour belong to? Unlike chemical elements and compounds such as potassium and water, human beings display comparatively complex behaviour and interactions between each other and with their environment. Chemical compounds don’t parent children, enter into romantic relationships, go to war, struggle to improve their standard of living, laugh politely at an attempted joke or commit crimes against other chemical compounds. Humans, who appear to possess some freedom of will or volition, are complex, as are their relationships and behaviour with respect to others. Whether or not human behaviour can be accurately predicted depends upon the existence of patterns that always occur as opposed to patterns that sometimes occur. In the context of the Dangerous Prisoners Act it is necessary to recognise that although people exercising their free will can behave in many ways, some behaviour has been labelled criminal and as a result it would be useful to know if it is possible to predict whether a particular individual will commit a crime. If individual offenders could be identified before they have opportunities to execute their crimes, then actual victims could be replaced with potential victims who need not learn to live with the trauma of violation.

For reasons that will become apparent, the main obstacle to achieving accurate predictions with respect to human behaviour is the presence of uncertainty. The response to this uncertainty has been systematic since the gradual development and advent of science through which we have sought to replace “sometimes” with “always” in a quest to eliminate uncertainty and establish order with respect to the behaviour of objects and forces in the universe. Some investigations have witnessed spectacular success, whilst others have left our desires unfulfilled. Where uncertainty has not been eliminated, the methods of statistics and probability theory have emerged as tools for understanding and minimising the risks often associated with it. Essential to these methods—as they are applied to most types of human behaviour—is the process of comparing relative frequencies and the identification of sample populations each based on a specific group of attributes. Here the term “population” can refer to any collection of objects or events, not just people. Either way it is a simple concept that can be understood easily without a background in mathematics.

The goal of such analysis is to predict what outcomes will follow given the presence of certain attributes or behaviours in a population of individuals who all possess the same attributes or display the same behaviour. Smoking and lung cancer offer an illustration. In this case the sample population being studied comprises individuals who display the same behaviour: habitual smoking. The outcome observed is whether or not the individual develops lung cancer. The sample population is divided into two groups, those who develop lung cancer and those who don’t. Two relative frequencies can then be calculated:

(i) the ratio of the number of individuals who develop lung cancer to the total number of individuals in the sample population.
(ii) the ratio of the number of individuals who do not develop lung cancer to the total number of individuals in the sample population.

Example: a given population of habitual smokers might include 1,000 individuals. If 800 develop lung cancer and 200 do not then the following relative frequencies will be obtained:

i) lung cancer present: 800/1,000 or 80%
ii) lung cancer absent: 200/1,000 or 20%
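
The arithmetic behind these figures can be sketched in a few lines of Python. The function name relative_frequencies and the outcome labels are illustrative assumptions; only the counts come from the example above.

```python
from fractions import Fraction


def relative_frequencies(outcome_counts):
    """Map each outcome to its share of the whole sample population."""
    total = sum(outcome_counts.values())
    if total == 0:
        raise ValueError("the sample population is empty")
    return {outcome: Fraction(count, total)
            for outcome, count in outcome_counts.items()}


# Counts taken from the example: 1,000 habitual smokers.
smokers = {"lung cancer present": 800, "lung cancer absent": 200}

for outcome, share in relative_frequencies(smokers).items():
    print(f"{outcome}: {share} or {float(share):.0%}")
```

Exact fractions are used rather than decimals so that the two relative frequencies add up to exactly 1, as they must when every individual falls into one group or the other.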

Relative frequencies for certain outcomes where the future conforms to the past present quite a different picture. For example, the sample population might consist of an experiment repeated 1,000 times in which potassium is combined with water. The outcome of interest is whether or not potassium hydroxide is produced. Again, the sample population can be divided into experiments that produce potassium hydroxide and experiments that do not. We should expect to see the following relative frequencies:

Potassium hydroxide present: 1,000/1,000 or 100%
Potassium hydroxide absent: 0/1,000 or 0%.

It can be seen that the data from the sample populations reveal that habitual smoking is only sometimes associated with lung cancer rather than always, whilst the presence of potassium hydroxide is always associated with an experiment in which water and potassium are combined. The distinction between sometimes and always in this case means the difference between accurate predictions and inaccurate predictions. Before an experiment is conducted we should expect to be correct if we predict that the outcome will produce potassium hydroxide. If we predict that a given habitual smoker will develop lung cancer however, then it is possible, given the relative frequencies of the sample population, that we will be surprised by contradiction. In this case the relative frequencies of the sample population also specify the margin of error for any prediction inferred from them. There is an 80% chance that our prediction will be correct and a 20% chance that our prediction will be incorrect for any given habitual smoker predicted to develop lung cancer.
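
The contrast between the two cases can be put as a short calculation. This is a minimal sketch which assumes that the individual or experiment being predicted behaves like a member drawn at random from the sample population; the function name prediction_chances is hypothetical.

```python
from fractions import Fraction


def prediction_chances(times_outcome_observed, sample_size):
    """Chance that predicting the outcome for one member proves correct or incorrect.

    Assumes the member behaves like one drawn at random from the sample.
    """
    chance_correct = Fraction(times_outcome_observed, sample_size)
    chance_incorrect = 1 - chance_correct
    return chance_correct, chance_incorrect


# Potassium combined with water: potassium hydroxide observed in 1,000 of 1,000 experiments.
certain = prediction_chances(1000, 1000)
# Habitual smokers: lung cancer observed in 800 of 1,000 individuals.
uncertain = prediction_chances(800, 1000)

for label, (correct, incorrect) in [("potassium and water", certain),
                                    ("habitual smoker", uncertain)]:
    print(f"{label}: correct {float(correct):.0%}, incorrect {float(incorrect):.0%}")
```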

The relative frequencies for a given sample population with respect to a particular outcome can be further analysed by comparing them to the relative frequencies, for the same outcome, of one or more other sample populations defined according to different attributes. For instance, in analysing the relationship between lung cancer and smoking, the relative frequencies for a sample population of smokers are typically compared to the relative frequencies for a sample population of non-smokers.

Example: for a sample population of 1,000 smokers where 800 develop lung cancer and a sample population of 1,000 non-smokers where 100 develop lung cancer the following relative frequencies can be determined and compared:

i) smokers with lung cancer: 800/1,000 or 80%
ii) non-smokers with lung cancer: 100/1,000 or 10%.

The comparison of the sample populations reveals that the chance of developing lung cancer is greater for smokers than non-smokers. In neither case can accurate predictions be made from the relative frequencies; however, it is possible to infer that there is an increased risk of developing lung cancer for smokers. As a result predictions that a smoker will develop lung cancer should be more accurate than predictions that a non-smoker will develop lung cancer.
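
The comparison of smokers and non-smokers can be sketched as follows. The function compare_rates and the choice to report both a difference and a ratio of the two relative frequencies are assumptions made for illustration; the text above requires only that one frequency be seen to exceed the other.

```python
from fractions import Fraction


def compare_rates(cases_a, size_a, cases_b, size_b):
    """Compare the relative frequency of one outcome across two sample populations."""
    rate_a = Fraction(cases_a, size_a)
    rate_b = Fraction(cases_b, size_b)
    return {
        "rate_a": rate_a,
        "rate_b": rate_b,
        "difference": rate_a - rate_b,
        # The ratio is undefined when the outcome never occurs in population B.
        "ratio": rate_a / rate_b if rate_b != 0 else None,
    }


# Smokers: 800 of 1,000 develop lung cancer; non-smokers: 100 of 1,000.
result = compare_rates(800, 1000, 100, 1000)
print(f"smokers {float(result['rate_a']):.0%}, non-smokers {float(result['rate_b']):.0%}")
print(f"difference {float(result['difference']):.0%}, ratio {result['ratio']}")
```

On these figures the relative frequency for smokers is eight times that for non-smokers, yet neither figure allows a certain prediction about any one individual.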

In all of the above examples the relevant data satisfy some of the basic postulates of the probability calculus and demonstrate that there is an intimate relationship between relative frequencies and probability theory. The basic postulates include the following:

(i) the probability of an event must be a rational number that is greater than or equal to 0 and less than or equal to 1,
(ii) the complement probability of an event, that is the probability that it will not occur, can be represented as follows: P(!E) = 1 - P(E), where P(!E) represents the probability that the event will not occur and P(E) represents the probability that it will occur,
(iii) the probability of an event that is certain to occur is equal to 1 whilst the probability of an event that is certain not to occur is equal to 0.
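
The three postulates can be checked against the figures from the earlier examples. The sketch below is illustrative only: the function name complement is an assumption, and relative frequencies from the sample populations stand in for probabilities.

```python
from fractions import Fraction


def complement(p_event):
    """Return P(!E) = 1 - P(E), rejecting values outside the range 0 to 1."""
    if not 0 <= p_event <= 1:
        raise ValueError("a probability must lie between 0 and 1 inclusive")
    return 1 - p_event


p_cancer = Fraction(800, 1000)      # habitual smokers who develop lung cancer
p_hydroxide = Fraction(1000, 1000)  # experiments that produce potassium hydroxide

# Postulate (ii): the 20% of smokers without lung cancer is the complement of the 80%.
assert complement(p_cancer) == Fraction(200, 1000)
# Postulate (iii): an outcome that always occurs has probability 1, its absence 0.
assert p_hydroxide == 1 and complement(p_hydroxide) == 0
```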

For all other events it follows that the likelihood of the event occurring increases as the value approaches 1, and decreases as it approaches 0. Having considered some of the basic principles employed in analysing uncertain events in the physical world, it is necessary to examine their application to the specific example of predicting the occurrence of future sexual offences for convicted sexual offenders as required by the Dangerous Prisoners (Sexual Offenders) Act 2003.

Cogent Evidence to a High Degree of Probability

The court may make an order to continue the detention of a convicted sexual offender only if it is satisfied, by acceptable cogent evidence and to a high degree of probability, that the evidence is of sufficient weight to support the decision,[2] and that the submitted evidence demonstrates that the offender represents a serious danger to the community. In this case the nature of the evidence and its reliability require careful consideration. The evidence that is of primary importance is that offered by the court-appointed psychiatrists who may be commissioned to prepare a report for the purposes of executing a risk assessment order.[3] The risk assessment order requires the preparation of reports by two psychiatrists with each report indicating the level of risk that the prisoner will commit another serious sexual offence if released and the reasons for the psychiatrist’s assessment.[4]

What is risk, and how do psychiatrists determine a convicted sexual offender’s level of risk? It is important for the court to consider these questions and evaluate the experts’ psychiatric evidence critically in order to assess the reliability of their research methods and conclusions. In reaching a conclusion about an offender’s chance of re-offending, the court should be supplied with the facts and assumptions upon which the expert opinion is based, and with the process of inference which reveals the relationship between those facts or assumptions and the conclusions reached by the expert, so that the court can scrutinise the conclusions and their reliability.[5]

Risk can not be measured in the same manner as physical dimensions such as length, mass and time. For each physical dimension there exists an experimentally defined standard unit to measure it. For length it’s the metre, for mass it’s the kilogram and for time it’s the second. For risk however, no standard unit has been experimentally defined. Rather than being measured experimentally with standard units, risk is typically associated with the concept of probability, so that the risk of something occurring increases as its relative frequency approaches 1 and decreases as it approaches 0. Risk, however, need not be defined exclusively in terms of a rational number between 1 and 0, but may also be defined qualitatively in terms of risk levels such as low, medium or high. For either method the data or facts that influence the risk determination are critical, as are the limitations that affect the process of making inferences from the data.

Psychiatrists and psychologists have employed a number of methods for determining risk with respect to human behaviour, which include the following: clinical assessment, actuarial risk assessment and actuarially informed clinical assessment which combines elements from each. The difference between clinical and actuarial assessment is reflected in the type of data relied on in order to determine the level of risk—clinical assessment relying primarily on data about the person being assessed and actuarial assessment relying on data from a population of individuals who share a number of attributes in common with the person being assessed, thus allowing statistical comparative judgements.

Clinical Assessment

In a clinical assessment, information about the client or patient is collected through an interview process during the course of which the client will be confronted with a series of questions, his or her responses and demeanour being carefully scrutinised by the psychiatrist. In addition to an interview, the client may be required to complete a self-report questionnaire and the psychiatrist may also interview individuals who may offer observations about the client that can be compared to the responses of the client. The psychiatrist must carefully consider the information from the interviews and questionnaires before forming a clinical judgement about the risk of the client re-offending, the client’s state of mind or a disorder diagnosis.

Some general reservations that should be identified with respect to clinical research methodology in psychiatry and psychology include the following:

(i) presence of hypothetical constructs
(ii) unreliable diagnostic methods
(iii) absence of experimentally defined standard units of measurement

Hypothetical Constructs

Scientific research involves the collection of data that correspond to observations or measurements of perceivable entities in the physical world, and the forces that may influence their behaviour.[6] A hypothetical construct however is an idea rather than an object or entity that can be perceived as sensory information, and one for which data that correspond to an observation or measurement can not be collected. Human emotions offer a useful example of a hypothetical construct. Many people believe that they experience emotional states despite the absence of empirical evidence to support their existence. For example, my wife is a perceivable object in the physical world, and I believe that I love my wife. I can see my wife, a physical object, and my wife is the object of my love; however, I can not see the love that I believe I feel for her. Love can not be seen, touched, measured or weighed as it does not possess any of the properties of matter. Research methods in psychology have embraced emotional states such as love, depression, anxiety, happiness and many others. Disorders such as those defined in the DSM IV[7] are themselves hypothetical constructs. In an attempt to overcome the absence of perceivable data about such constructs, some research methodologists have assumed the existence of a connection between human behaviour and human emotions. It is believed that certain emotional states to some degree correspond to a constellation of observable behaviours. Anger, an emotional state, might be thought to correspond to the following behaviours: raised voice or shouting, physical assault, threatening gestures etc. Researchers have also attempted to identify associations between chemicals and emotional states or disorders. In addition to observable behaviour that is thought to correspond to particular emotional states, the client may offer an introspective opinion about his or her emotions during the interview process. This however requires the co-operation of the client being interviewed, as well as honest responses and an ability to describe emotional states accurately, since the person being interviewed may lie about emotional states or be mistaken. The psychiatrist or psychologist is dependent upon the honest and accurate testimony of the client being interviewed and the honest and accurate testimony of other individuals who have observed and interacted with the client. Dishonest or inaccurate information will ultimately affect the reliability of a diagnosis or assessment of risk.

Diagnostic Methods

Information collected from a clinical assessment may be used to diagnose the interviewed client with a recognised disorder as well as for determining the risk of self-harm or harm to others. If disorders, however, are hypothetical constructs defined in terms of certain behaviours and emotional states then a reliable diagnosis is dependent upon a number of factors.

Before considering diagnostic methods in psychiatry it will be helpful to briefly discuss an example of a well defined condition in medicine that corresponds to an observable or measurable physiological condition in the human body, in order to make comparative judgements. To diagnose whether or not a patient has a simple bone fracture, a radiologist can perform an x-ray on the suspect bone, in which case a visual inspection of the x-ray will reveal whether the bone is fractured or not. In this case there is a direct relationship between an observable physiological state in the body and the condition that is diagnosed. Different doctors who examine the same x-ray of a simple bone fracture are unlikely to reach different diagnoses.

By contrast, disorders in psychiatry and psychology are operationally defined. Operational definitions attempt to define disorders which are unobservable hypothetical constructs in terms of observable behaviours and the self report testimony of the client. As a result, psychiatrists do not directly observe the disorder but rather the behaviour and subjective mental states of the client that are believed to indicate the presence of the disorder. This is somewhat like diagnosing a simple bone fracture based on certain patient symptoms without taking an x-ray of the bone. Consider the diagnostic criteria for the DSM-IV-TR[8] defined disorder, “paranoid personality”:

A. A pervasive distrust and suspiciousness of others such that their motives are interpreted as malevolent, beginning by early adulthood and present in a variety of contexts, as indicated by four or more of the following:

(1) suspects, without sufficient basis, that others are exploiting, harming, or deceiving him or her
(2) is preoccupied with unjustified doubts about the loyalty or trustworthiness of friends or associates
(3) is reluctant to confide in others because of unwarranted fear that the information will be used maliciously against him or her
(4) reads hidden or demeaning or threatening meanings into benign remarks or events
(5) persistently bears grudges, i.e. is unforgiving of insults, injuries or slights
(6) perceives attacks on his or her character or reputation that are not apparent to others and is quick to react angrily or to counter-attack
(7) has recurrent suspicions, without justification, regarding fidelity of spouse or sexual partner.

The criteria serve as an operational definition for the disorder “paranoid personality” and guide the interview process by classifying a constellation of behaviours and mental states into a specific disorder which can hopefully be distinguished from other disorders that are also operationally defined. As the interview process proceeds, the psychiatrist attempts to find a correspondence between the defined criteria and the information collected about the client, so that a diagnosis can be reached. The criteria that operationally define paranoid personality disorder however can not be directly observed. For example, suspiciousness is an attribute or trait of a person that can only be inferred from someone’s behaviour and communication over a period of time. Information about the client must be collected and interpreted in order to determine which disorder the client’s behaviour and mental state most closely correspond to. This process however is imperfect as the different disorders do not enjoy discrete boundaries but rather overlap in terms of the criteria which define them. As a result it is possible that two psychiatrists who examine the same patient will arrive at different diagnoses despite the application of the same diagnostic criteria. In addition, mental states such as suspiciousness are not well defined, so different psychiatrists may examine the same information about a client and arrive at different conclusions with respect to whether the client’s behaviour reveals an overly suspicious state of mind. Finally, incomplete information about the client will ultimately affect the reliability of the diagnosis. Enough information must be available to allow the presence or absence of a disorder to be identified. Since the failure of the psychiatrist to obtain relevant information or to consider relevant information can affect the outcome of the interview, there is a much greater emphasis on the interviewing skills of the psychiatrist or psychologist and their experience with clinical assessments. Despite these limitations in clinical research methods, psychiatrists and psychologists attempt to diagnose mental disorders in a climate that lends itself to conflicting conclusions. Given that the presence or absence of a diagnosed mental disorder can affect the outcome of a clinical risk assessment, unreliable methods of diagnosis are unacceptable.

Standard Units of Measurement

There are no standard units that have been experimentally defined in psychiatry or psychology. In order to understand the significance of standard units consider the following question: how long is a metre? Many people would find such a question confusing, offering the answer “100cm”. This however does not answer the original question but rather a second question: “How many centimetres are there in a metre?” In physics the question is resolved by identifying an experimentally defined standard unit of measurement. The metre is a standard unit of measurement and the experiment that defines its length is the distance travelled by light in a vacuum during 1/299,792,458 of a second. Other units include the second, kilogram and coulomb; together with the metre, each measures one of the four fundamental dimensions in physics: length, mass, time and electrical charge. By contrast, experimentally defined standard units of measurement are noticeably absent from, or infrequently used in, both psychology and psychiatry, a state of affairs that can best be explained by the presence of hypothetical constructs. Given that hypothetical constructs are concepts rather than perceivable objects in the physical world, it follows that it is not possible to measure them directly, and without a system of measurement it is not possible to make comparative judgements. For example, it is not possible to measure someone’s level of clinical depression and compare it to another person’s level of depression, but only to make a judgement about its presence or absence. In contrast, a person’s height is a dimension that can be measured in centimetres. It is possible to measure the heights of two different people in order to determine whether one is taller than the other and by how much. Of the standard units that currently exist, as maintained by the International Bureau of Weights and Measures, it is difficult to see what assistance any would be in a clinical setting. For example, a psychiatrist could measure a client’s height in centimetres, but it seems unlikely that there would be any relationship between a client’s height and risk of re-offending or mental health.

Clinical risk assessment suffers from the same defects that affect clinical diagnosis of mental disorders, and the two are often intimately related. The question that needs to be considered is whether or not the presence of a diagnosed mental disorder will increase the risk of a person committing an offence. Alternatively, how is risk to be assessed if no disorder is diagnosed? Although it is beyond the scope of the present article to address these issues, statistical studies and their use as criteria in actuarial instruments suggest that certain disorders, which are not well defined and are difficult to diagnose with reliability, are associated with an increase in incidence of certain offences. Where the person being assessed is not diagnosed with any disorder, then clinical assessment may rely on methods that don’t require expert training. The main issue however, with respect to clinical assessment methods, is the issue of reliability. Reliability, which should not be confused with accuracy, concerns consistency with respect to clinical diagnosis or risk assessment as conducted by independent psychiatrists. Reliable diagnostic criteria or assessment procedures should produce similar or identical assessments for independent observers. However different psychiatrists frequently arrive at different conclusions with respect to diagnosis or risk.

Actuarial Risk Assessment

Actuarial risk assessment departs from clinical assessment methods by examining populations of released offenders in order to identify attributes that are associated with an increased risk of recidivism. The data with respect to recidivism rates collected from multiple sample populations of released offenders can be used to make some simple inferences. The relative frequency of recidivism for a particular sample may be used to make a probability statement about the chance of an individual, who shares the attributes that define the population, committing a future offence. Alternatively the relative frequencies for various populations may be compared to determine which samples display a higher level of recidivism, which in turn is believed to indicate a greater risk of recidivism. The process of establishing relative frequencies with respect to recidivism begins by examining an initial population of released offenders for a specific period of time which yields relative frequencies for those who re-offend and those who do not. The sample population being investigated also allows researchers to look for attributes that are associated with recidivism. The initial population can then be analysed by specifying further attributes that break the population down into more clearly defined demographic groups in the hope of identifying greater recidivism rates for specific populations.

Example: for a given population of 1,000 released offenders where 600 re-offend within a seven year period, the following relative frequencies would be obtained:

Offence committed: 600/1,000 or 60%
No offence committed: 400/1,000 or 40%

By specifying further attributes it is possible to identify a subsection of the initial population in order to determine whether it displays an increased or decreased relative frequency. For example, 800 of the released offenders might be male and aged between 30 and 45. If 560 of the 800 re-offend within a seven year period, the following relative frequencies will reveal that the incidence of recidivism is greater where the attributes of gender and age are equal to male and 30–45 respectively:

Offence committed: 560/800 or 70%
No offence committed: 240/800 or 30%

Had the relative frequencies remained unchanged, it would be assumed that the attributes have a neutral effect and therefore no association with an increase or decrease in risk.
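
The inference drawn from such a comparison can be sketched as follows. The function attribute_effect is a hypothetical helper, and the sketch assumes that any difference in relative frequency, however small, counts as an increase or decrease; it takes no account of sampling variation.

```python
from fractions import Fraction


def attribute_effect(baseline_rate, subgroup_rate):
    """Classify a set of attributes by comparing the subgroup's rate with the baseline."""
    if subgroup_rate > baseline_rate:
        return "increased"
    if subgroup_rate < baseline_rate:
        return "decreased"
    return "neutral"


baseline = Fraction(600, 1000)  # 60% of all released offenders re-offend
subgroup = Fraction(70, 100)    # 70% of males aged 30 to 45 re-offend

print(attribute_effect(baseline, subgroup))
```

On the rates given in the example the attributes would be classified as associated with an increased relative frequency of recidivism.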

When the data have been collected, a number of sample populations—each of which corresponds to a unique set of attributes—can be used to specify a collection or table of relative frequencies which can then be used as a basis for making probability statements and judgements about levels of risk for a specific individual, by looking for the presence or absence of certain attributes. For example given the above relative frequency of recidivism for released offenders who are male and aged between 30 and 45, researchers might infer that there is a 70 per cent chance that any released offender who is male and aged between 30 and 45 will re-offend within a seven year period or that males aged between 30–45 display a higher risk of re-offending when compared to released offenders generally. The purpose of this simplified example is to identify the process of reasoning upon which inferences are made from relative frequencies.
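
A table of this kind can be sketched as a simple lookup. The table layout, the attribute labels and the function chance_of_reoffending are assumptions made for illustration; the two rates are those of the simplified example, not figures from any actuarial instrument.

```python
from fractions import Fraction

# Hypothetical table: each key is the set of attributes that defines a sample population,
# each value the relative frequency of re-offending within a seven year period.
RECIDIVISM_TABLE = {
    frozenset(): Fraction(60, 100),                        # released offenders generally
    frozenset({"male", "aged 30-45"}): Fraction(70, 100),  # male and aged 30 to 45
}


def chance_of_reoffending(attributes, table=RECIDIVISM_TABLE):
    """Look up the relative frequency for the population defined by exactly these attributes."""
    key = frozenset(attributes)
    if key not in table:
        raise KeyError(f"no sample population is defined by {sorted(key)}")
    return table[key]


individual = {"male", "aged 30-45"}
rate = chance_of_reoffending(individual)
baseline = chance_of_reoffending(set())
print(f"{float(rate):.0%} against {float(baseline):.0%} for released offenders generally")
```

The lookup makes the reasoning explicit: the figure attached to an individual is the figure for a population that shares the recorded attributes, and no figure is available for a combination of attributes that no sample population was defined by.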

In practice however, although the reasoning process is the same, actuarial studies will specify many more attributes besides gender and age in order to analyse recidivism rates from many different dimensions. Attributes which are used to define sample populations may be divided into those that are static and those that are dynamic. Static attributes are those that do not change with time, such as an individual’s gender, date of birth and criminal history, whilst dynamic attributes are susceptible to change, such as an individual’s marital status, employment status or substance addiction. The Violence Risk Appraisal Guide (1993)[9] offers the following example of static and dynamic attributes used to assess an individual’s chance of re-offending with respect to a violent offence:

(1) PCL-SV score. Indicates the presence or absence of psychopathy.
(2) Maladjustment at elementary school age.
(3) Diagnosis of personality disorder under DSM IV.
(4) Age at index offence.
(5) Lived with both parents to age of 16.
(6) Failure on prior conditional release.
(7) Non-violent offence score.
(8) Marital status.
(9) Diagnosis of schizophrenia under DSM IV.
(10) Victim injury.
(11) History of alcohol misuse.
(12) Female victim.

For any individual it is possible to determine the presence or absence of each variable and/or to assign it a value. The risk of an individual re-offending by committing a violent offence depends upon the number of variables to which there is a positive response, with the risk increasing as more variables are found to apply to the individual being assessed. The psychiatrist assessing an individual will record a response to each of the 12 attributes and compare the result to the relative frequency for a sample population that has the same attributes as the assessed individual. For example, person A might return the following results: high PCL-SV score, displayed maladjustment at elementary school, diagnosed with a DSM-IV personality disorder, aged 28, lived with both parents to age of 16, failed on a prior conditional release, low non-violent offence score, single, absence of a diagnosis of schizophrenia, a history of alcohol misuse and female victims. Person A might be at a higher risk of re-offending than an individual, B, with the following results: low PCL-SV score, no incidence of maladjustment at elementary school, absence of a DSM-IV personality disorder, aged 62, no failure on prior conditional release, high non-violent offence score, married, no history of alcohol misuse and no female victim. Data collected by the researchers would presumably reveal that the relative frequency or incidence of recidivism is lower for the population of individuals who display the attributes that correspond to individual B than for the population of individuals who display the attributes that correspond to individual A.

Limits of Statistical Inference

Categorising individuals according to static or dynamic attributes, which correspond to the sample populations that define relative frequencies, presents a number of issues. It is necessary to recognise that the data collected from research on sample populations ultimately support inferences about groups of individuals rather than single individuals. This distinction can be illustrated by considering the following propositions:

(i) Smoking causes lung cancer.
(ii) John will contract lung cancer.

The first proposition is based on a population of individuals, rather than a single individual, from which it is possible to make a generalisation that is subject to exceptions. Of the lung cancer patients surveyed in numerous statistical studies many will be habitual smokers, but not all, since lung cancer can occur in either the presence or absence of habitual smoking. A relative frequency of 80 per cent with respect to the incidence of lung cancer amongst smokers indicates that, of 1,000 smokers who participated in a study, 800 developed lung cancer whilst 200 did not. It is possible to infer that any group of smokers not surveyed can be divided into two further groups, those who will develop lung cancer and those who will not, in proportions that are consistent with the surveyed sample population. Thus if the future conforms to the past then, for a population of 100 non-surveyed smokers, it is possible to predict, based on the statistical study, that approximately 80 smokers will develop lung cancer and approximately 20 will not. What of inferences, however, with respect to a single individual rather than a group? Clearly it is not possible to divide a single individual into two categories, those with and those without, in the proportions consistent with the surveyed sample. Although for a single individual there are two possible outcomes, only one will actually occur. When applied to an individual, the relative frequency from a sample population serves as a probability statement that attempts to indicate which outcome is more likely to occur. Where each possible outcome has a rational number assigned to it, the outcome whose number is closest to one is more likely to occur, which in this case is lung cancer, given that its value is 0.80 whilst the value for not contracting lung cancer is 0.20. Clearly any prediction about an individual smoker contracting lung cancer based on the statistical studies, such as proposition (ii), could be wrong. This is the impact of uncertainty.
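
The contrast between a statement about a group and a statement about one person can be illustrated with a short simulation. This is a sketch under stated assumptions: the function names are hypothetical, the seed is arbitrary, and each simulated smoker is treated as an independent draw at the surveyed rate of 0.80, which is itself the assumption that the future conforms to the past.

```python
import random


def expected_split(group_size: int, rate: float) -> tuple[int, int]:
    """Expected numbers with and without the outcome in a group."""
    with_outcome = round(group_size * rate)
    return with_outcome, group_size - with_outcome


def simulate_individuals(group_size: int, rate: float, seed: int) -> list[bool]:
    """Draw one yes/no outcome per individual at the given rate."""
    rng = random.Random(seed)
    return [rng.random() < rate for _ in range(group_size)]


RATE = 0.80  # relative frequency among the surveyed smokers

# Group-level statement: a proportion of the group.
print(expected_split(100, RATE))  # (80, 20)

# Individual-level reality: each person has exactly one outcome, and the
# simulated total need not equal the expected 80.
outcomes = simulate_individuals(100, RATE, seed=1)
print(sum(outcomes), "of 100 simulated smokers develop the disease")
print("outcome for the first individual:", outcomes[0])
```

The group figure can be stated in advance as a proportion, whereas each entry in `outcomes` is simply true or false. Nothing in the rate identifies which individuals will fall on which side.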

The distinction between statements about groups and statements about individuals, where both rest on relative frequencies drawn from statistical studies, leads to a fundamental problem. For any given individual it is not possible to determine the category to which he or she will belong. It is not possible to predict whether one will belong to the group of 80 with lung cancer or the group of 20 without. The same problem clearly applies to recidivism predictions based on relative frequencies from statistical studies, where it is not possible to determine, for example, whether a released offender with a specific collection of attributes will be one of the 70 per cent who re-offend or the 30 per cent who do not. The prediction of recidivism for a released offender, like the predicted condition for a given smoker, must ultimately be compared to the observed condition or behaviour in order to affirm or refute it. When a prediction is compared to the observed outcome it will fall into one of four categories: (i) True Positive, (ii) True Negative, (iii) False Positive, (iv) False Negative. In the case of recidivism a true positive is an accurate prediction that an offender will re-offend and a true negative is an accurate prediction that an offender will not re-offend. A false positive is an inaccurate prediction that an offender will re-offend and a false negative is an inaccurate prediction that an offender will not re-offend. The presence of either false positives or false negatives indicates that the particular phenomenon cannot be predicted accurately, which is the case for recidivism whether clinical or actuarial methods are employed.
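
The four categories can be made concrete with a brief sketch that compares predictions against observed outcomes. The counts are those of the earlier example of 900 released offenders, of whom 630 re-offend. The blanket rule that every member of the group is predicted to re-offend is an assumption adopted here purely for illustration, as is the function name `classify`.

```python
from collections import Counter


def classify(predicted: bool, observed: bool) -> str:
    """Label a prediction of re-offending against the observed outcome."""
    if predicted:
        return "true positive" if observed else "false positive"
    return "false negative" if observed else "true negative"


# Observed outcomes for the group in the worked example.
observed = [True] * 630 + [False] * 270

# Assumed rule: because the group rate exceeds one half, every member
# of the group is predicted to re-offend.
predicted = [True] * 900

tally = Counter(classify(p, o) for p, o in zip(predicted, observed))
print(dict(tally))  # {'true positive': 630, 'false positive': 270}
```

Under that assumed rule the 270 offenders who do not re-offend are all false positives. Predicting the opposite for everyone would instead convert the 630 into false negatives, so neither blanket rule removes the errors.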

In an attempt to avoid some of the problems associated with quantitative statistical statements about risk based on relative frequencies, some researchers have adopted a qualitative approach to assessing risk. Whilst a quantitative approach assigns to risk a rational number between 0 and 1, a qualitative approach relies on words rather than numbers by defining categories of risk such as low, medium and high. Typically such judgements feature in clinical methods; however, they may also appear in actuarial methods where different ranges of rational numbers correspond to different risk categories. Neither approach can guarantee accurate predictions, but a qualitative approach results in a loss of precision and expresses less information. For example, if an assessment that indicates a high likelihood that the offender will re-offend is compared to an assessment that indicates a 0.75 likelihood that the offender will re-offend, it can be seen that the quantitative statement attempts to define the risk with more precision, just as a likelihood of 0.756 is more precise than a likelihood of 0.75. Whether risk is expressed quantitatively or qualitatively, the question to be considered by the court is: at what level of risk must the offender be assessed before the sentence should be extended? Is a risk level of 0.75, indicating that the likelihood of not re-offending is 0.25, a sufficient basis for extending a sentence, or is a higher risk level required, such as 0.95? Alternatively, should a sentence be extended where the risk of recidivism is judged to be high?
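
The loss of information involved in moving from numbers to categories can be shown with a small sketch. The cut-off points of one third and two thirds are hypothetical and chosen only for illustration; neither the legislation nor any instrument discussed here prescribes them.

```python
def risk_category(probability: float,
                  medium_from: float = 1 / 3,
                  high_from: float = 2 / 3) -> str:
    """Map a numerical risk onto a qualitative category.

    The thresholds are assumed values for illustration only.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if probability >= high_from:
        return "high"
    if probability >= medium_from:
        return "medium"
    return "low"


# Three distinct quantitative assessments collapse into one category.
for p in (0.75, 0.756, 0.95):
    print(p, "->", risk_category(p))
```

With these assumed thresholds, assessments of 0.75, 0.756 and 0.95 are all reported as high, so a court told only that the risk is high cannot distinguish between them.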

A second issue arises where historical information about a sample population of individuals is used to make judgements about an individual who shares a relatively small number of common attributes at the expense of ignoring distinguishing attributes that are not shared. This reveals an assumption that future individuals will behave in a manner that is consistent with the behaviour displayed by a sample population of past individuals and that the relative frequency drawn from the past sample population will remain stable over time. Inferences in statistics where individuals are characterised according to attributes that are consistent with the attributes of a sample population in statistical studies are based upon arguments of analogy. In such arguments the behaviour of something that is known may be used as a basis to make inferences about the unknown behaviour of something that is similar in certain respects. Consider the following example offered by Thomas Reid:

We may observe a very great similitude between this earth which we inhabit, and the other planets, Saturn, Jupiter, Mars, Venus and Mercury. They all revolve around the sun, as the earth does….They borrow all their light from the sun, as the earth does. Several of them are known to revolve around their axis like the earth, and by that means must have a like succession of day and night. Some of them have moons that serve to give light in the absence of the sun, as our moon does to us. They are all in their motions, subject to the same law of gravitations, as the earth is. From all this similitude, it is not unreasonable to think, that these planets may, like our earth, be the habitation of various orders of living creatures. [10]

Reid’s analogical argument attempts to draw a conclusion about something that is unknown, the presence of life on other planets, from the presence of life on earth, which bears some resemblance to the other planets of the solar system. In the same manner psychiatrists attempt to make predictions about the unknown future behaviour of released offenders based on the known behaviour of offenders released in the past and the similarities between the two. In these types of analogies, however, the differences may be as important as the similarities. Although there were obvious similarities between the earth and the other observable planets of the solar system during Reid’s lifetime, many differences have since been discovered which suggest that the presence of carbon-based life forms, such as those found on earth, is unlikely. In the same manner the relative frequencies for risk of recidivism are based on sample populations that exemplify a discrete number of common attributes. An individual with the same attributes will be assumed to represent the level of risk defined by the relative frequency of the sample population, despite the fact that the individual being assessed will also possess a number of attributes that are different. These differences, which may mitigate the level of risk, will be overlooked in a purely actuarial approach to risk assessment.

To compensate for the lack of emphasis on unique individual characteristics in a purely actuarial approach, some psychiatrists and psychologists have adopted an actuarially informed clinical assessment which combines the methods of both. This allows the psychiatrist or psychologist to accommodate protective or mitigating factors that might be seen to decrease the level of risk determined by an actuarial assessment. Some recently developed actuarial instruments such as the HCR-20[11] have been modified to include dynamic information that requires clinical investigation. Whilst this allows important information about the individual to be considered, it also introduces the weaknesses of clinical assessment methods which have already been discussed. This affects the reliability of the assessment, that is, the extent to which independent psychologists or psychiatrists reach the same diagnosis or assessment of risk. Independent psychologists and psychiatrists are more likely to reach the same conclusion with respect to risk when applying actuarial methods than when applying clinical methods. As a result, reliability decreases in the presence of clinical assessment methods.

Conclusion

From the limitations outlined above it should be apparent that statistical inferences are ultimately based on generalisations about populations. Such generalisations do not eliminate uncertainty; rather, in the presence of uncertainty, they attempt to indicate which of two options might be seen as more likely, though not certain, to occur. In addition, such generalisations obscure the identity of the individual, with the effect that differences may be overlooked. Clinical approaches, on the other hand, also suffer from defects that render their diagnostic and risk assessment methods unreliable, an issue which also features in actuarially informed clinical assessment. Despite the differences, all three approaches share a common goal of attempting to determine the risk that an offender poses with respect to recidivism upon release, rather than attempting to predict what the offender will actually do. Given the limitations of actuarial and clinical methods of assessment, a number of issues require judicial consideration. Central to this consideration is the following question:

Can clinical or actuarial methods of risk assessment yield cogent evidence, as required by the legislation, which identifies the level of risk for an offender with a high degree of probability?

Furthermore since the concept of risk implicitly accommodates uncertainty, which in this case reflects the inability to make accurate predictions about the future behaviour of released offenders, are inaccurate predictions of future behaviour an acceptable basis for extending the sentence of an offender who is otherwise entitled[12] to release?

Given the limitations outlined above it is arguable that none of the assessment methods can yield evidence which satisfies the standard of proof, a high degree of probability,[13] required by the legislation. It does not follow that such risk assessment evidence is inadmissible, for the legislation authorises the court to issue a risk assessment order, with the assessment to be performed by court appointed psychiatrists.[14] However, the risk assessment evidence on its own is not sufficient to support a finding of dangerousness; it features as only one of many factors to be taken into consideration by the court under section 13(4) of the legislation.[15] As a result the court must decide what probative value or weight should be given to the risk assessment reports prepared by the court appointed psychiatrists. In considering this question the court must critically examine the nature of the research methodology relied upon to assess risk and/or make predictions about future human behaviour with respect to serious sexual offences. In doing so it should look closely at the relationship between the inferences drawn by psychiatrists and the facts or assumptions which support them. Given the problems that affect each of the methods for assessing risk, and the serious consequences that follow from the outcome of an assessment, it is arguable that any such evidence should be assigned a low probative value or weight.

* Part time lecturer in legal philosophy, Southern Cross University.

[1] See Rudolf Carnap, An Introduction to the Philosophy of Science (1995). Carnap makes a similar distinction between universal laws and statistical laws.

[2] See Dangerous Prisoners (Sexual Offenders) Act 2003, ss 13(1), (3).

[3] Ibid ss 8(2)(a), 9.

[4] Ibid s 11(2).

[5] Makita v Sprowles, Heydon JA, 81.

[6] Research may also accommodate theoretical explanations as well as experimental observations. The observation that the combination of zinc and copper in the presence of an electrolytic solution, such as salt water, will produce a measurable voltage or potential difference, may be distinguished from the theoretical explanation based upon the unequal distribution of electrons between the zinc and copper electrodes. One is observable through measurement, the other currently is not.

[7] The DSM IV (Diagnostic and Statistical Manual of Mental Disorders Version IV) lists the currently accepted disorders and the criteria by which they are diagnosed.

[8] The DSM-IV-TR is the text revision (TR) of the fourth edition of the Diagnostic and Statistical Manual of Mental Disorders of the American Psychiatric Association, published in 2000.

[9] Harris et al, Violence Risk Appraisal Guide (1993). The methodology for designing actuarial instruments is the same whether they attempt to predict violent or serious sexual offences. The attributes selected however may be different depending on the type of criminal behaviour being predicted.

[10] Sir William Hamilton (ed), The Works of Thomas Reid, D.D. (first published 1846, 1983 ed).

[11] HCR-20 (Webster, Douglas, Eaves and Hart, 1997, Version 2).

[12] It follows that the requirement of risk assessment is unnecessary where a serious sexual offender is not entitled to release.

[13] This phrase is not defined in the legislation; however, it is unlikely that courts would accept a definition in mathematical terms, which define probability as a rational number between 0 and 1. As such, courts are unlikely to identify a rational number between 0 and 1 that corresponds to a high degree of probability. The standard would necessarily be greater than that required for civil trials, the balance of probabilities, and would more closely resemble the criminal standard of proof.

[14] See Dangerous Prisoners (Sexual Offenders) Act 2003, ss 8(2), 9, 11.

[15] Section 13(4) identifies ten such factors which must be taken into consideration by the court.
