Most risk assessment scales are developed and validated by researchers based on file information scored by research assistants, who may have extensive training in the behavioural sciences but no field experience. An essential question is whether the accuracy found in these research studies will be replicated in real cases assessed by front-line staff. In other words, how can we implement a scale so that it works with accuracy similar to (or even better than) that of research studies? Before delving into concrete recommendations for conducting high-quality assessments, I will first use an illustrative example and a summary of research to demonstrate that the commitment of individual staff members and organizations can make a huge difference in how well a risk assessment scale works.
Illustrative Example: A Tale of Two States
Static-99 (and its revised version, Static-99R) is the most commonly used sex offender risk scale, and over 60 studies of the scale have found that, on average, it has moderate predictive accuracy. Unfortunately, though, there are very few studies of how it works in field settings.
Texas and California are among the many states that mandate the use of Static-99/R for imprisoned sex offenders. Texas found low predictive accuracy: their results were lower than those of most studies of the scale. In contrast, California found remarkably high predictive accuracy: their results were among the best of all studies conducted on this scale, or on any other risk scale.
How could two American jurisdictions implementing the same risk scale achieve such remarkably different results?
There are many methodological or policy differences that could affect these findings, but at least part of the difference likely has to do with the quality of implementation. The study from Texas provides no information on how the correctional system maintains the quality of its risk assessments. In contrast, California has a remarkably rigorous implementation and quality control system. All staff who use the scale must be trained either by someone certified by the Static-99R developers or by a ‘super trainer’, who is certified by a certified trainer and has at least two years of scoring experience. All staff receive training from a detailed, standardized curriculum and, by law, must be retrained in the scale every two years. Additionally, they must pass scoring tests after training, and ideally their first 10-20 cases are reviewed by a super trainer. Novice users are also encouraged to work with a mentor to maintain the quality of their assessments. With such diligent attention paid to the quality of its risk scale implementation, it is not surprising that California has found some of the highest predictive accuracy ever obtained in a field setting for a risk assessment scale.
What Does Research Tell us About Risk Assessment Quality?
In previous research, the quality of risk scale implementation has been associated with substantial increases in predictive accuracy, whether quality was defined as the involvement of the scale’s developer (to help ensure fidelity to the scale) or as whether community supervision officers completed all the steps requested of them. The quality of training has also been linked to the quality of risk scores.
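In this literature, predictive accuracy is typically summarized as the area under the ROC curve (AUC): the probability that a randomly selected recidivist received a higher risk score than a randomly selected non-recidivist (0.5 is chance, 1.0 is perfect). A minimal sketch of the computation, using entirely hypothetical scores and outcomes:

```python
from itertools import product

def auc(scores, outcomes):
    """AUC: probability that a recidivist (outcome 1) received a
    higher score than a non-recidivist (outcome 0); ties count 0.5."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

# Hypothetical risk scale totals and recidivism outcomes (1 = recidivated)
scores = [6, 2, 5, 1, 4, 3, 7, 0]
outcomes = [1, 0, 1, 0, 1, 0, 0, 0]
print(round(auc(scores, outcomes), 2))  # → 0.8
```

In practice, researchers use established statistical packages rather than hand-rolled code, but the underlying quantity being compared across the Texas and California studies is this same pairwise ordering of recidivists above non-recidivists.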
What Can Staff and Organizations Do to Promote High Quality Assessments?
The jurisdiction examples and research discussed above show that it is not enough merely to implement a risk assessment scale. Care should be taken to ensure the assessments are done well. This is particularly important given that any risk assessment conducted can be challenged in court.
Below is a list of policies and procedures recommended by the developers of the STABLE-2007 (Fernandez, Harris, Hanson, & Sparks, 2014) to keep quality high. The list has been modified to be applicable to all risk scales and is included with the permission of the scale authors.
All risk assessment practices should have most, if not all, of the following components in place:
1. A Bring Forward system to cue when it is time to re-assess, ensuring that scorings are updated at regular intervals (applicable to dynamic risk assessment scales).
2. A system of peer review so that everyone is working towards scoring calibration (i.e., all raters scoring the same case alike). Colleagues should meet on a regular basis, present their scorings to each other, and discuss them, working towards consensus.
3. Clinical supervision by a very experienced assessor so that those scoring have access to a resource person for tricky questions (this person may well organize the peer review sessions).
4. Mentorships with those who are more experienced in using the measure so that novice scorers have an identified person with whom they can discuss their cases and their risk scoring.
5. Participation in inter-rater reliability trials where about 10% of the cases are scored by more than one rater and the scores are compared. This technique leads to better calibration of scoring.
6. Participation in webinars about scoring and other risk assessment issues, which agencies may wish to consider.
7. A quality control process for scoring risk assessments on any offender, whether through regular professional development days, internal supervision by senior employees committed to the risk assessment process, or “scoring clinics” run cooperatively within organizations.
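The inter-rater reliability trials in point 5 can be monitored with simple summary statistics. A sketch, assuming two raters have independently scored the same 10% sample of cases (all numbers hypothetical; real programs would typically also report a chance-corrected statistic such as an intraclass correlation):

```python
def rater_agreement(rater_a, rater_b):
    """Exact agreement rate and mean absolute score difference for
    cases scored independently by two raters on the same scale."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    exact = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    mad = sum(abs(a - b) for a, b in zip(rater_a, rater_b)) / n
    return exact, mad

# Hypothetical scale totals from a dual-scored sample of five cases
rater_a = [4, 2, 6, 3, 5]
rater_b = [4, 3, 6, 3, 4]
exact, mad = rater_agreement(rater_a, rater_b)
print(exact, mad)  # → 0.6 0.4
```

Tracking these numbers over successive trials shows whether peer review and mentorship (points 2-4) are actually pulling raters towards the calibration the list aims for.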
L. Maaike Helmus, Ph.D., Forensic Assessment Group, Ottawa, ON, Canada