- Research article
Validation of the PHQ-9 as a screening instrument for depression in diabetes patients in specialized outpatient clinics
© van Steenbergen-Weijenburg et al; licensee BioMed Central Ltd. 2010
- Received: 11 January 2010
- Accepted: 12 August 2010
- Published: 12 August 2010
For the treatment of depression in diabetes patients, it is important that depression is recognized at an early stage. One screening method for depression is the 9-item Patient Health Questionnaire (PHQ-9). The aim of this study was to validate the PHQ-9 as a screening instrument for depression in diabetes patients in specialized outpatient clinics.
A total of 197 diabetes patients from outpatient clinics in the Netherlands completed the PHQ-9. Within 2 weeks they were approached for an interview with the Mini-International Neuropsychiatric Interview (MINI). DSM-IV diagnoses of Major Depressive Disorder (MDD) served as the criterion against which the sensitivity, specificity, positive and negative predictive values and Receiver Operating Characteristic (ROC) curves of the PHQ-9 were calculated.
A cut-off point of 12 on the PHQ-9 summed score resulted in a sensitivity of 75.7% and a specificity of 80.0%. The predictive values of a negative and a positive test result were 93.4% and 46.7%, respectively. The area under the ROC curve was 0.77.
The PHQ-9 proved to be an efficient and well-received screening instrument for MDD in this sample of diabetes patients in specialized outpatient clinics. The higher cut-off point that was needed (12) and the somewhat lower sensitivity than reported elsewhere may be due to the fact that patients from a specialized diabetes clinic have more severe pathology and more complications, whose symptoms may be picked up by the PHQ-9 as depressive symptoms even though they are symptoms of diabetes.
- Major Depressive Disorder
- Receiver Operating Characteristic Curve
- Patient Health Questionnaire
- Screening Instrument
Seven percent of adults in the USA have been diagnosed with Major Depressive Disorder (MDD), and among adults with chronic diseases such as diabetes this figure rises to over eleven percent. Although the causal connection between the two conditions remains unclear, the consequences are far-reaching. Having both diabetes and depression is associated with poor glycaemic control, resulting in more severe complications and a lower quality of life [2, 3]. As diabetes becomes more severe, the prevalence of depression also increases, particularly in vulnerable patients such as those with diabetes-related complications. These severe consequences underline the importance of focusing on the prevention of depression.
Depression often remains unrecognized. Although several screening questionnaires are available, most have been validated for use in primary care in patients with less complex medical illnesses. Patients with both severe diabetes and depression are expected to be frequently present in specialized outpatient clinics or hospitals, but specialists often lack the time or skills needed to recognize depression. The recognition of depression is very important, and the Patient Health Questionnaire (PHQ-9) was developed for this purpose [4, 5]. The instrument has already been validated for primary care patients, cardiac patients in general hospitals, and diabetes patients in primary care, but not for diabetes patients in specialized outpatient clinics. In addition, the present study is the first to assess the operating characteristics of the PHQ-9 in diabetes patients. In patients with chronic medical diseases, co-morbid MDD can be difficult to identify, because the symptoms of the two may overlap. For screening instruments for depression such as the PHQ-9, the expected effect of this symptom overlap is that higher cut-off points are needed to correctly identify MDD in the chronically ill than in populations with less severe illnesses, and that both sensitivity and specificity decline.
In this study we assessed the criterion validity of the PHQ-9 for MDD in diabetes patients in specialized outpatient clinics in terms of sensitivity, specificity, and positive and negative predictive values. What is new is that we also constructed Receiver Operating Characteristic (ROC) curves for this population. These specialized clinics differ from general diabetes care clinics in that they mainly treat patients with severe diabetes and complications, and specialized clinical diabetes care is provided by a team consisting of a diabetologist, a specialized diabetes nurse and a dietician.
Patients and procedures
After approval of the study protocol by the Medical Ethics Committee "Verenigde commissies mensgebonden onderzoek", patients were selected from two specialized outpatient clinics for diabetes in the east of the Netherlands.
The patients received the PHQ-9 by mail. Those who gave informed consent were interviewed by telephone with the Mini-International Neuropsychiatric Interview (MINI) within two weeks after completing the PHQ-9. The interviews were administered by trained interviewers who were not blinded to the PHQ-9 scores. Patients were excluded if they did not give informed consent or if we were unable to contact them within two weeks after they had completed the PHQ-9.
The PHQ-9, developed by Kroenke et al, is a screening questionnaire containing nine questions about the symptoms of MDD. Each question has four answer categories, "not at all", "several days", "more than half the days" and "almost every day", scored zero, one, two and three points respectively, and a summed score of the nine questions (range 0 to 27) is calculated. The questions refer to the previous two weeks. The questionnaire is based on the criteria of the Diagnostic and Statistical Manual of Mental Disorders, fourth edition (DSM-IV) for diagnosing MDD in patients with medical illnesses, and its questions concern symptoms such as fatigue, concentration problems, depressed mood and thoughts of death. The PHQ-9 can also be used to screen specifically for MDD according to the DSM-IV criteria. This 'algorithm', developed by Kroenke et al, is positive for MDD when at least five of the nine questions are scored two or more points, except for question nine, for which a score of at least one point is sufficient. In addition, question one ("in the past two weeks I had less interest and fun in doing activities") or question two ("in the past two weeks I felt dejected, depressed or desperate") must be among the positively answered questions.
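The two scoring rules described above can be expressed compactly. The following Python sketch assumes that responses are coded 0 to 3 and listed in item order (item 1 = loss of interest, item 2 = depressed mood, item 9 = thoughts of death); the function names are hypothetical and do not come from any official PHQ-9 software. The example responses illustrate that the two methods can disagree: a summed score of 9 can satisfy the algorithm, while a score of 15 without item 1 or 2 cannot.

```python
from typing import Sequence


def _check(responses: Sequence[int]) -> None:
    # Nine items, each scored 0 (not at all) to 3 (almost every day).
    if len(responses) != 9:
        raise ValueError("expected 9 item responses")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be 0, 1, 2 or 3")


def phq9_sum_score(responses: Sequence[int]) -> int:
    """Summed score over the nine items (range 0-27)."""
    _check(responses)
    return sum(responses)


def screen_positive(responses: Sequence[int], cutoff: int = 12) -> bool:
    """Summed-score screen: positive when the score is >= cutoff (12 here)."""
    return phq9_sum_score(responses) >= cutoff


def phq9_algorithm_positive(responses: Sequence[int]) -> bool:
    """Algorithm as described: >= 5 counted items, including item 1 or item 2."""
    _check(responses)
    # Items 1-8 count when scored >= 2; item 9 (index 8) counts when scored >= 1.
    counted = [r >= 2 for r in responses[:8]] + [responses[8] >= 1]
    return sum(counted) >= 5 and (counted[0] or counted[1])


# Hypothetical example response patterns (not study data).
mild = [2, 0, 2, 2, 2, 0, 0, 0, 1]      # sum 9, includes item 1
no_core = [0, 0, 3, 3, 3, 3, 3, 0, 0]   # sum 15, neither item 1 nor item 2
for label, r in (("mild", mild), ("no_core", no_core)):
    print(label, phq9_sum_score(r), screen_positive(r), phq9_algorithm_positive(r))
```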
In this study, the telephone-administered MINI was used as the criterion instrument for diagnosing MDD. Its questions, which are often used in clinical practice, are based on the DSM-IV criteria.
Sensitivity and specificity calculations
Any screening test will classify some patients as false positives or false negatives. It is therefore important to keep this risk low by identifying the optimal combination of sensitivity and specificity. In this way, a clinically acceptable risk of misclassifying a patient can be determined.
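Finding the best combination of sensitivity and specificity amounts to computing both for every candidate cut-off of the summed score. The Python sketch below does this on a small hypothetical toy dataset (not the study data); the function names are illustrative assumptions. `choose_cutoff` applies the selection rule used in this study's results: the highest specificity among cut-offs with a sensitivity of at least 75%.

```python
from typing import List, Sequence, Tuple

# One point per cut-off: (cut-off, sensitivity, specificity).
RocPoint = Tuple[int, float, float]


def confusion_counts(scores: Sequence[int], has_mdd: Sequence[bool],
                     cutoff: int) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) when 'score >= cutoff' counts as screen-positive."""
    tp = fp = fn = tn = 0
    for score, mdd in zip(scores, has_mdd):
        positive = score >= cutoff
        if positive and mdd:
            tp += 1
        elif positive:
            fp += 1
        elif mdd:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def roc_points(scores: Sequence[int], has_mdd: Sequence[bool]) -> List[RocPoint]:
    """Sensitivity and specificity for every cut-off from 0 to 28."""
    if all(has_mdd) or not any(has_mdd):
        raise ValueError("need both MDD and non-MDD patients")
    points = []
    # Summed PHQ-9 scores range from 0 to 27; a cut-off of 28 flags nobody.
    for cutoff in range(0, 29):
        tp, fp, fn, tn = confusion_counts(scores, has_mdd, cutoff)
        points.append((cutoff, tp / (tp + fn), tn / (tn + fp)))
    return points


def choose_cutoff(points: Sequence[RocPoint], min_sensitivity: float = 0.75) -> int:
    """Highest specificity among cut-offs whose sensitivity meets the floor."""
    eligible = [p for p in points if p[1] >= min_sensitivity]
    # Ties in specificity are broken in favour of higher sensitivity.
    return max(eligible, key=lambda p: (p[2], p[1]))[0]


# Hypothetical toy data (NOT the study data): 6 patients without and 4 with MDD.
scores = [2, 4, 6, 9, 11, 13, 8, 12, 15, 20]
has_mdd = [False] * 6 + [True] * 4

points = roc_points(scores, has_mdd)
for cutoff, sens, spec in points[8:14]:
    print(f"cut-off {cutoff:2d}: sensitivity {sens:.2f}, specificity {spec:.2f}")
print("chosen cut-off:", choose_cutoff(points))
```

Raising the cut-off can only lower (or keep) sensitivity and raise (or keep) specificity, which is the trade-off the selection rule navigates.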
When a higher cut-off value is selected, the false-positive fraction decreases and specificity increases, but the true-positive fraction, and thus sensitivity, decreases as well. As described by Zweig et al:
"In a Receiver Operating Characteristic (ROC) curve the true positive rate (Sensitivity) is plotted in function of the false positive rate (100-Specificity) for different cut-off points. Each point on the ROC plot represents a sensitivity/specificity pair corresponding to a particular decision threshold. A test with perfect discrimination (no overlap in the two distributions) has a ROC plot that passes through the upper left corner (100% sensitivity, 100% specificity). Therefore the closer the ROC plot is to the upper left corner, the higher the overall accuracy of the test". Where the curve flattens, moving the cut-off further yields no additional benefit for the screening method.
To assess the criterion validity of the PHQ-9, a Receiver Operating Characteristic (ROC) curve was constructed with SPSS version 15.0.
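The area under the ROC curve (AUC) can be estimated nonparametrically as the probability that a randomly chosen patient with MDD has a higher PHQ-9 score than a randomly chosen patient without MDD, counting ties as one half (the Mann-Whitney formulation). The Python sketch below applies this to hypothetical toy data and adds an approximate 95% confidence interval based on Hanley and McNeil's standard error. This is one common approach; it is not a reproduction of the SPSS procedure, whose confidence interval may be computed differently.

```python
import math
from typing import Sequence, Tuple


def auc_mann_whitney(scores: Sequence[float], has_mdd: Sequence[bool]) -> float:
    """Nonparametric AUC: P(case score > non-case score), ties counted as 1/2."""
    cases = [s for s, d in zip(scores, has_mdd) if d]
    controls = [s for s, d in zip(scores, has_mdd) if not d]
    if not cases or not controls:
        raise ValueError("need both MDD and non-MDD patients")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def auc_ci_hanley_mcneil(auc: float, n_cases: int, n_controls: int,
                         z: float = 1.96) -> Tuple[float, float]:
    """Approximate CI using the Hanley-McNeil standard error of the AUC."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (auc * (1 - auc)
                + (n_cases - 1) * (q1 - auc ** 2)
                + (n_controls - 1) * (q2 - auc ** 2)) / (n_cases * n_controls)
    se = math.sqrt(variance)
    # Clip to the valid range of an AUC.
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Hypothetical toy data (NOT the study data).
scores = [2, 4, 6, 9, 11, 13, 8, 12, 15, 20]
has_mdd = [False] * 6 + [True] * 4
auc = auc_mann_whitney(scores, has_mdd)
low, high = auc_ci_hanley_mcneil(auc, n_cases=4, n_controls=6)
print(f"AUC = {auc:.3f}, approximate 95% CI {low:.3f} to {high:.3f}")
```

An AUC of 0.5 means the score does not discriminate at all, and 1.0 means perfect discrimination, matching the "upper left corner" description in the quotation.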
Of the 1,278 patients who received the PHQ-9, 382 were excluded because they did not give informed consent and another 501 because they did not return the PHQ-9 within 2 weeks. Of the 395 eligible patients, 198 could not be reached within 2 weeks after completing the PHQ-9, so data on 197 participants (49.9% of the eligible patients) were included in the analyses.
Sample characteristics: age (mean, SD); gender (frequency, %); PHQ-9 summed score (mean, standard error); algorithm score (N, %); MINI (N, %).
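Sensitivity, specificity, predictive values (PV) of positive and negative test results, and efficiency outcomes for different cut-off scores

| PHQ-9 cut-off score | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|
| Patients screening positive, N (%) | 99 (50.3%) | 95 (48.2%) | 91 (46.2%) | 71 (36.0%) | 60 (30.5%) |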
The requirements for a screener can vary, but for most purposes the lower limit for the sensitivity of a screener is around 75%. Table 3 shows that a cut-off point of 12 combines a sensitivity above 75% with the highest specificity among such cut-offs (80.0%). Lower cut-off points increase sensitivity, to 91.9% for summed scores of 8, 9 or 10 on the PHQ-9, but at the expense of screening efficiency: specificity is lower, ranging from 59.4% for a cut-off score of 8 to 74.4% for a cut-off score of 11. Predictive values for positive test results ranged from 34.4% for a cut-off score of 8 to 46.7% for a cut-off score of 12. Predictive values for negative test results increased from 96.9% for a cut-off score of 8 to 97.2% for a cut-off score of 10, and then decreased to 93.4% for a cut-off score of 12.
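The figures reported for the cut-off of 12 can be cross-checked by back-calculating the underlying 2×2 table. This Python sketch assumes that 18.8% of the 197 patients (37) had MDD on the MINI, as stated in the discussion, and that the group of 60 patients (30.5%) screening positive corresponds to the cut-off of 12. The resulting cell counts (28 true positives, 32 false positives, 9 false negatives, 128 true negatives) are derived here, not reported in the article; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    tp: int  # screen-positive, MDD on the MINI
    fp: int  # screen-positive, no MDD
    fn: int  # screen-negative, MDD on the MINI
    tn: int  # screen-negative, no MDD

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


def back_calculate(n_total: int, prevalence: float, n_screen_positive: int,
                   sensitivity: float) -> TwoByTwo:
    """Reconstruct whole-number cell counts from reported summary figures."""
    n_cases = round(prevalence * n_total)   # 0.188 * 197 -> 37
    tp = round(sensitivity * n_cases)       # 0.757 * 37 -> 28
    fp = n_screen_positive - tp
    fn = n_cases - tp
    tn = n_total - n_cases - fp
    return TwoByTwo(tp=tp, fp=fp, fn=fn, tn=tn)


# Assumed inputs: N = 197, prevalence 18.8%, 60 patients scoring 12 or more.
table = back_calculate(197, 0.188, 60, 0.757)
print(table)
for name in ("sensitivity", "specificity", "ppv", "npv"):
    print(f"{name}: {100 * getattr(table, name):.1f}%")
```

Under these assumptions all four reported percentages for the cut-off of 12 (75.7%, 80.0%, 46.7% and 93.4%) are reproduced.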
Sensitivity, specificity, predictive values and efficiency outcomes for the algorithm score

| Algorithm score | All patients (N = 197) | PHQ > 10 (N = 91) |
|---|---|---|
| Positive | 42 (21.3%) | 42 (46.2%) |
| Negative | 155 (78.7%) | 49 (53.9%) |
Area under the ROC curve (AUC) for the PHQ-9 summed score versus the MINI, with asymptotic 95% confidence interval
Findings of the study
In this study, the PHQ-9 was validated for the first time as a screening instrument for MDD in diabetes patients visiting a specialized outpatient clinic. As such, the study provides important information about the validity of the PHQ-9 and about appropriate cut-off scores for identifying diabetes patients with a high probability of having MDD.
The main finding of this study is that the PHQ-9 appears to have satisfactory criterion validity as a screening instrument for MDD in diabetes patients in specialized outpatient clinics. We recommend using a cut-off score of 12 to recognize depression in diabetes patients from specialized outpatient clinics. This cut-off score is higher than the one generally used to identify MDD in primary care patients without advanced medical co-morbidity.
In general, predictive values depend not only on the quality of the instrument, but also on the prevalence of the disorder in the study population and on factors that may blur the diagnosis and cause misclassification of patients at risk. In the present sample, the a priori likelihood of MDD was relatively high (18.8%), while the patients' often complex medical condition may blur the contrast between those with and without depression. Consequently, one would expect screening instruments for depression to perform less efficiently, and higher cut-off points to be needed to distinguish efficiently between patients with and without depression. This proved to be the case.
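The dependence of predictive values on prevalence follows from Bayes' theorem. The Python sketch below holds sensitivity and specificity fixed at the values reported for the cut-off of 12 (75.7% and 80.0%) and varies the prior probability of MDD. Only 18.8% is this study's prevalence; the other values are arbitrary illustrations. The calculation ignores the blurring effect of medical co-morbidity discussed above, which would alter sensitivity and specificity themselves.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value via Bayes' theorem."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


SENS, SPEC = 0.757, 0.800  # reported for a PHQ-9 cut-off of 12

# 0.188 is this study's MDD prevalence; the other values are arbitrary illustrations.
for prevalence in (0.05, 0.10, 0.188, 0.30):
    print(f"prevalence {prevalence:.1%}: PPV {ppv(SENS, SPEC, prevalence):.1%}, "
          f"NPV {npv(SENS, SPEC, prevalence):.1%}")
```

At the study's prevalence of 18.8% this reproduces the reported predictive values (46.7% and 93.4%); at lower prevalences the positive predictive value drops and the negative predictive value rises.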
The algorithm score, although it literally follows the DSM-IV criteria, showed only moderate sensitivity and specificity (63.8% and 63.6%, respectively). This is unexpected, because the MINI interview, used in this study to diagnose MDD, also follows the DSM-IV criteria.
Limitations of the study
First, it was not possible to blind the interviewers to the PHQ-9 scores. Although the interviewers were not aware of the purpose of the interview, knowing the scores might have influenced the outcomes and might also have inflated both sensitivity and specificity. Second, the response rate was not high, because only patients who returned the PHQ-9 within two weeks after receiving it were included. Unfortunately, no epidemiological data on the non-responders could be obtained, because this was confidential information that the hospitals were not allowed to provide. Therefore, the findings of this study cannot be extrapolated to the general population, as the characteristics of the non-responders are unknown. However, this study was not intended to inform the use of the screener in the broad population, but to assess the validity of the PHQ-9 as a screener in patients with diabetes visiting specialized diabetes clinics. For this purpose, the findings are very relevant.
The telephone response rate of almost 50% might also seem low, but it is an average response rate for epidemiological research in the Netherlands.
Third, there could be a lag of up to two weeks between administration of the PHQ-9 and the MINI. During this time, high PHQ-9 scores might have regressed to the mean, so that higher cut-off points might have been needed to correspond to diagnoses of depressive disorder than if the PHQ-9 and MINI had been administered closer together in time.
Our findings are partly consistent with the results of other studies. In stroke patients, a cut-off point of ≥10 was found to be optimal, with high sensitivity and specificity (91% and 89%, respectively, for MDD). A study of depression in patients with traumatic brain injury also reported an optimal cut-off point of ≥10, with a sensitivity of 93% and a specificity of 89%. In several studies performed in the medically ill, an optimal cut-off score of 12 was recommended [12, 13]. One reason why a higher cut-off point was optimal for the patients with diabetes in our study may be that the symptoms of MDD and diabetes can overlap.
On closer inspection of our results, a cut-off point of ≥10 or ≥12 can be used, depending on the purpose of the screening. If the main purpose is to screen for patients with depression in a clinical setting, a cut-off point of ≥12 is recommended because of its higher specificity; the probability that a patient is falsely screened as depressed is then acceptably low. A cut-off point of ≥10, on the other hand, is better suited to epidemiological research, because it yields a larger group of participants with possible depressive disorders, probably ranging in severity. Overall, we recommend the PHQ-9 summed score for screening for depression in diabetes patients in specialized outpatient clinics. It is a valid and well-received questionnaire, and earlier recognition of depression may ultimately help improve patients' quality of life.
We would like to thank the participating hospitals of the ZGT Almelo and Hengelo for providing us with information. The study was financed by the Health Innovation Fund (Zorginnovatiefonds) in the Netherlands; the funder had no influence on the content of this article.
- Anderson RJ, Freedland KE, Clouse RE, Lustman PJ: The prevalence of comorbid depression in adults with diabetes: a meta-analysis. Diabetes Care. 2001, 24 (6): 1069-1078. 10.2337/diacare.24.6.1069.
- Lloyd CE, Dyer PH, Barnett AH: Prevalence of symptoms of depression and anxiety in a diabetes clinic population. Diabet Med. 2000, 17 (3): 198-202. 10.1046/j.1464-5491.2000.00260.x.
- Lustman PJ, Clouse RE: Depression in diabetic patients: the relationship between mood and glycemic control. J Diabetes Complications. 2005, 19 (2): 113-122.
- Kroenke K, Spitzer RL, Williams JB: The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med. 2001, 16 (9): 606-613. 10.1046/j.1525-1497.2001.016009606.x.
- Gilbody S, Richards D, Brealey S, Hewitt C: Screening for depression in medical settings with the Patient Health Questionnaire (PHQ): a diagnostic meta-analysis. J Gen Intern Med. 2007, 22 (11): 1596-1602. 10.1007/s11606-007-0333-y.
- Stafford L, Berk M, Jackson HJ: Validity of the Hospital Anxiety and Depression Scale and Patient Health Questionnaire-9 to screen for depression in patients with coronary artery disease. Gen Hosp Psychiatry. 2007, 29 (5): 417-424. 10.1016/j.genhosppsych.2007.06.005.
- Katon W, Von Korff M, Ciechanowski P, Russo J, Lin E, Simon G, et al: Behavioral and clinical factors associated with depression among individuals with diabetes. Diabetes Care. 2004, 27 (4): 914-920. 10.2337/diacare.27.4.914.
- Sheehan DV, Lecrubier Y, Sheehan KH, Amorim P, Janavs J, Weiller E, et al: The Mini-International Neuropsychiatric Interview (M.I.N.I.): the development and validation of a structured diagnostic psychiatric interview for DSM-IV and ICD-10. J Clin Psychiatry. 1998, 59 (Suppl 20): 22-33.
- Zweig MH, Campbell G: Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clin Chem. 1993, 39: 561-577.
- Williams LS, Brizendine EJ, Plue L, Bakas T, Tu W, Hendrie H, et al: Performance of the PHQ-9 as a screening tool for depression after stroke. Stroke. 2005, 36 (3): 635-638. 10.1161/01.STR.0000155688.18207.33.
- Fann JR, Bombardier CH, Dikmen S, Esselman P, Warms CA, Pelzer E, et al: Validity of the Patient Health Questionnaire-9 in assessing depression following traumatic brain injury. J Head Trauma Rehabil. 2005, 20 (6): 501-511. 10.1097/00001199-200511000-00003.
- Löwe B, Unützer J, Callahan CM, Perkins AJ, Kroenke K: Monitoring depression treatment outcomes with the patient health questionnaire-9. Med Care. 2004, 42 (12): 1194-1201. 10.1097/00005650-200412000-00006.
- Kendrick T, Dowrick C, McBride A, Howe A, Clarke P, Maisey S, et al: Management of depression in UK general practice in relation to scores on depression severity questionnaires: analysis of medical record data. BMJ. 2009, 338: b750. 10.1136/bmj.b750.
- The pre-publication history for this paper can be accessed here:http://0-www.biomedcentral.com.brum.beds.ac.uk/1472-6963/10/235/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.