


Accuracy of responses from postal surveys about continuing medical education and information behavior: experiences from a survey among German diabetologists



Postal surveys are a popular instrument for studies about continuing medical education habits. But little is known about the accuracy of responses in such surveys. The objective of this study was to quantify the magnitude of inaccurate responses in a postal survey among physicians.


A sub-analysis of a questionnaire about continuing medical education habits and information management was performed. The five variables used for the quantitative analysis are based on a question about the knowledge of a fictitious technical term and on inconsistencies in contingency tables of answers to logically connected questions.


Response rate was 52%. Non-response bias is possible but seems not very likely since an association between demographic variables and inconsistent responses could not be found. About 10% of responses were inaccurate according to the definition.


It was shown that a sub-analysis of a questionnaire makes a quantification of inaccurate responses in postal surveys possible. This sub-analysis revealed that a notable portion of responses in a postal survey about continuing medical education habits and information management was inaccurate.

Peer Review reports


Postal questionnaire surveys of physicians are a popular instrument to gather information [1, 2]. They are often used for studies about continuing medical education (CME) habits and information management since they are relatively inexpensive and easy to handle [2]. A major problem of such surveys is the low response rate [1]. Besides this non-response bias [3, 4] there are a number of other factors which restrict conclusions from postal surveys. Alreck and Settle, for example, describe ten such sources of response bias [5], but it seems that there are even more. The most important are: the tendency to socially desired responses (especially in surveys on sensitive subjects like drug abuse or sexual habits) [6], acquiescence or the tendency to give only yes- or no-responses [7, 8], and failures in self-perception or (technically) inaccurate statements (e.g. because of willful lies or inaccurate memories) [9–11]. Most of the studies about the problem of potential biases are restricted to questionnaires for patients. Accordingly, a MEDLINE search revealed only articles dealing with the accuracy of statements by physicians in general; there were no satisfactory results for articles about potential inaccuracies in postal surveys about CME habits or information management (search terms – MeSH: "Physicians", "Reproducibility of Results", "Questionnaire", "Bias", "Quality Control"; title words: "Questionnaire", "Postal Survey*", "Validity", "Bias", "Inaccura*", "Accura*") [12–17]. This is also reflected by the fact that authors of postal surveys in this field often do not discuss the problem of inaccuracies [e.g. [18, 19]], or, if it is discussed, it is not quantified [e.g. [20, 21]].

For this reason a sub-analysis of a questionnaire survey about CME and information habits of German diabetologists was performed. Its primary aim was to determine how accurate the information in this study was and whether the responses were credible. Furthermore, I tried to evaluate whether these inaccuracies could be attributed to the socially desired response bias.

The following report focuses on the sub-analysis and not on other results from the survey which are (partly) reported elsewhere [22].




The data used for this sub-analysis was collected by an explorative survey about information management and CME habits (for details see [22]). For this survey a new questionnaire had to be developed. Initially a preliminary questionnaire was drafted considering three already published surveys [19, 20, 23]. It was discussed with members of the research group and sent to experts requesting comments (practicing diabetologists, experts in evidence-based medicine, technology assessment, survey methodology, and continuing medical education). After incorporating these comments the questionnaire consisted of 92 items. It can be divided into the following sections [22]: CME in general, therapeutic decision making and problem-solving behavior, use of databases, reading habits, knowledge of technical terms and critical appraisal, and personal data. Three of the 92 items were open-ended.

Sample and mailing

The sample comprised 461 diabetologists in the northern part of Germany. It was selected from a database of German diabetologists (Diabetologe DDG) and represented 29% of all 1585 diabetologists in the database. The sample size was calculated with regard to confidence intervals for estimated population frequencies (95% CI): a maximal margin of error for proportions of ± 6.25% for dichotomously answerable questions was considered narrow enough (i.e. the maximum width of the 95% CI for proportions should be 12.5% for questions with only two response categories, e.g. yes/no). Given the population of 1585 diabetologists this required a sample size of 213. Response rates of prior surveys ranged from 50% to 70%, which led to a sample size of at least 416 persons. For technical reasons it was not possible to draw a random sample; the sample was therefore determined by the first figure of the zip code (codes 1–3).
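The sample-size arithmetic above can be sketched as follows. This is an illustrative reconstruction, not the exact formula the survey used: it assumes the standard normal-approximation formula for a proportion (worst case p = 0.5) with a finite population correction, which reproduces the reported figure of 213.

```python
import math

def sample_size(population, margin, z=1.96, p=0.5):
    """Sample size needed to estimate a proportion p within a given
    margin of error at the z-level, with finite population correction."""
    n0 = z ** 2 * p * (1 - p) / margin ** 2        # infinite-population size
    return math.ceil(n0 / (1 + (n0 - 1) / population))

def mailings_needed(n, expected_response_rate):
    """Questionnaires to send out to expect n completed returns."""
    return math.ceil(n / expected_response_rate)

# A margin of +/- 6.25% in a population of 1585 gives the reported 213:
# sample_size(1585, 0.0625) -> 213
```

At an expected response rate of 50%, roughly 426 questionnaires would have to be mailed, in the same range as the 461 actually sent.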

In October 2000 the questionnaire was distributed for the first time. One week later a reminder postcard was sent to all participants, and after three weeks a new questionnaire was sent to all non-respondents. A cover letter as well as a metered self-addressed envelope were enclosed. Coding by numbers for response control was explicitly mentioned, but the analysis was fully anonymous.


Variables used

The analysis is mainly based on contingency tables of answers to logically connected questions. Inconsistent responses were denoted "positive". The following variables were used (see the original questions in additional file 1):

  1. Did respondents, who stated that systematic reviews/meta-analyses had a strong influence on their therapeutic decision making, report that they knew these two terms?

  2. Did respondents, who stated that published clinical trials and systematic reviews/meta-analyses had a strong influence on their therapeutic decision making, report that they read these kinds of articles?

  3. A question about the knowledge of technical terms was asked (as suggested by McColl and colleagues [19]). A contingency table was created with the answers for the terms absolute risk reduction (ARR) and number needed to treat (NNT). Respondents who stated that they could explain the number needed to treat but could not explain the absolute risk reduction were labeled positive (the number needed to treat is the reciprocal of the absolute risk reduction).

  4. There was a question on the knowledge of a fictitious technical term (the McNemar-Quality-Scale; an explanation of terms was not required). Respondents who stated that they knew this scale were labeled positive.

  5. Did respondents, who stated that they appraised the scientific value of an article by evaluating its methods section (as suggested by Williamson and colleagues [20]), report that they read this section of an article?
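Variable 3 rests on the arithmetic relationship between the two terms: the NNT is simply the reciprocal of the ARR, so claiming to be able to explain one but not the other is inconsistent. A minimal illustration:

```python
def absolute_risk_reduction(control_event_rate, treated_event_rate):
    """ARR: difference in event rates between control and treatment groups."""
    return control_event_rate - treated_event_rate

def number_needed_to_treat(control_event_rate, treated_event_rate):
    """NNT: reciprocal of the absolute risk reduction."""
    return 1.0 / absolute_risk_reduction(control_event_rate, treated_event_rate)

# If an intervention lowers the event rate from 20% to 15%,
# ARR = 0.05 and NNT = 20 (treat 20 patients to prevent one event).
```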

Test for socially desired response bias

The assumption was that the tendency to socially desired responses would be the most dominant response bias in this survey. It was also presumed that this would be most prevalent in the question about technical terms. Two tests were used to support these assumptions:

  1. A knowledge-score was calculated for each respondent using the responses to the question about technical terms (all items/technical terms were included except the McNemar-Quality-Scale; every cross in the category "I understand this term and could explain it to others" was valued with one point, every cross in the category "I have some understanding" with half a point; the sum was rounded, so the maximum score was twelve points). This knowledge-score was cross-tabulated with the positive answers of variable 4 (knowledge of the McNemar-Quality-Scale).

  2. A contingency table was created with the answers for the fictitious McNemar-Quality-Scale and the least-known technical term. This term was the Alpha-error/Type-I-error: only 50% (117/233) of all respondents knew it.
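The knowledge-score described above can be sketched as follows. The response labels are paraphrased from the questionnaire, and the rounding rule is an assumption (Python's `round()` rounds halves to the nearest even integer, which may differ from what was actually used):

```python
def knowledge_score(responses):
    """Sum 1 point per 'could explain' answer and 0.5 per 'some
    understanding' answer over the 12 real terms (the fictitious
    McNemar-Quality-Scale is excluded beforehand), then round."""
    points = {"could_explain": 1.0, "some_understanding": 0.5}
    raw = sum(points.get(answer, 0.0) for answer in responses)
    return round(raw)

# e.g. knowledge_score(["could_explain", "some_understanding",
#                       "some_understanding", "no_understanding"]) -> 2
```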

Statistical Analysis

Mainly descriptive statistics were used. The chi2-test was used for comparisons of categorical data (Yates continuity corrected for comparisons with 1 degree of freedom); Fisher's exact test was used if expected cell values were less than 5. The median test was used for the comparison of knowledge-scores because the distributions were neither normal nor comparable [24]. Two-sided p-values < 0.05 were considered significant. Analyses were performed with EpiInfo 2000, version 1.0.4 and KyPlot, version 2.0.
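As a sketch of the main test used, the Yates-corrected chi2 statistic for a 2x2 table can be computed directly. Applied to the undeliverable-questionnaire comparison reported in the Results (33/45 vs. 199/416 hospital-based physicians), it reproduces the reported value of about 9.56 (this is a textbook reimplementation, not the EpiInfo code actually used):

```python
def yates_chi2(a, b, c, d):
    """Chi-squared with Yates continuity correction for the
    2x2 contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = (a, b, c, d)
    expected = (rows[0] * cols[0] / n, rows[0] * cols[1] / n,
                rows[1] * cols[0] / n, rows[1] * cols[1] / n)
    return sum((abs(o - e) - 0.5) ** 2 / e
               for o, e in zip(observed, expected))

# yates_chi2(33, 12, 199, 217) -> approximately 9.56
```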


Of the 461 questionnaires distributed, 45 (10%) were returned because they were undeliverable. In this group the proportion of hospital-based physicians was significantly higher (33/45: 73% vs. 199/416: 48%; chi2 = 9.564; p = 0.002) whereas the proportion of practicing physicians was significantly lower (9/45: 20% vs. 187/416: 45%; chi2 = 9.349; p = 0.002) than in the remaining sample.

239 (52%) questionnaires were eligible for analysis (if undeliverable questionnaires are disregarded response rate is 57%). Table 1 compares the characteristics of the respondents, all German diabetologists (Diabetologen DDG), and the whole sample. Table 2 compares the respondents and non-respondents.

Table 1 Characteristics of respondents, sample, and all german diabetologists
Table 2 Characteristics of respondents and non-respondents

Understandability of the questionnaire was good: 56/235 (24%) of respondents stated the questions were easy to understand, 160/235 (68%) found them rather easy to understand, and 19/235 (8%) found them rather difficult. Nobody found the questions difficult to understand.

1. Knowledge of influential factors

15% (35/232) and 23% (53/230) of respondents who stated that meta-analyses and systematic reviews, respectively, had a strong or very strong influence on their therapeutic decision making had no or only a rough understanding of the meaning of these types of articles.

2. Reading and influence of different article types

The rates of respondents who stated that the different article types had a strong or very strong influence on their therapeutic decision making but did not read these articles were very low: 3% (7/235) for clinical trials, <1% (1/235) for systematic reviews/meta-analyses, and no discrepancy for narrative reviews.

3./4. Knowledge of the NNT and a fictitious term

16% (38/234) could explain the number needed to treat but could not explain the absolute risk reduction (Table 3). Overall, 7% (17/234) of the respondents allegedly had at least some understanding of the McNemar-Quality-Scale, of whom one stated that he/she could explain this scale to others (in addition, 24 respondents (10%) reported that they knew of the scale but did not understand it).

Table 3 Knowledge of the number needed to treat and absolute risk reduction*

5. Examining and reading the methods section of articles

Table 4 shows whether respondents who reported that they evaluated article quality by examining the methods section actually read this part of an article. Interpreted strictly, 13% (22/172) of responses were contradictory. When responses were categorized into two groups (always/often and seldom/never), 8% (14/172) of contradictory responses remained.
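The strict versus two-group reading of Table 4 can be illustrated with a small helper. The data below are invented, and the exact definition of a "strict" contradiction is an assumption (claiming to examine the methods section more often than one reads it):

```python
# Frequency categories ordered from most to least frequent
ORDER = {"always": 3, "often": 2, "seldom": 1, "never": 0}

def count_contradictions(pairs):
    """pairs: (examines_methods, reads_methods) frequency answers.
    Returns (strict, collapsed) counts of contradictory responses:
    strict    - claims to examine the section more often than reading it;
    collapsed - examines always/often but reads only seldom/never."""
    strict = sum(1 for ex, rd in pairs if ORDER[ex] > ORDER[rd])
    collapsed = sum(1 for ex, rd in pairs
                    if ORDER[ex] >= ORDER["often"] and ORDER[rd] <= ORDER["seldom"])
    return strict, collapsed

# Invented example: two strict contradictions, one surviving the collapse
# count_contradictions([("always", "always"), ("often", "seldom"),
#                       ("always", "often")]) -> (2, 1)
```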

Table 4 Examining and reading the methods section of articles*

Test for socially desired response bias

The median knowledge-score in the group positive for variable 4 (knowledge of the McNemar-Quality-Scale) was 10 (IQR: 8–11.5; range: 6–12; modes: 10 and 12). In comparison, the median knowledge-score of the other respondents was 6 (IQR: 4–8; range: 0–12; mode: 6). This difference was significant (Fisher's exact test: p = 0.001). A comparison of the responses for the terms McNemar-Quality-Scale and Alpha-error/Type-I-error revealed that only 2 (12%) of the 17 positive respondents allegedly knew both terms. The difference from the respondents negative for the McNemar-Quality-Scale (2/17 vs. 115/216) was significant (chi2 = 9.249; p = 0.002).


Methodological issues

Because the selection of the sample was not randomized, systematic biases are possible. Demographic characteristics of all German diabetologists are only partly available, so an assessment of the representativeness of the sample is limited. The different proportions of general practitioners and pediatricians can be considered a bias, but whether this is relevant for this analysis remains questionable (an association between positive responses and specialty could not be found; checked for variables 3, 4, and 5; data not shown). The response rate lies below the average of other surveys [25, 26], but no major differences in the four available demographic characteristics could be detected between the respondents and the sample (Table 1). The relatively higher rate of undeliverable questionnaires among hospital-based physicians is probably negligible since the number of persons involved is small. Non-response bias may be another problem, but its relevance also seems questionable because an association between the proportions of positive answers and sex, work place, or location of work place could not be found (checked for variables 3, 4, and 5; data not shown; see also [22]). Nevertheless, caution should be applied when generalizing the results of this survey, and rates or numbers should be interpreted as a trend rather than taken at face value.

Another limitation lies in the methodology of this analysis. Since the actual behavior of physicians was not observed (e.g. how they read journal articles), inaccuracies could only be determined indirectly. Though such an observational study would be preferable, it is not feasible for practical reasons. Furthermore, this analysis allows no extensive conclusions about the nature of the inaccuracies [27]. Although an attempt was made to evaluate the tendency for socially desired responses, it is not possible to conclude definitely which biases may have contributed to the inaccurate responses. Qualitative methods would be needed for this kind of study.

Interpretation of findings

1. Knowledge of influential factors

The rate of physicians who ascribed a high impact on their therapeutic decision making to factors they did not know well was very high, with values of 15% and 23% respectively. These rates decreased to 2% and 4% respectively if one concedes that factors which are only roughly known can also have a strong influence.

2. Reading and influence of different article types

The rate of respondents who stated that published clinical trials had a strong influence on their therapeutic decision making but who read such articles only infrequently was very low. It should, however, be taken into account that virtually all surveyed physicians read this kind of article always or often if it appears in journals they subscribe to (207/237; 87%).

3./4. Knowledge of the NNT and a fictitious term

The rate of respondents who allegedly could explain the number needed to treat but could not explain the absolute risk reduction was very high. As McColl and colleagues did not perform an analysis like this, a comparison between the two studies is limited; such an analysis would require the raw data. But the data in their publication indicate that there were also inconsistencies: in their survey 35% of respondents could explain the term number needed to treat but only 31% could explain the term absolute risk [19]. The alleged knowledge of the McNemar-Quality-Scale was lower than the knowledge of the NNT, but the value was also around 10%. One might argue that positive respondents confused the fictitious term with McNemar's statistical test, or assumed that the researchers had confused the terms. But this seems unlikely since nobody raised this potential problem during the development of the questionnaire. Moreover, somebody who knows a statistical test would know the term Alpha-error/Type-I-error, which was not the case for the majority of the positive respondents.

The proportion of inaccurate responses to this knowledge-question should be viewed as a very conservative estimate. A recently published study found that virtually nobody who claimed to understand the technical terms of the questionnaire developed by McColl et al. actually did so [28].

5. Examining and reading the methods section of articles

As for the other variables, the proportion of positive answers was about 10%.

Test for socially desired response bias

Given the other results of this analysis and the kind of response categories in this survey, it seemed reasonable to assume that the tendency for socially desired responses would be the most prominent response bias. To attribute the alleged knowledge of the McNemar-Quality-Scale to this tendency, it must be interpreted in association with the knowledge-score and the responses to the least-known term. If the knowledge-scores were low among those respondents who allegedly knew the McNemar-Quality-Scale, this response behavior could not be interpreted as socially desired and other explanations would have to be considered. The analysis showed, however, that their knowledge-scores were well above those of the other respondents, which could lead to the conclusion that these 17 respondents (7%) had a tendency for socially desired responses. On the other hand, their knowledge of the Alpha-error/Type-I-error indicates that these physicians were clearly willing to admit knowledge gaps, because they reported a lack of knowledge or understanding more frequently than the others. Therefore it seems unlikely that the inaccurate answers can be attributed to the socially desired response bias.

Acquiescence [29], another potentially important response bias, is also unlikely given the other findings of this analysis and the response categories in the questionnaire. The tendency to give only yes- or no-responses can be ruled out as only 7 questions with a yes/no response category were asked. Thus it is believed that the inaccuracies in this survey are rather a problem of careless reading/answering (which in turn might have resulted from the long questionnaire or busy respondents, although an association between weekly hours of work and positive responses could not be found; checked for variable 4; data not shown) or a failure in self-perception/overestimation of competency. Furthermore, misunderstandings of questions or specific terms might also have contributed to the inaccuracies, as was shown in a recent study [28].


As a result of this analysis, the proportion of inaccurate or illogical responses in a survey about CME habits and information management of physicians was around ten percent. Although some researchers try to correct such inaccuracies [11], it remains to be determined how accurate such correction methods are.

It seems unlikely that respondents had a significant tendency for socially desired responses. The analysis indicates that it is rather a problem of careless reading/answering of questions, a failure in self-perception, or a misunderstanding of specific terms or questions. However, in order to understand response biases and the processes involved, qualitative studies are needed.

The method described is considered appropriate and feasible for evaluating the accuracy of responses in surveys but further research is necessary to validate it. It should be applied to future questionnaire surveys about CME habits and information management of physicians to enable appropriate assessments of such studies.


  1. McAvoy BR, Kaner EFS: General practice postal surveys: a questionnaire too far?. BMJ. 1996, 313: 732-733.

  2. Detlefson EG: The information behaviors of life and health scientists and care providers: characteristics of the research literature. Bull Med Libr Assoc. 1998, 86: 385-390.

  3. Kaner EFS, Haighton CA, McAvoy BR: 'So much post, so busy with practice – so, no time!': a telephone survey of general practitioners' reasons for not participating in postal questionnaire surveys. Br J Gen Pract. 1998, 48: 1067-1069.

  4. Armstrong D, Ashworth M: When questionnaire response rates do matter: a survey of general practitioners and their views of NHS changes. Br J Gen Pract. 2000, 50: 479-480.

  5. Alreck PL, Settle RB: The survey research handbook. 2nd edition. Boston (Mass): Irwin/McGraw-Hill; 1995.

  6. Embree BG, Whitehead PC: Validity and reliability of self-reported drinking behavior: dealing with the problem of response bias. J Stud Alcohol. 1993, 54: 334-344.

  7. Knowles ES, Nathan KT: Acquiescent responding in self-reports: cognitive style or social concern?. Journal of Research in Personality. 1997, 31: 293-301. 10.1006/jrpe.1997.2180.

  8. Guyatt GH, Cook DJ, King D, Norman GR, Kane SL, VanIneveld C: Effect of the framing of questionnaire items regarding satisfaction with training on residents' responses. Acad Med. 1999, 74: 192-194.

  9. Wulff HR, Andersen B, Brandehoff P, Guttler F: What do doctors know about statistics?. Stat Med. 1987, 6: 3-10.

  10. Covell DG, Uman GC, Manning PR: Information needs in office practice: are they being met?. Ann Intern Med. 1985, 103: 596-599.

  11. Schmidt HG, VanDerMolen HT: Self-reported competency ratings of graduates of a problem-based curriculum. Acad Med. 2001, 76: 466-468.

  12. Avorn J, Chen M, Hartley R: Scientific versus commercial sources of influence on the prescribing behaviour of physicians. Am J Med. 1982, 73: 4-8.

  13. Curry L, Purkis IE: Validity of self-reports of behavior changes by participants after a CME course. J Med Educ. 1986, 61: 579-584.

  14. Rosser WW, Palmer WH, on behalf of the Ontario Task Force On The Use And Provision Of Medical Services: Dissemination of guidelines on cholesterol. Can Fam Physician. 1993, 39: 280-284.

  15. Saver BG, Taylor TR, Treadwell JR, Cole WG: Do physicians do as they say? The case of mammography. Arch Fam Med. 1997, 6: 543-546. 10.1001/archfami.6.6.543.

  16. Wennberg DE, Dickens JD, Biener L, Fowler FJ, Soule DN, Keller RB: Do physicians do what they say? The inclination to test and its association with coronary angiography rates. J Gen Intern Med. 1997, 12: 172-176. 10.1046/j.1525-1497.1997.012003172.x.

  17. Adams AS, Soumerai SB, Lomas J, Ross-Degnan D: Evidence of self-report bias in assessing adherence to guidelines. Int J Qual Health Care. 1999, 11: 187-192. 10.1093/intqhc/11.3.187.

  18. Fletcher P: Continuing medical education in a district general hospital: a snapshot. Med Educ. 2001, 35: 967-972. 10.1046/j.1365-2923.2001.01025.x.

  19. McColl A, Smith H, White P, Field J: General practitioners' perceptions of the route to evidence based medicine: a questionnaire survey. BMJ. 1998, 316: 361-365.

  20. Williamson JW, Pearl SG, Weiss R, Skinner EA, Bowes F: Health science information management and continuing education of physicians: a survey of US primary care practitioners and their opinion leaders. Ann Intern Med. 1989, 110: 151-160.

  21. Saint S, Christakis DA, Saha S, Elmore JG, Welsh DE, Baker P, Koepsell TD: Journal reading habits of internists. J Gen Intern Med. 2000, 15: 881-884. 10.1046/j.1525-1497.2000.00202.x.

  22. Trelle S: Information management and reading habits of German diabetologists: a questionnaire survey. Diabetologia. 2002, 45: 764-774. 10.1007/s00125-002-0807-8.

  23. Deibler G: Fortbildungsverhalten niedergelassener Internisten im Raum Suedwuerttemberg [dissertation]. Tuebingen: Eberhard-Karls Universitaet; 1995. [German]

  24. Bortz J, Lienert GA, Boehnke K: Verteilungsfreie Methoden in der Biostatistik. 2nd edition. Berlin: Springer; 2000. [German]

  25. Cummings SM, Savitz LA, Konrad TR: Reported response rates to mailed physician questionnaires. Health Serv Res. 2001, 35: 1347-1355.

  26. Asch DA, Jedrziewski MK, Christakis NA: Response rates to mail surveys published in medical journals. J Clin Epidemiol. 1997, 50: 1129-1136. 10.1016/S0895-4356(97)00126-1.

  27. Nunnally JC, Bernstein IH: Psychometric theory. 3rd edition. New York: McGraw-Hill; 1994.

  28. Young JM, Glasziou P, Ward JE: General practitioners' self ratings of skills in evidence based medicine: validation study. BMJ. 2002, 324: 950-951. 10.1136/bmj.324.7343.950.

  29. Jackson DN: Acquiescent response styles: problems of identification and control. In: Response set in personality assessment. Edited by: Berg IA. Chicago: Aldine; 1967, 71-114.



I am indebted to R. Kollek for the supervision of this work and the willingness to support my dissertation. I am also grateful to Ch. Trelle and E. Kaner for their critical comments on a previous version of this paper.

Author information



Corresponding author

Correspondence to Sven Trelle.

Additional information

Competing interests

None declared.

Authors' contributions

Sven Trelle conceived, designed, conducted, analyzed, and wrote the study.

Electronic supplementary material


About this article

Cite this article

Trelle, S. Accuracy of responses from postal surveys about continuing medical education and information behavior: experiences from a survey among German diabetologists. BMC Health Serv Res 2, 15 (2002).



  • Response Bias
  • Postal Survey
  • Technical Term
  • Absolute Risk Reduction
  • Article Type