Abstract:
Data available in the public domain are frequently aggregated to preserve confidentiality and to reduce a database to a manageable size. Drawing conclusions from such data may lead to inappropriate policy advice. This paper aims to show how aggregating data to form rates may obscure important information and lead to misinterpretation of results, and to suggest ways in which this problem may be addressed. We also highlight the need to seek additional information in order to clarify findings. We used a case study approach, drawing on illustrative examples to highlight some of the problems encountered when using aggregated population data. The focus is on health policy. The cases chosen illustrate two types of problem, but a common resolution was appropriate to both. In the first case, policies based on the assumption that hospital admissions equate with disease incidence would differ from policies framed on actual incidence data. In the second, incidence rates changed when they were disaggregated into gender-specific and age-specific rates. Policies formulated from analysis of aggregated data would be different from those based on disaggregated data. In the cases studied, gender, age and ethnicity influence incidence rates and must not be ignored. Researchers are advised to study the data set in the most disaggregated form available, and to check how data have been defined, collected and recorded, before preparing summary tables and graphs. Additional research or data from another source may be needed to clarify findings.