Table 1.

Canadian Primary Care Sentinel Surveillance Network Data Quality Issues Identified During Phase 1 and Our Initial Remediation Strategy

| Data Quality Issue | Description | Remediation Strategy |
| --- | --- | --- |
| “Dirty data” | Misspelled words, extra words in field, inconsistent strings (“ex smoker,” “ex-smoker”), multiple diagnoses in a single field | Cleaned by data managers using synonym dictionaries and cleaning algorithms |
| Identifiable Data | Names, phone numbers, and other identifying information in diagnosis or reason for visit fields | Clean using a de-identification engine |
| Missing data | Dosages, dates of onset, occupation, ethnicity | Ask and train physicians and/or staff to enter appropriate data |
| Inconsistent data | Diagnoses stored in several different places (notes, PMH, problem list); inconsistent risk factors coexisting (smoker, ex-smoker) | Use physician as “gold standard” for confirming diagnoses; use dates to determine latest status of risk factor |
| Lacking Metadata | Referral to “Dr. Jones,” but Dr. Jones’ speciality is not listed | Work with EMR vendors to include specialty in address database; encourage staff to enter specialty into address databases |
| Inappropriate Metadata | Diagnosis not in problem list, medication in encounter notes | Ask physicians to enter “gold standard” diagnosis into Problem List for all patients with an index disease |
| Insufficient Metadata | In 2 EMRs, Problem List, Risk Factors and Procedures appear in the same table with no metadata to distinguish the 3 types of data | Work with EMR vendors to separate the 3 different types of data |
| Lacking standardization | Multiple, changing, inconsistent names or results for lab tests, eg, HbA1C, glycosylated hemoglobin, hemoglobin A1C; 7% vs 0.07 for test results | Work with National standards bodies to encourage uptake of standards |
| Lacking data feeds | Lab results not coming in electronically | Encourage local labs to provide laboratory results electronically |
  • PMH, past medical history; EMR, electronic medical record.
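
Three of the remediation strategies in Table 1 (synonym dictionaries for “dirty data,” dates to settle coexisting risk factors, and harmonized lab test names and results) can be illustrated with a minimal Python sketch. This is not the network's actual cleaning code. The dictionary contents, the function names, the canonical label `HbA1c`, and the rule that an HbA1c value below 1 is a fraction rather than a percentage are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical synonym dictionary: free-text variants mapped to one canonical label.
SMOKING_SYNONYMS = {
    "ex smoker": "ex-smoker",
    "ex-smoker": "ex-smoker",
    "former smoker": "ex-smoker",
    "smoker": "smoker",
    "current smoker": "smoker",
}

# Hypothetical set of names that all refer to the same lab test.
HBA1C_NAMES = {"hba1c", "glycosylated hemoglobin", "hemoglobin a1c"}


def _clean(text):
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def normalize_smoking(text):
    """Return the canonical smoking status, or None for an unknown string."""
    return SMOKING_SYNONYMS.get(_clean(text))


def normalize_lab_name(name):
    """Map known HbA1c name variants to one label; leave other names unchanged."""
    return "HbA1c" if _clean(name) in HBA1C_NAMES else name


def hba1c_percent(value):
    """Express an HbA1c result as a percentage.

    Assumption: values below 1 were recorded as fractions (0.07),
    anything else is already a percentage (7).
    """
    if value < 1:
        return round(value * 100, 2)
    return float(value)


def latest_status(records):
    """Pick the most recently dated status from (date, status) pairs."""
    if not records:
        return None
    return max(records, key=lambda record: record[0])[1]


if __name__ == "__main__":
    print(normalize_smoking("Ex  Smoker"))
    print(normalize_lab_name("Glycosylated Hemoglobin"), hba1c_percent(0.07))
    history = [(date(2005, 1, 1), "smoker"), (date(2009, 6, 1), "ex-smoker")]
    print(latest_status(history))
```

Strings that the dictionary does not recognize come back as `None` rather than being guessed, so they can be routed to a data manager for review.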