Canadian Primary Care Sentinel Surveillance Network Data Quality Issues Identified During Phase 1 and Our Initial Remediation Strategy
| Data Quality Issue | Description | Remediation Strategy |
|---|---|---|
| “Dirty data” | Misspelled words, extra words in field, inconsistent strings (“ex smoker,” “ex-smoker”), multiple diagnoses in a single field | Cleaned by data managers using synonym dictionaries and cleaning algorithms |
| Identifiable data | Names, phone numbers, and other identifying information in diagnosis or reason-for-visit fields | Cleaned using a de-identification engine |
| Missing data | Dosages, dates of onset, occupation, ethnicity | Ask and train physicians and/or staff to enter appropriate data |
| Inconsistent data | Diagnoses stored in several different places (notes, PMH, problem list); inconsistent risk factors coexisting (smoker, ex-smoker) | Use physician as “gold standard” for confirming diagnoses; use dates to determine latest status of risk factor |
| Lacking metadata | Referral to “Dr. Jones,” but Dr. Jones’ specialty is not listed | Work with EMR vendors to include specialty in address database; encourage staff to enter specialty into address databases |
| Inappropriate metadata | Diagnosis not in problem list, medication in encounter notes | Ask physicians to enter “gold standard” diagnosis into Problem List for all patients with an index disease |
| Insufficient metadata | In 2 EMRs, Problem List, Risk Factors, and Procedures appear in the same table with no metadata to distinguish the 3 types of data | Work with EMR vendors to separate the 3 different types of data |
| Lacking standardization | Multiple, changing, inconsistent names or results for lab tests, eg, HbA1C, glycosylated hemoglobin, hemoglobin A1C; 7% vs 0.07 for test results | Work with national standards bodies to encourage uptake of standards |
| Lacking data feeds | Lab results not coming in electronically | Encourage local labs to provide laboratory results electronically |
PMH, past medical history; EMR, electronic medical record.
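
Three of the remediation strategies in the table (synonym dictionaries, reconciling 7% vs 0.07 lab values, and using dates to determine the latest risk-factor status) can be illustrated with a minimal Python sketch. This is not the network's actual cleaning code. The dictionary contents, the function names, and the rule that HbA1c values at or below 1 are fractions are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical synonym dictionaries: variant strings mapped to one canonical term.
SMOKING_SYNONYMS = {
    "ex smoker": "ex-smoker",
    "ex-smoker": "ex-smoker",
    "former smoker": "ex-smoker",
    "smoker": "smoker",
    "current smoker": "smoker",
}

LAB_NAME_SYNONYMS = {
    "hba1c": "HbA1c",
    "glycosylated hemoglobin": "HbA1c",
    "hemoglobin a1c": "HbA1c",
}


def normalize_term(raw, synonyms):
    """Return the canonical term for a free-text value, or None if unknown."""
    # Trim, lowercase, and collapse repeated whitespace before the lookup.
    key = " ".join(raw.strip().lower().split())
    return synonyms.get(key)


def hba1c_to_percent(value):
    """Express an HbA1c result in percent.

    Assumption: a value at or below 1 was recorded as a fraction (0.07)
    and is converted to percent (7.0); larger values are left unchanged.
    """
    return round(value * 100, 2) if value <= 1 else value


def latest_status(records):
    """Pick the most recently dated status from (date, raw_text) pairs."""
    normalized = [(d, normalize_term(text, SMOKING_SYNONYMS)) for d, text in records]
    known = [(d, status) for d, status in normalized if status is not None]
    if not known:
        return None
    return max(known, key=lambda pair: pair[0])[1]


history = [
    (date(2008, 1, 5), "smoker"),
    (date(2010, 3, 2), "ex smoker"),
]
current_status = latest_status(history)
```

Unrecognized strings return `None` rather than a guessed term, so they can be routed to a data manager for review instead of being silently recoded.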