Abstract
Purpose: Develop and validate simple risk scores based on initial clinical data and no or minimal laboratory testing to predict mortality in hospitalized adults with COVID-19.
Methods: We gathered clinical and initial laboratory variables on consecutive inpatients with COVID-19 who had either died or been discharged alive at 6 US health centers. Logistic regression was used to develop a predictive model using no laboratory values (COVID-NoLab) and one adding tests available in many outpatient settings (COVID-SimpleLab). The models were converted to point scores and their accuracy evaluated in an internal validation group.
Results: We identified 1340 adult inpatients with complete data for nonlaboratory parameters and 741 with complete data for white blood cell (WBC) count, differential, c-reactive protein (CRP), and serum creatinine. The COVID-NoLab risk score includes age, respiratory rate, and oxygen saturation and identified risk groups with 0.8%, 11.4%, and 40.4% mortality in the validation group (AUROCC = 0.803). The COVID-SimpleLab score includes age, respiratory rate, oxygen saturation, WBC, CRP, serum creatinine, and comorbid asthma and identified risk groups with 1.0%, 9.1%, and 29.3% mortality in the validation group (AUROCC = 0.833).
Conclusions: Because they use simple, readily available predictors, the risk scores developed here have potential applicability in the outpatient setting but require prospective validation before use.
Introduction
The coronavirus disease 2019 (COVID-19) pandemic, caused by the SARS-CoV-2 virus, has created an unprecedented health crisis. There have been more than 10 million confirmed cases and more than 500,000 deaths worldwide,1 with an estimated 10 undetected cases per confirmed case.2 The case fatality rate is estimated to be approximately 0.5% to 1.0%, approximately 5 to 10 times higher than seasonal influenza, with older patients having much higher case fatality rates.3 The spectrum of illness is broad, ranging from completely asymptomatic carriers to those with critical illness and death. This breadth of presentation makes optimal disposition difficult at the time of initial presentation, because the clinical presentation may not correlate with the patient’s actual risk of a bad outcome.
A major concern is that hospital beds and in particular intensive care unit (ICU) beds and mechanical ventilators may be overwhelmed when cases rise in an area. This makes it critical that physicians have the tools needed to identify patients both at lower and elevated mortality risk at the time of initial presentation. An accurate risk assessment tool using simple parameters available on presentation to the emergency department and other settings could aid clinicians in rapidly making optimal patient disposition decisions. For patients who are hospitalized, it could guide the intensity of monitoring and the initial admission location (hospital ward, telemetry, or ICU). If validated in the outpatient setting, it could also guide hospitalization decisions. Key risk factors for mortality have been identified and include increasing age, male sex, comorbidities, and certain laboratory parameters.3–5 Systematic review of laboratory parameters found that lymphopenia and elevated levels of c-reactive protein (CRP), neutrophil count, interleukin-6, d-dimer, lactate dehydrogenase, and troponin I were all associated with a poor outcome in hospitalized COVID-19 patients.6,7
Researchers have attempted to develop prediction models for poor prognosis in COVID-19 patients, combining demographic, comorbidity, physical examination, laboratory, and imaging predictors into multivariate models. In some cases, these have been simplified into clinical prediction rules (CPRs) or online calculators.8–11 However, many have not been externally validated, and none have been externally validated in a US population. In addition, many of these CPRs or models use laboratory tests and imaging that would not readily allow their extension to primary care or urgent care settings.10–12 As more COVID-19 patients are managed via telehealth, having a CPR that can be applied early in the disease course and that does not rely on any laboratory testing would be desirable to avoid having to bring low-risk patients to a laboratory or outpatient office for an in-person visit.
Therefore, the primary goal of the current study was to develop and validate 2 simple CPRs to predict COVID-19 mortality risk, 1 that relies only on nonlaboratory parameters (COVID-NoLab) and another that adds simple laboratory tests commonly available in primary or urgent care settings (COVID-SimpleLab). As the goal is to guide decision making on initial presentation, only data from the first 24 hours were used to develop the CPRs. To accomplish this, we used data from a diverse multicenter US population of adults hospitalized with COVID-19. Secondarily, we used this population’s data to evaluate several previously developed risk scores for COVID-19 prognosis.
Methods and Materials
Study Organization
The lead investigator (MHE) identified colleagues at 6 major US universities (University of Wisconsin–Madison, Penn State University, University of Florida, Virginia Commonwealth University, University of California at Los Angeles, and Georgetown University) with inpatient health centers to participate in a study of COVID-19 prognosis. Each site obtained Institutional Review Board (IRB) approval for this project, which was deemed to be exempt research due to using deidentified, previously collected patient data extracted retrospectively from each health system’s electronic health record. Data use agreements were established between each university and the University of Georgia. The overall project was approved by the University of Georgia IRB.
Data Collection
A standardized data set of demographic, clinical, and laboratory parameters was assembled using extant literature and with input from the group (Appendix 1). Comorbidities were defined using Clinical Classifications Software categories for the following disease clusters: cardiovascular disease (CCS 101), chronic obstructive pulmonary disease (CCS 127), asthma (CCS 128), and diabetes mellitus (CCS 49).13 Inclusion criteria included any adult inpatient with a positive polymerase chain reaction test for COVID-19 hospitalized at one of the participating institutions whose disposition was already determined (discharged or deceased) at the time of data extraction. The primary outcome was in-hospital mortality. We also conducted exploratory analyses for prediction of the combined outcome of death or need for mechanical ventilation.
Each site was responsible for its own data extraction from its electronic health record, following the standardized approach to each variable definition. Gender, age, and predictor variables were collected. Because the goal is to be able to predict prognosis at admission, only predictor variables available within 24 hours of admission date/time were included. Each patient’s extracted data were deidentified at the collection site. As age over 90 could be considered identifying, patients aged 90 years or over had their age listed as 90. Each center had a different range of dates for data collection, beginning as early as March 1, 2020 and extending as far as June 12, 2020. Deidentified data were securely transferred from each institution to a central repository at the University of Georgia, where they were combined for analysis.
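The two extraction rules described above (top-coding age at 90 and restricting predictors to the first 24 hours) could be sketched as follows. This is an illustration only, not the code used at any site. The function names are hypothetical, and treating the window as running from the admission timestamp to 24 hours afterward is an assumption, since the text does not specify how measurements taken before admission were handled.

```python
from datetime import datetime, timedelta

AGE_CAP = 90                  # ages of 90 or more are listed as 90, per the text
WINDOW = timedelta(hours=24)  # predictors must fall within 24 h of admission


def top_code_age(age_years):
    """Return the age to store, capped at AGE_CAP to limit reidentification."""
    return min(age_years, AGE_CAP)


def within_admission_window(admitted_at, measured_at):
    """True if a measurement was taken in the first 24 hours after admission.

    Assumption: measurements timestamped before admission are excluded.
    """
    elapsed = measured_at - admitted_at
    return timedelta(0) <= elapsed <= WINDOW


# Hypothetical example: admitted at 08:00 on April 1, 2020
admitted = datetime(2020, 4, 1, 8, 0)
print(top_code_age(93))
print(within_admission_window(admitted, datetime(2020, 4, 1, 20, 0)))
print(within_admission_window(admitted, datetime(2020, 4, 2, 9, 0)))
```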
Validation of Existing Clinical Risk Scores
The lead investigator’s systematic review of individual risk factors, risk scores, and prognostic models to predict critical illness or death in patients with COVID-19 (manuscript in review) was used to identify 2 simple risk scores9,11 and a simple multivariate model10 for COVID-19 mortality in the literature (all in inpatients). For each patient with the predictor variables in the risk score, the score was calculated. The proportion of patients with the outcome of interest (eg, death) in each risk group and where possible the area under the receiver operating characteristic curve (AUROCC) were calculated for each score or model.
Development and Internal Validation of Novel Risk Scores Using Our Data Set
Continuous variables were presented as the median and interquartile range, and categorical variables were presented as frequencies and percentages of occurrence. For the univariate analysis, the bivariate associations between predictor variables and mortality were assessed using the chi squared test for categorical variables and Wilcoxon rank-sum test for continuous variables.
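As a rough illustration of these descriptive and bivariate steps, the sketch below computes a median with interquartile range and a chi-squared test for a 2 × 2 table using only the Python standard library. The quartile method, the absence of a continuity correction, and the example counts are all assumptions made for illustration; the statistical software and exact settings used in the study are not stated here.

```python
import math
import statistics


def median_iqr(values):
    """Return (median, (first quartile, third quartile)) for a continuous variable."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, (q1, q3)


def chi_squared_2x2(table):
    """Pearson chi-squared test (no continuity correction) for a 2 x 2 table.

    table = [[a, b], [c, d]], eg, rows = predictor present/absent,
    columns = died/survived. Returns (statistic, P value) with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, the upper tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical data, not taken from the study
ages = [50, 55, 60, 65, 70, 75, 80]
print(median_iqr(ages))
print(chi_squared_2x2([[10, 90], [30, 70]]))
```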
We then randomly divided the data into derivation and validation groups with a ratio of 60:40 and built logistic regression models in the derivation set with in-hospital mortality as the outcome or dependent variable. In the first model, we only considered the patient’s age, comorbidities, and vital signs (including oxygen saturation) as independent predictors. In the second model, we added the white blood cell (WBC) count, white cell differential, serum creatinine, and CRP to the models. Imputation of laboratory data was considered, but given the large number of missing cases, we performed complete case analyses. Continuous variables were converted to categorical variables to simplify calculations in the final risk score based on inspection of histograms. We used stepwise backward selection with P < .1 for retention in the model.14 Once the predictors were selected, β coefficients were determined from the final multivariable logistic regression model. We then created a simple point score by dividing each β coefficient by the smallest β value and rounding the result to the nearest integer. The low-risk, moderate-risk, and high-risk groups were created based on visual inspection of the point score distribution to create groups that would be most useful for clinical decision making, with a particular goal of having the low-risk group be at or near 1% mortality.
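The random 60:40 split and the conversion of β coefficients to integer points can be illustrated with a short sketch. The predictor names and coefficient values below are hypothetical and are not the fitted values from Table 3; using the smallest absolute coefficient as the divisor is an assumption intended to accommodate a protective (negative) predictor.

```python
import random


def split_60_40(patient_ids, seed=0):
    """Randomly split IDs into derivation (60%) and validation (40%) groups."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    cut = round(0.6 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def betas_to_points(betas):
    """Divide each coefficient by the smallest one and round to an integer.

    Assumption: "smallest" means smallest in absolute value.
    Note that Python's round() sends exact .5 ties to the nearest even integer.
    """
    smallest = min(abs(b) for b in betas.values())
    return {name: round(b / smallest) for name, b in betas.items()}


def total_score(points, patient):
    """Sum the points for every predictor that is present in the patient."""
    return sum(pts for name, pts in points.items() if patient.get(name))


# Hypothetical coefficients, for illustration only
betas = {"older_age": 1.31, "high_respiratory_rate": 0.62, "low_oxygen_saturation": 1.95}
points = betas_to_points(betas)
print(points)

patient = {"older_age": True, "high_respiratory_rate": False, "low_oxygen_saturation": True}
print(total_score(points, patient))
```

Because the points are ratios to the smallest coefficient, the weakest retained predictor always contributes 1 point, which keeps the score easy to add up at the bedside.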
The performance of the point scores was internally validated using the validation data set. This included evaluation of how accurately the score classified patients into low-, moderate-, and high-risk groups. We used the Hosmer–Lemeshow test and a calibration curve to evaluate calibration, which indicates how well predicted mortality matched observed mortality. The AUROCC was used as a measure of overall discrimination.
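For readers less familiar with these measures, the sketch below shows one way to compute the AUROCC and the Hosmer–Lemeshow statistic from first principles. It is a minimal illustration under stated assumptions, not the study's analysis code: the AUROCC is computed by comparing every pair of a patient who died and a patient who survived, and the Hosmer–Lemeshow function expects risk groups that have already been formed. The degrees of freedom are taken as the number of groups minus 2, which matches the values reported in Appendix 2 (eg, 5 groups and chi2(3)). All data shown are hypothetical.

```python
def auroc(scores, outcomes):
    """Area under the ROC curve by pairwise comparison.

    Equals the probability that a randomly chosen patient who died (outcome 1)
    has a higher score than a randomly chosen survivor (outcome 0); ties count 0.5.
    """
    died = [s for s, y in zip(scores, outcomes) if y == 1]
    survived = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum((d > s) + 0.5 * (d == s) for d in died for s in survived)
    return wins / (len(died) * len(survived))


def hosmer_lemeshow(groups):
    """Hosmer-Lemeshow chi-squared statistic and degrees of freedom.

    groups: list of (n, observed_deaths, expected_deaths) per risk group, where
    expected_deaths is the sum of predicted probabilities in that group.
    """
    stat = 0.0
    for n, observed, expected in groups:
        stat += (observed - expected) ** 2 / (expected * (1 - expected / n))
    return stat, len(groups) - 2


# Hypothetical point scores and outcomes (1 = died)
print(auroc([0, 1, 1, 2], [0, 0, 1, 1]))

# Hypothetical groups: (patients, observed deaths, expected deaths)
print(hosmer_lemeshow([(100, 2, 1.5), (100, 10, 11.0), (100, 38, 40.0)]))
```

A small Hosmer–Lemeshow statistic relative to its degrees of freedom (a large P value) indicates that observed and predicted mortality agree across groups.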
Results
Characteristics of the Study Population
The characteristics of the study population are summarized in Table 1, stratified by health system. The number of patients available for analysis at each center ranged from 69 to 582, and the mortality rate ranged from 1.4% to 16.7%, with an overall mortality rate of 13.1%. The median age of participants at the 6 sites ranged from 52 to 62 years; there was a slight male preponderance.
The bivariate analysis of the association between clinical variables and mortality is shown in Table 2. Nonlaboratory parameters positively associated with mortality (P < .05) included increasing age, several comorbidities (cardiovascular disease, diabetes mellitus, and chronic obstructive pulmonary disease), increased body mass index, decreased oxygen saturation, and increased respiratory rate. Laboratory parameters positively associated with mortality included increased CRP, WBC count, neutrophil count, serum creatinine, and decreased lymphocyte count.
Development and Validation of Simple Risk Scores
Table 3 summarizes the 2 multivariate models to predict COVID-19 mortality using basic data available at initial presentation. Complete case data were available for 1342 patients for the COVID-NoLab model and 741 for the COVID-SimpleLab model. The COVID-NoLab model had an AUROCC of 0.771 in the derivation group and 0.803 in the validation group. The COVID-SimpleLab model had an AUROCC of 0.835 in the derivation group and 0.833 in the validation group.
Calibration in the validation groups was good based on visual inspection of calibration plots, with nonstatistically significant values for the Hosmer–Lemeshow goodness of fit test (P = .759 for the COVID-NoLab model and P = .400 for the COVID-SimpleLab model). The receiver operating characteristic (ROC) curves and calibration plots for each model are shown in Appendix 2.
The COVID-NoLab and COVID-SimpleLab risk scores were created based on the derivation set data, using β-coefficients as described above. The COVID-NoLab and COVID-SimpleLab risk scores and their classification accuracy are summarized in Table 4 for the derivation and validation groups for each risk score. Both simple risk scores had similar classification accuracy in the derivation and validation groups. However, the score that adds simple laboratory tests classifies a higher percentage of patients as low risk (29% vs 21% in derivation and 33% vs 24% in validation) who could potentially be managed as outpatients. It also classifies more patients as high risk who will require closer monitoring or intensive care (29% vs 12% in derivation and 34% vs 11% in validation).
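The classification accuracy summarized in this way (the share of patients placed in each risk group and the observed mortality within it) could be tabulated as in the sketch below. The cutoffs are hypothetical placeholders and are not the thresholds from Table 4; the example patients are likewise invented.

```python
from collections import Counter


def risk_group(score, low_max=1, moderate_max=4):
    """Assign a risk group from a point score.

    Assumption: low_max and moderate_max are placeholder cutoffs for illustration.
    """
    if score <= low_max:
        return "low"
    if score <= moderate_max:
        return "moderate"
    return "high"


def summarize(patients):
    """patients: list of (point_score, died) with died coded 0 or 1.

    Returns {group: (share of all patients, observed mortality in the group)}.
    """
    totals, deaths = Counter(), Counter()
    for score, died in patients:
        group = risk_group(score)
        totals[group] += 1
        deaths[group] += died
    n = len(patients)
    return {g: (totals[g] / n, deaths[g] / totals[g]) for g in totals}


# Hypothetical patients: (point score, died)
cohort = [(0, 0), (1, 0), (3, 0), (3, 1), (6, 1)]
for group, (share, mortality) in summarize(cohort).items():
    print(f"{group}: {share:.0%} of patients, {mortality:.0%} mortality")
```

Running the same tabulation separately on the derivation and validation groups gives the side-by-side comparison described above.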
Models were also developed and internally validated for settings where only the WBC count might be available, or only the CRP test. These models’ risk scores are summarized in Appendix 3. Although both models were able to identify high-risk patients, in each case the low-risk group in the validation data sets had an appreciably higher mortality rate than in the derivation data (4.4% vs 0.0% for both models). Their calibration was good, based on visual inspection of the calibration plots and the Hosmer–Lemeshow test.
Evaluation of Previous Risk Scores
We evaluated 3 existing simple models for predicting COVID-19 mortality. Five clinical variables were included in the 3 tools: age, CRP, lactate dehydrogenase, lymphopenia, and oxygen saturation. Two tools used classification trees and had not been externally validated,9,11 and 1 was a simple multivariate model that had been validated at a single Chinese hospital.15 We were unable to evaluate the accuracy of other risk scores due to either the unavailability of some of the predictors in our data set or because they predicted outcomes other than mortality.8 The performance of each of the 3 prediction models in the US study population is summarized in Table 5.
Discussion
We have developed and internally validated 2 simple CPRs, 1 of which requires no laboratory testing (COVID-NoLab) and another that only requires clinical variables plus simple laboratory tests that are commonly and rapidly available in many outpatient settings (COVID-SimpleLab). The score that includes simple lab tests classifies more patients as low or high risk and is therefore potentially more clinically useful. Previous risk scores have either not been externally validated, have not been validated in the United States, or have required tests not commonly available in outpatient settings such as procalcitonin, lactate dehydrogenase, or chest radiography. Our risk scores performed well in an internal validation, although external, prospective validation in other populations would be desirable. The risk scores are simple enough for clinicians to memorize or keep on a pocket card. In the future, they could be made available as a mobile app for point-of-care use or integrated into electronic health records.
The COVID-NoLab score has important potential utility in the telehealth setting, which has become a common venue for assessing and monitoring COVID-positive patients while minimizing the risk of viral transmission to clinical staff. Although the score does require an oxygen saturation level, patients with COVID-19 are increasingly being given devices for home assessment of oxygen saturation as a way to remotely monitor their symptoms. Our study reinforces the value of knowing this parameter as a way to predict mortality risk and, potentially, health decline. Our findings, although not yet conclusive, may encourage innovative health systems to consider home oxygen saturation as a means to safely manage COVID-infected patients at home. For example, one could have patients measure oxygen saturation twice daily, have a daily telehealth visit with a health care professional who could evaluate respiratory rate, and recalculate the risk score daily. It is also something that could be used by emergency response personnel when evaluating patients in the field, where blood tests are not available but oxygen saturation monitors are readily available.
The COVID-SimpleLab risk score was somewhat more accurate than the COVID-NoLab risk score and is appropriate for outpatient settings where the WBC count, CRP, and serum creatinine are available. We also developed risk scores that included only clinical variables and either WBC or CRP, because outpatient settings around the world often have different tests available. For example, although the WBC is often available in the US primary care setting at the point of care, CRP is rarely available. On the other hand, the opposite is true in many European countries.16,17 Although the “Clinical + WBC” and “Clinical + CRP” risk scores did not perform as well in validation, particularly at identifying a very low-risk group, they should still be prospectively validated in lower-risk outpatients with COVID-19 to see if they perform better in that population.
The previously reported risk scores originally developed in Chinese populations9–11 were less accurate in our US population. This may be because of overfitting of the early models, differences in the spectrum of illness, or differences between the health care systems in China and the United States. In addition, these models were developed early in the pandemic when mortality rates were higher.
We hope to work with investigators at other institutions to evaluate the COVID-NoLab and COVID-SimpleLab models in their populations. We only gathered data on 4 comorbidities and in the future would want to explore adding other clinical variables such as hypertension, chronic liver disease, and tobacco use. It would be preferable to use prospective data collection and add patient symptoms such as dyspnea, although respiratory rate and oxygen saturation measurements may covary with dyspnea, making it less important. Including patients identified in a range of settings and managed as outpatients will be important. Finally, this work should be ongoing, because as treatments improve, the prognosis will change and predictive models will require updating.
Strengths and Limitations
An important strength of this study is that our model was developed using data from 6 geographically diverse sites in the United States, sites that serve racially and ethnically diverse populations. Further, by generating risk scores that use either no laboratory variables or limited laboratory testing, if appropriately validated our results could potentially be useful in outpatient settings or in telehealth to guide decisions regarding the need for admission or the intensity of outpatient follow-up that is needed. The risk scores are also quite simple and have good face validity, making them practical for busy clinical settings.
Our study has several limitations. This is a convenience sample, and we only included data for patients who had been discharged alive or who died. Thus, patients still in the hospital were not included; this may bias the sample. Importantly, the data collected is restricted to COVID-19 patients in an inpatient setting who have a narrower and more severe spectrum of illness than patients managed at home without hospitalization. Thus, our work requires validation in other populations, including primary care and urgent care settings, before clinical application in outpatients. Changes in the virus itself and changes in treatment may also affect prognosis over time, so any risk score may eventually require updating or recalibration. Finally, we used a split-sample internal validation, which may inflate calibration, and the model should be prospectively validated before adoption by clinicians.
Conclusion
The COVID-NoLab and COVID-SimpleLab scores derived in a large, diverse population of hospitalized COVID-19 patients in the United States had good discrimination, calibration, and classification accuracy using an internal validation (split-sample) approach. If validated in a new population of hospitalized patients, they would provide a rapid, simple way to determine prognosis for hospitalized patients and identify a low-risk group that could be considered for outpatient management, for example during a bed shortage. Because they were designed to use no or minimal laboratory tests, these risk scores may also be generalizable to outpatient settings. This could potentially provide clinicians a useful aid for decision making regarding hospital admission and the intensity of outpatient follow-up. However, it is important that the risk scores be prospectively validated in the outpatient setting before their use there.
Appendix 1. Full List of Requested Clinical Variables; Predictor Variables only Included if Ordered Within 24 Hours of Admission
Appendix 2. Receiver Operating Characteristic (ROC) Curves and Calibration Plots for Each Model.
Model using clinical predictors only (COVID-NoLab)
Derivation data set
Receiver operating characteristic (ROC) curve
Calibration plot
Number of observations = 1343
Number of groups = 5
Hosmer–Lemeshow chi2(3) = 2.34
Prob > chi2 = 0.5051
Validation data set
ROC curve
Calibration plot
Number of observations = 537
Number of groups = 7
Hosmer–Lemeshow chi2(5) = 2.62
Prob > chi2 = 0.7590
Model using clinical + complete blood count + c-reactive protein + creatinine (COVID-SimpleLab)
Derivation data set
ROC curve
Calibration plot
Number of observations = 445
Number of groups = 10
Hosmer–Lemeshow chi2(8) = 10.07
Prob > chi2 = 0.2601
Validation data set
ROC curve
Calibration plot
Number of observations = 295
Number of groups = 9
Hosmer–Lemeshow chi2(7) = 7.29
Prob > chi2 = 0.3998
Appendix 3. Additional Model Using Clinical Variables and Complete Blood Count Only.
Model using clinical variables + complete blood count only
Logistic regression model
Derivation data set
Receiver operating characteristic (ROC) curve
Calibration plot
Number of observations = 706
Number of groups = 7
Hosmer–Lemeshow chi2(5) = 4.61
Prob > chi2 = 0.4649
Validation data set
ROC curve
Calibration plot
Number of observations = 470
Number of groups = 8
Hosmer–Lemeshow chi2(6) = 5.30
Prob > chi2 = 0.5055
Derivation data set
ROC curve
Calibration plot
Number of observations = 475
Number of groups = 9
Hosmer–Lemeshow chi2(7) = 1.62
Prob > chi2 = 0.9777
Validation data set
ROC curve
Calibration plot
Number of observations = 316
Number of groups = 9
Hosmer–Lemeshow chi2(7) = 1.73
Prob > chi2 = 0.9733
Notes
This article was externally peer reviewed.
Funding: None.
Conflict of interest: None.
To see this article online, please go to: http://jabfm.org/content/34/Supplement/S127.full.
- Received for publication September 4, 2020.
- Revision received October 19, 2020.
- Accepted for publication December 3, 2020.