Abstract
Background: There is a growing patient population using yoga as a therapeutic intervention, but little is known about how yoga interfaces with health care in clinical settings.
Purpose: To characterize how yoga is documented at a large academic medical center and to systematically identify clinician-derived therapeutic use cases of yoga.
Methods: We designed a retrospective observational study using a yoga cohort (n = 30,976) and a demographically matched control cohort (n = 92,919) from the electronic health records at Penn Medicine between 2006 and 2016. We modeled the distribution of yoga notes among patients, clinicians, and clinical service departments, built a multinomial Naïve Bayes classifier to separate the notes by context-dependent use of the word yoga, and modeled associations between clinician recommendations to use yoga and 754 diagnostic codes with Fisher's exact test, setting a false discovery rate (FDR)-adjusted P-value ≤ .05 (ie, q-value) as the significance threshold.
Results: Yoga mentions in the electronic health record increased 10.4-fold during the 10-year study period, with 2.6% of patients having at least 1 mention of yoga in their notes. In total, 30,976 patients, 2398 clinicians, and 41 clinical service departments were affiliated with yoga notes. The majority of yoga notes are in primary care. Nine diagnoses met the significance criteria for having an association with clinician recommendations to use yoga, including Parkinson's disease (odds ratio [OR], 6.3 [3.7 to 11.4]; q-value < 0.001), anxiety (OR, 5.8 [3.8 to 9.0]; q-value < 0.001), and backache (OR, 3.8 [2.4 to 6.3]; q-value = 0.001).
Conclusions: There is a widespread and growing trend to include yoga as part of the clinical record. In practice, clinicians are recommending yoga as a nonpharmacological intervention for a subset of common chronic diseases.
- Academic Medical Centers
- Behavioral Medicine
- Chronic Disease Management
- Electronic Health Records
- Life Style
- Medical Informatics
- Meditation
- Primary Health Care
- Yoga
Patients are turning to complementary therapies (eg, yoga, meditation, Mindfulness-Based Stress Reduction, etc.) in increasing numbers in an effort to prevent or treat disease and to manage symptoms, drug side-effects, and chronic stress.1,2 The use of complementary therapies in addition to the standard of care is known as integrative medicine, a field that continues to gain traction despite the fact that most published research is constrained by small sample sizes, inadequate protocol descriptions, and the absence of quantitative or objective outcome measures.
Studies in this domain present a challenging research design problem. They are not amenable to customary double-blind placebo-controlled trials, they are conceptually and practically difficult to dose, and the effects are highly subjective.3 These challenges spotlight the need for data that captures how complementary therapies interface with health care, not in a controlled setting, but in the real world.
A potentially abundant source of such data is the electronic health record (EHR). Leveraging social and behavioral determinants of health from EHR data has promising implications for population-based research and public health.4–6 As it stands, social and behavioral factors are captured in the clinical chart notes. Chart notes mentioning integrative medicine represent the practical experiences of patients and providers as a type of expert knowledge that does not rely on recall, which can bias survey data. Through text-mining and natural language processing, this information becomes a valuable source of empirical data to begin filling evidence gaps and to establish effective use cases for integrative medicine.
Yoga, a practice based on controlled breathing, movement, and meditation, is the most commonly used type of integrative medicine. An estimated 1 in 7 American adults used yoga in 2017, according to the Centers for Disease Control and Prevention.2 When surveyed, 81.9% of yoga practitioners reported overall improvements in physical and mental health.7 Moreover, systematic reviews suggest yoga promotes healthy aging and may be an effective treatment strategy for patients who have chronic diseases with stress or anxiety-related comorbidities.8–10
Here, we present a retrospective observational study in the EHR that characterizes how yoga is documented in our health care system, and identifies clinician-derived therapeutic use cases of yoga.
Methods
Data Collection
We identified study and control cohorts by querying the EHR at the University of Pennsylvania Health System. First, we used PennSeek, a tool that aids unstructured text search in the EHR, to identify patients with the word “yoga” in their outpatient chart notes between November 15, 2006 and November 16, 2016. Then, using the unique identifiers for each of these patients, we queried the Penn Data Store, a database housing structured EHR data, to retrieve associated demographic and diagnostic information. We combined the structured and unstructured data to generate a yoga cohort of 30,976 patients.
We also identified a set of demographically matched controls from the Penn Data Store by randomly sampling outpatient records to match the yoga cohort based on age, sex, and race. We used an approximate 3:1 control-to-case ratio to increase statistical power and coverage across unknown or unmeasured confounders.11 The control cohort contains 92,919 patients.
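The matching step can be illustrated with a minimal sketch. It assumes exact matching on age, sex, and race, sampling without replacement, and a record layout (dictionaries with "age", "sex", and "race" keys) that is purely hypothetical; the function name is also an assumption and does not describe the query actually run against the Penn Data Store. Strata with too few candidates yield fewer than 3 controls, which is one way an approximate rather than exact 3:1 ratio can arise.

```python
import random
from collections import defaultdict


def sample_matched_controls(cases, pool, ratio=3, seed=0):
    """Draw up to `ratio` controls per case from the same (age, sex, race) stratum.

    Hypothetical helper: exact matching without replacement is an assumption.
    """
    rng = random.Random(seed)

    # Group candidate controls by the matching key.
    strata = defaultdict(list)
    for person in pool:
        strata[(person["age"], person["sex"], person["race"])].append(person)

    # Shuffle once so that popping from the end is a random draw.
    for members in strata.values():
        rng.shuffle(members)

    controls = []
    for case in cases:
        members = strata[(case["age"], case["sex"], case["race"])]
        # A thin stratum yields fewer than `ratio` controls.
        for _ in range(min(ratio, len(members))):
            controls.append(members.pop())
    return controls
```

In this sketch each control is used at most once, so the control cohort can never contain duplicate patients.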
For each data set, the inclusion criteria were at least 1 encounter from an outpatient office or allied health visit between November 15, 2006 and November 16, 2016, at least 1 primary diagnostic code, and age ≥ 19 years. All patients are represented in the data set by a single (most recent) encounter. For the yoga cohort, this is the most recent encounter with a chart note mentioning yoga. To capture a snapshot of the entire Penn Medicine patient population during the study period, we also collected demographic data on all patients meeting the inclusion criteria above (n = 1,210,228).
All data were accessed through the Data Analytics Center at Penn Medicine and the study protocol was approved by the Institutional Review Board (IRB) at the University of Pennsylvania.
Demographics
To report demographics (age, sex, race, ethnicity, and financial class) we used data from the most recent encounter for each patient. Statistical comparisons between groups were calculated by the t-test for the continuous variable, age, and the χ2 test for the categorical variables: sex, race, ethnicity, and financial class.
Diagnostic Codes
Medical conditions were recorded with the International Classification of Diseases versions 9 and 10 (ICD-9 and ICD-10). We first used the ICD-9-CM and ICD-10-CM general equivalence maps from the AHRQ MapIT Software (Rockville, MD) to create a single unified map from ICD code to medical condition and then we collapsed disease name synonyms with the National Center for Biomedical Ontology (NCBO) human disease ontology.12–14
The medical conditions associated with an encounter are filtered to include only the ICD codes given a primary diagnosis designation.
Classification
We built a text classifier to label the context of the word yoga in each chart note.
Feature Set
To generate text features that can be used to predict class, we applied a standard natural language-processing pipeline to the chart notes. First, we preprocessed the text to remove symbols, punctuation, and case before breaking down the paragraphs into individual words. The tense of the words was not modified by stemming or lemmatization and stop words (eg, the, and, or, etc.) were retained. Second, to represent each note, we selected the word “yoga” and a given number of words on either side of yoga to generate a context-window. We used context-windows to reduce noise because most records have a single mention of the word yoga among hundreds of words of text unrelated to yoga. Third, from the context-windows, we generated ngrams of length 1, 2, and 3, and skipgrams of length 2. Here, ngrams are contiguous sequences of words and skipgrams are constructed by pairing the first and third word in a contiguous sequence. We required each ngram and skipgram to appear in at least 2 chart notes, and excluded those that appeared in greater than 95% of chart notes. The ngrams and skipgrams from all the chart notes were combined to create a single vocabulary. Text processing was done in Python version 3.6.1 with the Natural Language Toolkit (NLTK version 3.2.4), a free, open-source platform for analyzing human language data.15
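The feature-generation steps can be sketched in plain Python. This is an illustration only, not the NLTK code used in the study: the function names, the regular expression used for cleaning, the use of the first occurrence of the keyword, and the "_" placeholder in skipgrams are all assumptions. The default window of 5 words on either side follows the tuned value reported in the Results.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and strip symbols/punctuation; stop words are kept, no stemming.
    return re.sub(r"[^a-z0-9\s]", " ", text.lower()).split()


def context_window(tokens, keyword="yoga", width=5):
    # Keyword plus up to `width` tokens on each side of its first occurrence.
    if keyword not in tokens:
        return []
    i = tokens.index(keyword)
    return tokens[max(0, i - width): i + width + 1]


def ngrams(tokens, n):
    # Contiguous sequences of n words.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skipgrams(tokens):
    # Pair the first and third word of each contiguous three-word sequence.
    return [tokens[i] + " _ " + tokens[i + 2] for i in range(len(tokens) - 2)]


def features(text, width=5):
    window = context_window(tokenize(text), width=width)
    feats = []
    for n in (1, 2, 3):
        feats.extend(ngrams(window, n))
    feats.extend(skipgrams(window))
    return feats


def build_vocabulary(notes, min_df=2, max_df=0.95):
    # Keep features seen in at least `min_df` notes and at most `max_df` of notes.
    doc_freq = Counter()
    for note in notes:
        doc_freq.update(set(features(note)))
    return {f for f, c in doc_freq.items()
            if c >= min_df and c / len(notes) <= max_df}
```

One consequence of the upper document-frequency cutoff is that the unigram "yoga" itself, which appears in every context-window, is dropped from the vocabulary.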
Classifier
To categorize the chart notes based on the context of the word yoga, we built a multinomial Naïve Bayes classifier. Multinomial Naïve Bayes is a supervised classifier used to predict the probability of a note belonging to a given class, based on its features (ie, ngrams and skipgrams). These models can handle large sets of features including irrelevant features and features with equal predictive probabilities, and they are fast. For our text classification problem, the multinomial Naïve Bayes model outperformed support vector machines, random forests, and gradient-boosting models.
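For readers unfamiliar with the model, the following is a minimal from-scratch sketch of multinomial Naïve Bayes with additive smoothing. It is not the Scikit-learn implementation used in the study; the class name, method names, and toy feature lists in the usage note are assumptions made for illustration. Each class score is the log prior plus the sum of smoothed log likelihoods of the features in the note.

```python
import math
from collections import Counter


class TinyMultinomialNB:
    """Illustrative multinomial Naive Bayes; `alpha` is the additive smoothing term."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        # docs: list of feature lists; labels: parallel list of class names.
        self.classes = sorted(set(labels))
        self.vocab = {f for doc in docs for f in doc}
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(doc)
        n_docs = Counter(labels)
        self.log_prior = {c: math.log(n_docs[c] / len(docs)) for c in self.classes}
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, doc):
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for f in doc:
                if f in self.vocab:  # features unseen in training are ignored
                    num = self.counts[c][f] + self.alpha
                    den = self.totals[c] + self.alpha * v
                    s += math.log(num / den)
            scores[c] = s
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)
```

As a usage example, fitting on toy notes such as ["try", "yoga"] labeled as recommended and ["does", "yoga"] labeled as documented, then calling predict on a new feature list, returns the class with the highest log score. Because every feature contributes independently, irrelevant features shift all class scores by similar amounts and have little effect on the ranking.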
The model was trained, tuned, and tested on a subset of manually annotated chart notes (n = 5600). In the annotation process, we identified 3 classes: clinician-documented yoga, clinician-recommended yoga, and other miscellaneous mentions of the word yoga. Each note is assigned 1 of these 3 class labels. The model is trained to identify patterns between the features and the class labels by iteratively evaluating subsets of the annotated data. To address class imbalance, we used stratified 5-fold cross validation to train the model and to optimize hyperparameters for feature selection and modeling. Eighty percent of the annotated notes were used to train the model (training set); the other 20% were used to test the performance of the model (test set). To evaluate the performance of the model, we calculated balanced accuracy in the test set. In this case, balanced accuracy is the weighted average of the sensitivity score, that is, the true positive rate, for each class. The final model uses mutual information for feature selection, retains features with scores in the top 50%, and assigns the α parameter in the multinomial Naïve Bayes model to 1.0, which is incidentally the default. Classification was done in Python version 3.6.1 with Scikit-learn (sklearn version 0.18.1),16 a free, open-source software library for machine learning.
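Two pieces of this evaluation scheme can be sketched in a few lines: stratified fold assignment and balanced accuracy. The sketch is an assumption-laden illustration rather than the Scikit-learn routines used in the study. In particular, it computes balanced accuracy as the unweighted mean of per-class sensitivity, which is the common definition; the exact weighting used in the study is not specified in the text, and the function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign note indices to k folds so each fold keeps the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class round-robin across the folds.
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    return folds


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class sensitivity (true positive rate)."""
    support = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [correct[c] / support[c] for c in support]
    return sum(recalls) / len(recalls)
```

Balanced accuracy is preferred over plain accuracy here because a classifier that always predicts the majority class would score well on accuracy while missing the minority classes entirely.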
Statistical Analyses
To characterize trends over time in the numbers of chart notes mentioning yoga, and the corresponding numbers of patients, clinicians, and clinical service departments, we fit linear and quadratic models to the unique number of yoga notes by year. Quadratic models were selected because they minimized the Akaike Information Criterion.17 To characterize trends over time in the adoption of yoga as part of the clinical record by clinical service department, we fit linear models to the unique number of yoga notes recorded in a given department by year.
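The model-selection step can be illustrated with a small sketch that fits polynomials by least squares and compares them by the Akaike Information Criterion. The study used R; this Python version, its function names, and the Gaussian-error form of the criterion (n ln(RSS/n) + 2k, with k counting the coefficients plus the error variance) are assumptions for illustration. Years should be passed as small indices (eg, 0 to 9) rather than calendar years to keep the normal equations well conditioned.

```python
import math


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest order first) via normal equations."""
    m = degree + 1
    # Normal equations A c = b with A[i][j] = sum x^(i+j) and b[i] = sum y * x^i.
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(a[i][j] * coef[j] for j in range(i + 1, m))
        coef[i] = (b[i] - tail) / a[i][i]
    return coef


def aic(xs, ys, degree):
    """AIC under Gaussian errors; undefined if the fit is exact (RSS = 0)."""
    coef = polyfit(xs, ys, degree)
    rss = sum((y - sum(c * x ** p for p, c in enumerate(coef))) ** 2
              for x, y in zip(xs, ys))
    n, k = len(xs), degree + 2  # coefficients plus the error variance
    return n * math.log(rss / n) + 2 * k
```

The model with the lower value is preferred; the 2k term penalizes the extra coefficient, so the quadratic model is selected only when it reduces the residual error enough to offset that penalty.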
To identify medical conditions associated with clinician recommendations to use yoga, we calculated the odds of having a given ICD code and a yoga recommendation at the same clinical encounter. To do this, we used Fisher's exact test to calculate odds ratios and P-values based on the counts of patients with a given ICD code in the clinician-recommended yoga class relative to those of their demographically matched controls. To correct for multiple testing, the significance threshold for inclusion was a false discovery rate (FDR)-adjusted P-value (ie, q-value) of ≤ 0.05. To ensure we did not select for special cases, we filtered the statistically significant results to include only medical conditions for which there were at least 25 patients and 5 clinicians in the yoga class.
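The per-code test and the multiple-testing correction can be sketched as follows. Several assumptions apply: the 2 × 2 table is laid out as [[a, b], [c, d]] with a and b the yoga-class patients with and without the code, and c and d the controls with and without the code; the odds ratio shown is the simple sample odds ratio, whereas R's fisher.test reports a conditional maximum likelihood estimate that can differ slightly; and the FDR adjustment is assumed to be the Benjamini-Hochberg procedure, which the text does not name. The function names are hypothetical.

```python
from math import comb


def fisher_exact(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p, 1.0)


def benjamini_hochberg(pvalues):
    """Return FDR-adjusted P-values (q-values) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q
```

In use, fisher_exact would be applied once per ICD code, the resulting P-values passed together to benjamini_hochberg, and codes with q-value ≤ 0.05 retained before applying the minimum patient and clinician counts.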
Statistical analyses were performed with R version 3.5.1 and plots were generated with the R package ggplot2 version 3.0.0.18,19
Results
Prevalence of Yoga in the EHR
The yoga data set was generated from a keyword search for “yoga” in the outpatient clinical chart notes. Through this search we identified 61,976 unique notes with yoga mentions corresponding to a cohort of 30,976 unique patients (median, 1 note per patient; interquartile range, 1 to 2 note(s) per patient). The yoga cohort accounted for 2.6% of the total patient population in the database during the study period. A control cohort was selected to match the yoga cohort on age, sex, and race by randomly sampling outpatient records that did not mention yoga in a 3:1 control to case ratio (n = 92,919).
The yoga cohort is enriched for younger, non-Hispanic, white females with commercial insurance relative to the broader patient population at Penn Medicine (Table 1).
Documentation of yoga in the EHR at Penn Medicine is trending upward. From November 2006 to November 2016, there was a 10.6-fold increase in the number of clinical chart notes mentioning the word yoga, an 8.9-fold increase in the number of patients with 1 or more notes mentioning yoga, a 5.4-fold increase in the number of clinicians using chart notes to document yoga, and a 1.2-fold increase in the number of clinical service departments affiliated with notes mentioning yoga. In all cases, these increases showed quadratic growth (Figure 1).
In total, 2398 unique clinicians documented yoga in the EHR during the study period. To ensure we have not selected for a subset of clinicians that have a penchant for yoga, we verified that 99.3% (2381) of these clinicians are included as providers in the control cohort.
Together, the yoga and control cohorts come from a total of 55 unique clinical service departments. Forty-one of these departments (75%) had at least 1 note mentioning yoga on record during the study period. All clinical service departments represented in the yoga cohort are also represented in the control cohort.
The distribution of notes mentioning yoga, and the time to adoption of yoga as part of the clinical record, vary across clinical service departments (Supplemental Figure 1). The largest numbers of yoga notes were found in the primary care setting. In 2016, primary care alone accounted for 4904 yoga notes, or 38.2% of the total number of notes mentioning yoga that year. Linear models fit to count data illustrate that once the word yoga appears in a department, its use tends to grow (Supplemental Figure 1).
Characterization of Yoga in the EHR
To determine the context of the word yoga in the chart notes, we built a supervised text classification pipeline based on a set of 5600 manually annotated notes. In the annotation process, we identified 3 classes: clinician-documented yoga, clinician-recommended yoga, and other miscellaneous mentions of yoga.
The clinician-documented yoga class includes cases where the clinician is recording a patient's self-reported yoga practice, either as a lifestyle, for example, “exercise activity: yoga, 60 minutes/day, 3 day/week,” or as a therapeutic intervention, for example, “anxiety – managed with yoga only and no medications.” The clinician-recommended yoga class includes cases where the clinician is recommending yoga as a therapeutic intervention, for example, “I have recommended that she return to see her physiatrist, and also consider other forms of therapy including yoga.” And the other miscellaneous mentions of yoga class includes cases of ambiguous semantics, references to commercial products, like yoga pants or yoga toes (YogaToes, Dexter, MI), a product advertised to relieve foot pain, and automatically generated contraindications, for example, “avoid high-velocity sports (downhill skiing); avoid head stands and plow [pose] in yoga—use of high head rest and seatbelt/chest belt while in a car.”
The annotated notes were used to build a supervised multinomial Naïve Bayes classifier (balanced accuracy = 0.87, Supplemental Figure 2, see Methods). The tuned model used features generated from context windows of length 11, that is, the word “yoga” ± 5 words, and the final feature set contained 8508 elements. Discriminating features included the following: meditation, try, discussed, week, consider, does, stress, advised, doing, breathing, plow in yoga, head stands, tried yoga, exercise, therapy, stretching. The classifier was used to assign each yoga note to 1 of the 3 predefined classes. Based on this classification, clinician-documented yoga accounted for approximately 75% of yoga mentions, other miscellaneous mentions for approximately 14% of yoga mentions, and clinician-recommended yoga for approximately 10% of yoga mentions in the chart notes.
To determine if the distributions of yoga notes among the 3 classes changed over time, we looked at the distribution of yoga notes by class and year. As the total number of yoga notes increased each year, the numbers of notes in each class also increased (Figure 2A). But the fraction of the total number of yoga notes represented by each class remained relatively constant (Figure 2B). In other words, the number of yoga notes in each class increased in proportion to the total number of yoga notes written each year. On average, the distribution of yoga notes among the 3 classes did not change during the study period.
We focus the remaining analysis on the clinician-recommended yoga class.
EHR-Derived Use Cases of Yoga as Therapy
To systematically explore which medical conditions were associated with clinician recommendations to practice yoga, we used Fisher's exact test to compare the proportions of patients assigned each ICD code in the clinician-recommended yoga class to those of their demographically matched controls. Each code was tested independently; we did not combine similar or related ICD codes. Our threshold for significance was a predefined q-value ≤ 0.05.
Yoga mentions in the clinician-recommended yoga class were associated with 9 medical conditions (Figure 3). We consider these medical conditions to be clinician-derived therapeutic use cases of yoga. Parkinson's disease had the highest odds ratio (OR, 6.3; 95% CI, 3.7–11.4; q-value < 0.001). Other identified medical conditions meeting the significance criteria included: anxiety (OR, 5.8; 95% CI, 3.8–9.0; q-value < 0.001), depressive disorder (OR, 4.4; 95% CI, 2.2–9.1; q-value = 0.001), malaise and fatigue (OR, 3.9; 95% CI, 2.5–6.4; q-value = 0.001), pregnancy (OR, 3.9; 95% CI, 2.9–5.3; q-value < 0.001), backache (OR, 3.8; 95% CI, 2.4–6.3; q-value = 0.001), myalgia and myositis (OR, 3.2; 95% CI, 1.9–5.5; q-value = 0.001), hyperlipidemia (OR, 3.2; 95% CI, 1.7–6.3; q-value = 0.002), and lumbago (OR, 3.0; 95% CI, 2.1–4.3; q-value = 0.001).
Clinician recommendations to practice yoga came predominantly from primary care providers, with the exceptions of Parkinson's disease, pregnancy, and myalgia and myositis, which were most frequently found in notes attributed to providers from neurology, obstetrics and gynecology, and rheumatology departments, respectively.
Discussion
EHRs are an abundant data source that can uniquely fill the evidence gaps for some uses of complementary and integrative medicine in the prevention and treatment of chronic diseases. Here, we used EHR data from an academic medical center to characterize how yoga interfaces with health care.
We have shown that a large number of patients have medical chart notes that mention the word yoga. Demographically, this subset of patients tracks national survey data showing that non-Hispanic white women are more likely to report using yoga in the past 12 months than men or individuals self-identifying as Hispanic, Black, or African American.2 An under-representation of racial and/or ethnic minorities and individuals of low socioeconomic status engaging with yoga has been documented in the literature, and reported barriers to use of yoga include preconceived ideas of the practice, family and work obligations, and yoga class location.20 Increases in the number of patients with yoga notes are also supported by national trends showing use of yoga among US adults has been on the rise since 2002.1,2
In 2002, due to increased interest in and use of complementary and alternative medicine by patients and providers, the Federation of State Medical Boards published guidelines around its use.21 While there are no large-scale studies surveying health care provider perspectives on the role of yoga in medicine over time, our data show that each year an increasing number of clinicians, across clinical service departments, are at the minimum acknowledging yoga by documenting it in the chart notes of their patients.
There are reports suggesting that clinicians see potential in complementary and integrative medicine but do not feel they have enough information or experience to counsel patients on its proper use.22 In some cases, clinicians are cautiously biding time until better research emerges.23 This may explain why, despite the widespread and growing documentation of yoga in the EHR, clinician recommendations to use yoga remain proportionally unchanged.
Nevertheless, the EHR provides a unique data source to explore how yoga is currently being used in health care. From these data, we were able to identify 9 medical conditions that met the significance criteria for having an association with clinician recommendations to use yoga in practice.
Among these conditions, 2 are musculoskeletal: backache and lumbago. This finding is consistent with the American College of Physicians clinical practice guidelines that recommend yoga as a nonpharmacological, first-line therapy to treat chronic low back pain.24
We also identified 2 mental health conditions: anxiety and depressive disorder. There is an evidence base supporting the use of yoga for these conditions.25–27 One compelling aspect of this result is that these conditions are often comorbidities and the yoga literature is replete with studies of yoga as treatment for anxiety and depression in the context of other chronic diseases, including some of the conditions identified in this study.
The research on Parkinson's disease has explored the effects of yoga on the physical manifestations of the disease including motor control, postural stability, and functional mobility, and in the context of depression, fatigue and apathy, some of the most common comorbidities of the disease.28–31
In pregnancy, yoga research has focused on everything from safety for mothers and babies to mitigating low back pain, labor pain, anxiety, depression, stress, and sleep disturbances.24–27
We identified the endocrine and metabolic condition, hyperlipidemia. Meta-analyses of yoga for prevention of cardiovascular disease and stress-related biomarkers indicate there is suggestive but not conclusive evidence that yoga can effectively modulate lipid levels.35,36
Finally, we see evidence that the nonspecific symptoms of malaise and fatigue, and myalgia and myositis are being treated with yoga. This fits the narrative that patients prefer unconventional medicine, like diet-based or mind-body interventions, in cases of diagnostic uncertainty where there is ambiguous cause and effect.37 For conditions that manifest these symptoms but do not have an effective standard of care, like fibromyalgia, yoga can be a beneficial component of a multipronged approach to symptom management.38
The medical conditions identified in this study demonstrate wide-ranging use cases of yoga as therapy and support the notion that the integrated systemic effects of yoga lead to perceived improvements in both physical and mental health. As our health care system transitions to a value-based, patient-centered model, the broader implication of this study is that yoga, as a multipurpose nonpharmacological intervention, has a place in the health care landscape.39
Limitations and Future Work
The challenges inherent to secondary use of EHR data for clinical research have been explored and described elsewhere.40–43 In this work, we assume bias in patient reporting and clinician documentation of yoga. There were no controls for the type/definition of yoga or for the frequency or duration of yoga practice. In addition, we relied on ICD codes as a proxy for diagnostic data as a means of maintaining consistency between the clinician-recommended yoga class and their demographically matched controls. It is possible that yoga was mentioned in a context not related to these codes. It is also possible that the odds ratios would change if we had combined ICD codes for similar or related conditions. ICD codes were not combined to maintain the integrity of the data as collected.
We acknowledge that with a balanced testing accuracy of 87%, there are some misclassifications in the clinician-recommended class adding noise to our analysis.
Future work will include in-depth longitudinal case studies in the EHR of patients with the identified medical conditions to assess changes in health outcomes attributable to yoga practice. As with other personalized medicines, we anticipate the effects of yoga to be patient dependent.
Conclusions
EHRs provide a unique opportunity to research yoga and other types of integrative medicine in a real-world setting. The prevalence of yoga mentions in the EHR is on the rise, and from these data we were able to use a kind of retrospective crowdsourcing based on the practical experience of clinicians to identify 9 therapeutic use cases of yoga. These results contribute to a growing body of evidence about the role integrative medicine plays in the treatment and management of chronic diseases. As our health care system evolves, and patients and clinicians turn to complementary therapies in greater numbers, we owe it to the community to continue to build this evidence base by rigorously investigating the most effective use cases of these practices.
Acknowledgments
The authors thank Scott M. Damrauer, MD for useful discussions; and Trang Le, PhD for reviewing the manuscript.
Notes
This article was externally peer reviewed.
Funding: This work was supported by a PA-CURE grant from the Pennsylvania Department of Health.
Conflict of interest: none declared.
To see this article online, please go to: http://jabfm.org/content/32/6/790.full.
- Received for publication March 29, 2019.
- Revision received June 5, 2019.
- Accepted for publication June 12, 2019.