Effective teamwork among health professionals improves patient safety.1 , 2 A substantial proportion of preventable errors in United States hospitals are attributable to teamwork and communication failures.3 , 4 Recognizing this, the Institute of Medicine, the Joint Commission, the Agency for Healthcare Research and Quality (AHRQ), and others have made teamwork a top priority in their recommendations for improving healthcare.5–9

Teamwork is also prominently positioned within the American Board of Internal Medicine (ABIM) requirements for maintenance of certification for internists,10 as well as the Accreditation Council for Graduate Medical Education’s core competencies,11 milestones,12 and medical student competencies.13 As such, every physician at the undergraduate, graduate, and continuing professional level must demonstrate competency in teamwork.

While there is broad agreement on the imperative to improve teamwork, there is little consensus regarding how to measure it. Internal medicine teams vary substantially in composition, setting, function and charge. The knowledge, skills, and attitudes required for optimal teamwork within an inpatient medical team may differ from those necessary for successful interprofessional collaboration among undergraduate students in a classroom.13 , 14 Additionally, there are numerous purposes for teamwork assessment, including determining individual physician competence as well as measuring the effectiveness of teams as a whole.15

Given the heterogeneity of healthcare teams within internal medicine, it is logical that no single teamwork measurement tool will suit all clinical and educational situations. Yet, any endeavor to measure teamwork is likely to be most successful if it is grounded in the literature, built upon prior work, and both reliable and valid.15 Prior reviews have examined teamwork training and interventions, as well as the outcomes of effective teams.1 , 16–24 These reviews have advanced the understanding of ‘what works’ to improve teamwork (i.e. curricula and interventions), but they do not fully answer the critical question of how teamwork is best measured in healthcare.

Therefore, the objective of this systematic review is to provide a synthesis of published instruments that have been used to assess teamwork in internal medicine. Given the breadth and marked heterogeneity of literature on teamwork assessment within healthcare as a whole, this review was limited to a synthesis of teamwork tools used in internal medicine. It encompasses all instruments used in undergraduate, graduate, and continuing medical education in general internal medicine and internal medicine subspecialties. To capture all published validity evidence for each tool, we also included articles from non-internal medicine specialties that reported additional validity evidence. This paper is intended to serve as a resource to help educators, clinicians, and other health professionals identify appropriate teamwork measurement tools to apply to their own internal medicine settings and teams.

METHODS

Although there are no standard reporting guidelines specific to systematic reviews of assessment tools, this review is reported according to applicable sections of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) standards25 and similar reviews of assessment tools in medical education.26

Data Sources and Search Strategy

We searched MEDLINE, MEDLINE In-process, Cumulative Index to Nursing and Allied Health Literature (CINAHL), and PsycINFO for English-language studies from January 1, 1979 through October 31, 2012. To identify studies related to teamwork, the term team was exploded to include all Medical Subject Headings (MeSH) and keywords containing “team” (e.g. team, teamwork, teamworking, team behavior, team climate, team culture, team collaboration, team effectiveness). Other MeSH terms and keywords related to teamwork included interprofessional relations, patient care team, cooperative behavior, crew resource, crisis resource and non-technical skills. These terms were combined with measurement terms, including scale, measure, inventory, questionnaire, tool, instrument, assessment, evaluation, profile, indicator, index and survey. Last, terms for teamwork and measurement were combined with terms pertaining to medical education and health professionals, including the MeSH terms students, health occupations, health personnel, education professional, internship and residency, healthcare facilities, manpower and services, and the exploded terms doctor, physician, nurse, student, intern, resident, registrar, house officer, medical, surgeon, operating, health, clinic, patient, interdisciplinary, multidisciplinary and interprofessional. An expert librarian with experience conducting literature searches for systematic reviews assisted in development and implementation of the search. The exact search strategies for each database are available from the authors.
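The exact database strategies are available only from the authors. Purely as an illustration of how the three concept groups described above (teamwork, measurement, and education/health-professional terms) can be combined, the sketch below builds a single Boolean query from abbreviated, hypothetical term lists; it is not the authors’ actual search string.

```python
# Illustrative only: abbreviated, hypothetical term groups, not the authors' strategy.
teamwork_terms = ["team*", "interprofessional relations", "patient care team",
                  "cooperative behavior", "crew resource", "crisis resource",
                  "non-technical skills"]
measurement_terms = ["scale", "measure", "inventory", "questionnaire", "tool",
                     "instrument", "assessment", "evaluation", "survey"]
population_terms = ["students, health occupations", "health personnel",
                    "internship and residency", "physician*", "nurse*",
                    "resident*", "registrar*", "interprofessional"]

def or_group(terms):
    """OR the terms within one concept group, quoting multi-word phrases."""
    return "(" + " OR ".join(f'"{t}"' if " " in t else t for t in terms) + ")"

# AND the three concept groups together, as described in the Methods.
query = " AND ".join(or_group(group) for group in
                     (teamwork_terms, measurement_terms, population_terms))
print(query)
```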

The reference lists of all included articles were reviewed for additional studies. To identify in-press and unpublished studies, we searched scientific abstracts from national meetings between 2010 and 2012 of the Association of American Medical Colleges (AAMC), the Association for Medical Education in Europe (AMEE), the Society of General Internal Medicine (SGIM), and the International Meeting on Simulation in Healthcare (IMSH). Authors of relevant abstracts were contacted for unpublished manuscripts. Finally, two experts who have published prior systematic reviews of teamwork or empiric studies of teamwork assessment reviewed the list of included articles to identify additional studies.

Study Selection

Articles were included if they were original research describing a quantitative tool designed for measuring teamwork within healthcare teams involving physicians and/or trainees in general internal medicine or an internal medicine subspecialty. To provide a comprehensive synthesis, we also included articles from non-internal medicine specialties that reported validity evidence for included tools. Studies of interprofessional teams (defined as two or more professions working together as a team)27 , 28 were included as long as internal medicine physicians (or medical students, residents, fellows) were one of the professions studied. Tools were considered measures of teamwork based on authors’ descriptions of tools as measuring teamwork, collaboration, team process or function, team behavior, team effectiveness, team climate/environment, team culture, non-technical skills, or crew/crisis management. We excluded studies that measured just one specific aspect of team function, such as conflict, negotiation, leadership, communication, disruptive behavior and harassment. Studies of patient hand-over were excluded since recent reviews on this topic have been published.29–31

Title and Abstract Review

The search yielded 12,922 citations (Fig. 1). Each title and abstract was reviewed, and we erred on the side of retrieving the full article when the title and abstract were insufficient to determine eligibility. A total of 892 articles was included for full article review. All uncertainties regarding inclusion were resolved by consensus.

Figure 1 Article search and selection.

Data Extraction

Data were entered into a structured extraction form that included information on articles (study location, design, participants, setting) and tool characteristics (content, validity, and outcomes). Five authors extracted data. These authors met weekly during the study period and uncertainties were resolved by consensus. Thirty percent of articles were independently extracted by two authors to verify consistency in coding and determine interrater agreement using an intra-class correlation coefficient (ICC). The remaining 70 % of articles were extracted by a single reviewer.
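The review reports interrater agreement as an ICC but does not specify the ICC model; as a minimal sketch under the assumption of a two-way random-effects, single-measure model (ICC(2,1)), the following computes the coefficient from a matrix of double-coded extraction scores. The scores shown are hypothetical.

```python
import numpy as np

def icc2_1(ratings: np.ndarray) -> float:
    """Two-way random-effects, single-measure ICC(2,1) for an
    (n_articles, n_raters) matrix of scores."""
    n, k = ratings.shape
    grand_mean = ratings.mean()
    row_means = ratings.mean(axis=1)   # per-article means
    col_means = ratings.mean(axis=0)   # per-rater means

    ss_rows = k * ((row_means - grand_mean) ** 2).sum()
    ss_cols = n * ((col_means - grand_mean) ** 2).sum()
    ss_total = ((ratings - grand_mean) ** 2).sum()
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)

# Hypothetical example: five articles double-coded by two reviewers on one item.
scores = np.array([[3, 3], [2, 3], [4, 4], [1, 2], [5, 5]], dtype=float)
print(round(icc2_1(scores), 2))
```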

We used an established framework to categorize the validity of instruments32–34 that has been used in similar evaluations of assessment tools.26 This framework includes five categories of validity evidence: 1) content (the degree to which the tool content reflects the construct being measured); 2) response process (training of raters to use the tool); 3) internal structure (instrument reliability including internal consistency, interrater, intrarater, and test-retest reliability); 4) relationships to other variables (relationship between scores and other variables measuring the same construct); and 5) consequences (outcomes associated with tool scores). Kirkpatrick’s hierarchy was used to categorize outcomes as satisfaction/opinion, knowledge and skills, behaviors, and patient outcomes.35 Patient measures were recorded as outcomes only if the study reported a direct quantitative association between the teamwork assessment score and the patient outcome.
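The extraction form itself is not reproduced in the review; solely to illustrate how the coding scheme above could be represented, the sketch below defines hypothetical record types for the five validity-evidence categories and the four Kirkpatrick outcome levels.

```python
from dataclasses import dataclass, field
from enum import Enum

class ValidityEvidence(Enum):
    CONTENT = "content"
    RESPONSE_PROCESS = "response process"
    INTERNAL_STRUCTURE = "internal structure"
    RELATIONSHIPS = "relationships to other variables"
    CONSEQUENCES = "consequences"

class KirkpatrickLevel(Enum):
    SATISFACTION_OPINION = 1
    KNOWLEDGE_SKILLS = 2
    BEHAVIORS = 3
    PATIENT_OUTCOMES = 4

@dataclass
class ToolExtraction:
    """One hypothetical row of the structured extraction form."""
    tool_name: str
    setting: str
    validity: set = field(default_factory=set)   # ValidityEvidence members
    outcomes: set = field(default_factory=set)   # KirkpatrickLevel members

# Made-up example record
record = ToolExtraction(
    tool_name="Example Teamwork Scale",
    setting="inpatient",
    validity={ValidityEvidence.CONTENT, ValidityEvidence.INTERNAL_STRUCTURE},
    outcomes={KirkpatrickLevel.SATISFACTION_OPINION},
)
```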

To evaluate the methodological quality of studies, we used criteria from the ten-item Medical Education Research Study Quality Instrument (MERSQI),36 which encompasses basic methodological components (e.g. study design, sampling, analysis). Validity evidence for the MERSQI includes content, interrater, intrarater, and internal consistency reliability, and relationships to other variables, including correlations between instrument items and journal impact factor, 3-year citation rate, and journal editors’ quality ratings,36 as well as predictive validity based on associations with editors’ decisions to accept or reject manuscripts for publication.37 We tallied the number of studies that fully satisfied, partially satisfied, or failed to satisfy each of the ten MERSQI quality criteria.
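As a small, hypothetical sketch of the tallying step described above (not the authors’ code or data), the following counts how many studies fully, partially, or did not satisfy each quality criterion.

```python
from collections import Counter

# Hypothetical ratings: each study coded as "full", "partial", or "none"
# on each MERSQI criterion (only three criteria shown here).
study_ratings = [
    {"design": "partial", "sampling": "full", "analysis": "full"},
    {"design": "none",    "sampling": "partial", "analysis": "full"},
    {"design": "none",    "sampling": "full", "analysis": "partial"},
]

# Tally, per criterion, how many studies fall into each category.
criteria = study_ratings[0].keys()
tally = {c: Counter(r[c] for r in study_ratings) for c in criteria}
for criterion, counts in tally.items():
    print(criterion, dict(counts))
```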

Data Synthesis

Characteristics of studies and teamwork measurement tools were synthesized qualitatively and reported in evidence tables. Articles describing identical tools were grouped to enable examination and presentation of all validity evidence and outcomes for each unique tool. Frequencies and percentages were used to describe study and tool characteristics. Means and standard deviations were used to summarize quality scores. Meta-analysis was neither possible nor logical, given that this was a review of assessment tools with obvious heterogeneity among instruments, study designs, and outcomes.

RESULTS

Of the 12,922 citations, 12,629 were identified through the electronic database searches, 16 from reference lists of included articles, two from expert review, and 275 from relevant meeting abstracts. We identified 98 articles from non-internal medicine specialties that contained validity evidence for included tools. The total number of articles meeting inclusion criteria was 178 (Fig. 1). Interrater agreement for data extraction was very good (ICC = 0.73, 95 % CI: 0.63–0.81).

Table 1 shows the characteristics of the 178 included studies. Approximately half of the studies were conducted in the U.S. and one-third in Europe. Most studies (142, 80 %) included practicing physicians as participants, followed by residents (68, 38 %) and medical students (11, 6 %). The majority of studies (152, 85 %) also assessed non-physician professionals (e.g. nurses, pharmacists, mid-level providers, therapists, social workers, administrators) in interprofessional teams. Although most studies took place in actual inpatient or outpatient practice settings, 37 (21 %) were simulation-based and 13 (7 %) took place in classrooms.

Table 1 Characteristics of 178 Studies Describing 73 Tools for Measuring Teamwork

Study Quality

Figure 2 shows the proportion of studies satisfying the ten MERSQI quality criteria. Ten (6 %) studies fully satisfied, 59 (33 %) partially satisfied, and 109 (61 %) did not satisfy quality criteria for study design. The most frequent study design was single group cross-sectional (89 %). Ten studies were randomized controlled experiments.38–47 A majority (153, 86 %) of studies fully satisfied at least one validity criterion: 122 (69 %) studies reported content validity, 115 (65 %) reported internal structure, and 47 (26 %) described relationships to other variables. Twenty-nine (16 %) studies fully satisfied all three of these validity criteria. Most studies (140, 79 %) relied on subjective assessments by study participants for measuring teamwork.

Figure 2 Methodological quality of 178 studies describing teamwork assessment tools.

Description and Validity Evidence for Teamwork Assessment Tools

The 178 included articles described 73 unique tools designed to measure teamwork (Table 2). Of the 73 tools, 15 (21 %) measured the teamwork of individuals working within teams, 43 (60 %) measured the teamwork of teams as a whole, and 15 (21 %) assessed both individuals and teams.

Table 2 Characteristics and Validity Evidence for 73 Teamwork Measurement Tools

Content validity was demonstrated for 54 (74 %) of the tools (Table 2) and generally consisted of developing instrument content from expert panels, existing instruments, and literature review. The TeamSTEPPS Teamwork Attitudes Questionnaire80 is an example of an assessment tool with strong content validity, designed to assess the teamwork attitudes, knowledge and skills of learners participating in the TeamSTEPPS curriculum. TeamSTEPPS is a training program developed by the United States Department of Defense and the AHRQ that encompasses leadership, situation monitoring, mutual support and communication.7 , 80 , 198 The TeamSTEPPS Teamwork Perception Questionnaire is a second instrument associated with this curriculum that measures individuals’ perceptions of organizational teamwork.216

Few tools (12, 16 %) reported response process, which included training raters to correctly use tools. The Multi-disciplinary Team Performance Assessment Tool160 , 161 is an observational teamwork assessment of cancer teams modified from an established teamwork assessment tool in the surgical literature (Observational Teamwork Assessment in Surgery).217–219 Assessors were trained in the use of this tool by an expert psychologist with experience using the tool.160

Reliability of tools was demonstrated by internal consistency (38, 52 %), interrater reliability (16, 22 %), intrarater reliability (1, 1 %), and test-retest reliability (2, 3 %). Reliability estimates for most tools were very good (> 0.7).220 The Physician/Pharmacist Collaboration Index is an example of a tool assessing interactions between internists and pharmacists that has extensive reliability evidence, including factor analysis, internal consistency (Cronbach alpha 0.70–0.90) and interrater reliability (ICC 0.89).38 , 164–170 This tool measures the pharmacist’s view of collaboration among physicians and other health professionals in both inpatient and outpatient settings.
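As a minimal sketch of how the internal-consistency figures cited above (e.g. a Cronbach alpha of 0.70–0.90) are typically computed, the following calculates Cronbach’s alpha from a respondents-by-items matrix; the responses shown are hypothetical.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) matrix of item scores."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of summed scale scores
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

# Hypothetical responses: six raters answering four Likert items
responses = np.array([
    [4, 4, 5, 4],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [2, 3, 2, 3],
    [4, 5, 4, 4],
    [3, 2, 3, 3],
], dtype=float)
print(round(cronbach_alpha(responses), 2))
```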

Relationships between teamwork scores and other variables reflecting the construct of teamwork were reported for 25 (34 %) tools (Table 2). Studies varied widely with regard to the specific variables reported. The Attitudes Toward Health Care Teams Scale (ATHCTS) has been used in ten studies measuring attitudes towards interprofessional collaboration in a variety of settings, most commonly interprofessional education.58–67 It consists of three subscales assessing attitudes about team value, team efficiency and the physician’s shared role on the team. ATHCTS scores have been shown to correlate with other measures of team process.65 The Ottawa Global Rating Scale has been used in multi-specialty education as an objective measurement of an individual’s crisis resource management skills in simulated scenarios.211 , 212 This tool has been shown to differentiate among residents’ level of training when applied in simulated medical crisis scenarios.211

Consequences validity refers to the outcomes associated with scores from teamwork tools. For many tools (35, 48 %), outcomes included satisfaction or opinion of participants (Table 2). Twelve (16 %) tools measured participants’ teamwork skills. Some tools assessed teamwork skills such as leadership, communication and crisis management through simulation,41 , 138 , 209 while others involved direct observation of skills in actual practice settings, such as medical residents’ abilities to lead ward teams213 and palliative care physicians’ communication in team meetings.195 Behaviors of students, residents/fellows, or practicing physicians were reported outcomes for ten (14 %) tools.

Teamwork Tools Associated with Patient Outcomes

Relationships between teamwork scores and patient outcomes have been directly examined for 13 (18 %) of the teamwork tools (Table 3). Teamwork tools by Baggs83 , 210 and Wheelan155 showed inverse relationships between positive teamwork and mortality rates (i.e. better teamwork was associated with lower mortality).

Table 3 Relationships Between Scores from Teamwork Measurement Tools (n = 13) and Patient Outcomes

Of the tools shown to correlate with patient outcomes, the Safety Attitudes Questionnaire (SAQ)113 has the strongest validity evidence, and has been adapted for use across multiple settings and learner levels. The SAQ contains six domains, one of which is teamwork. Twenty-seven studies have reported validity evidence for the SAQ.45 , 46 , 113–137 SAQ scores have been correlated with reduced postoperative complications;116 , 117 however, studies have not shown associations between the SAQ and mortality or patient safety events.116 , 118

The Team Climate Inventory (TCI) has been used to assess teamwork among inpatient and outpatient interprofessional teams in 21 studies.172–192 The TCI has four subscales: vision, participative safety, task orientation, and support for innovation.190 A study by Bower et al. found that TCI ratings were associated with better diabetes care,172 while another study showed no relationship between the TCI and diabetes management.173

The Intensity of Interprofessional Collaboration Questionnaire measures nurse–physician collaboration in the inpatient setting. Patients cared for by teams with high-intensity collaboration on this scale reported higher satisfaction, lower uncertainty, and better pain management.96 However, there was no relationship between collaboration and patient length of stay.96

DISCUSSION

Assessing teamwork is imperative for determining physician competency11 , 13 , 221 and ensuring patient safety.3 , 5 Valid and reliable measurement of teamwork is necessary to understand connections between teamwork and patient safety, and to maximize gains achieved through teamwork education.

Together, the 178 studies and 73 teamwork tools summarized in this review constitute a resource for internists who wish to apply teamwork assessment tools to their local settings and teams. Although there is considerable validity evidence for many of these teamwork tools, most assessments consisted of participants’ subjective reports of satisfaction, attitude, or opinion. A thorough understanding of attitudes is a prerequisite to improving teamwork; yet, tools examining teamwork behaviors in actual practice provide scores that may be more readily linked to important patient safety outcomes.83 , 155 , 202 Unfortunately, these assessments often require extensive rater training to achieve adequate reliability,105 which can be time consuming and costly. Implementing existing tools, rather than creating new ones, should reduce the cost of tool development so that these resources can be allocated to rater training and implementation. Furthermore, the trustworthiness of validity information depends upon the methodological quality of the studies from which it is derived. Based on MERSQI criteria, future studies should aim to improve the rigor of study design and outcome assessment.

Evidence suggests that teamwork training should improve patient safety,1 , 2 yet our review indicates that most studies examining teamwork in internal medicine do not directly link teamwork measures to reported patient outcomes. Several studies in this review described concurrent changes in patient outcomes and teamwork scores (e.g. pre/post teamwork training), but did not actually examine relationships between outcomes and teamwork scores, thus making it difficult to attribute gains in patient safety to teamwork improvements. To advance the understanding of how to improve safety through collaboration, future studies should not only apply valid teamwork assessments, but should directly examine relationships between these assessments and patient outcomes. Robust teamwork assessments and appropriate conceptual frameworks are essential to meaningful evaluations of relationships between teamwork and patient outcomes.

The majority of teamwork tools in this review were applied to groups of individuals working together to achieve a common goal within traditional team structures (e.g. physically side by side/face to face).23 , 222 However, the concept of ‘team’ in healthcare is rapidly evolving to include a greater emphasis on interprofessional collaboration,223 as well as new team structures. With the advent of restricted duty hours224 and frequent hand-offs,29 , 30 team members are often working in shifts225–227 and are becoming geographically dispersed. The telemedicine intensive care unit is an example in which intensivists and nurses use telemetry and electronic medical records to provide remote care to hospitalized patients.228 Teams dispersed over distance and/or time face unique teamwork challenges119 that may require new or adapted assessment tools.

There are limitations to this review. First, although our search was comprehensive, we may have failed to capture some non-indexed or unpublished studies. We attempted to limit this possibility by reviewing abstracts from four professional meetings that are likely to include teamwork content, reviewing reference lists of included articles, and by having two content experts examine our reference list. Also, our electronic search included terms such as “registrar” that should have helped capture studies across countries. Second, to make the scope of the review manageable, it was limited to tools published in the field of internal medicine. However, some validity evidence was obtained from studies conducted in other specialties such as surgery and anesthesia. Validity is not a property of an instrument itself; rather it is a property of inferences derived from implementation of the instrument within specific contexts.34 As such, the setting in which tools are applied influences the validity information acquired. When selecting a tool for use in a new setting, it is important to consider the degree to which existing validity evidence may apply to the new context.

Third, this review included only quantitative measurement tools; however, qualitative studies provide valuable frameworks for understanding team behaviors and processes28 , 229 , 230 that are essential to the development of meaningful assessment tools. A synthesis of findings from the qualitative literature on teamwork would be a useful next step. Fourth, although we used an extremely broad definition and search strategy for teamwork, we excluded studies that examined just one specific element of interpersonal interaction, such as disruptive behavior and harassment. These behaviors alone do not constitute teamwork; however, they certainly may influence team interactions.231–233 Finally, we used established frameworks for abstracting tool validity32 , 33 and study quality;36 however, these frameworks do not encompass every aspect of validity and/or quality present in studies.

In conclusion, this systematic review provides a synthesis of teamwork assessment tools in internal medicine that may serve as a resource for educators who wish to assess teamwork for various learner levels and settings. Valid teamwork assessment is essential to determine physician competency and to ensure patient safety. Future research should expand the validity evidence for existing tools and further explore relationships between teamwork assessment and important patient safety outcomes.