Diagnostic accuracy of skin-prick testing for allergic rhinitis: a systematic review and meta-analysis

Background: Allergic rhinitis is the most common form of allergy worldwide, yet the accuracy of skin testing for allergic rhinitis is still debated. Our primary objective was to evaluate the diagnostic accuracy of skin-prick testing for allergic rhinitis using nasal provocation as the reference standard. As a secondary objective, we evaluated the diagnostic accuracy of intradermal testing.
Methods: We searched EBM Reviews from 2005 to March 2015, Embase from 1980 to March 2015, and Ovid MEDLINE(R) from 1946 to March 2015. We included any study with at least 10 subjects, including studies of children, and excluded non-English studies. We extracted data and assessed study quality using the QUADAS-2 tool.
Results: Using the bivariate random-effects model, we meta-analysed seven studies (430 patients in total) assessing the accuracy of skin-prick testing. The pooled estimates of sensitivity and specificity for skin-prick testing were 85 and 77 %, respectively. We did not pool results for intradermal testing because of the small number of studies (n = 4), each with a very small sample size. Of these, two evaluated the accuracy of intradermal testing in confirming skin-prick testing results, with sensitivity ranging from 27 to 50 % and specificity from 60 to 100 %. The other two evaluated intradermal testing as a stand-alone test for diagnosing allergic rhinitis, with sensitivity ranging from 60 to 79 % and specificity from 68 to 69 %.
Conclusions: Findings from this review suggest that skin-prick testing is accurate in discriminating subjects with or without allergic rhinitis.


Background
Allergic rhinitis is a collection of symptoms that develop when the immune system becomes sensitized and overreacts to airborne allergens [1]. It is the most common allergic disorder worldwide [2] and one of the leading chronic conditions affecting both children and adults [3]. The global prevalence of allergic rhinitis is between 10 and 30 % in adults and as high as 40 % in children [4,5]. Symptoms usually develop before age 20 years [6] and peak between ages 20 and 40 years before gradually declining [7].
The diagnosis of allergic rhinitis is often made on the basis of clinical characteristics and response to pharmacotherapy [7]. Evidence of sensitization to a known allergen is usually established through a combination of skin or blood testing and the patient's exposure history [8]. Because it is easy to administer and less invasive, skin-prick testing is recommended for the diagnosis of allergic rhinitis, followed by intradermal testing to confirm negative skin-prick testing results [9]. There is no universally accepted "gold standard" for detecting allergic rhinitis, although nasal provocation is often used as the reference standard in research studies. There appears to be no consensus among researchers on the diagnostic accuracy of skin testing for allergies [10-12], including allergic rhinitis [13-15]. The variability in accuracy across studies may be explained by lack of standardization, differences in the stability and composition of allergen extracts, the testing device, the patient population, or the quality of study design. However, we are not aware of any systematic review that has evaluated the diagnostic accuracy of skin testing for allergic rhinitis across a range of studies. To address this gap, we conducted a systematic review and meta-analysis of published studies on the diagnostic accuracy of skin-prick testing in children or adults with suspected symptoms of allergic rhinitis. As a secondary analysis, we also evaluated the diagnostic accuracy of intradermal testing in the same group of patients.

Methods
We conducted and reported this review according to published guidelines using a pre-specified protocol [16].

Eligibility criteria
We included any study that reported both the sensitivity and specificity of skin-prick testing in at least 10 subjects (adults, children, or both) with suspected allergic rhinitis, using nasal provocation as the reference standard. We included full-text papers and abstracts published in English. We excluded studies that enrolled subjects with known allergic status (commonly referred to as "case-control" designs in the diagnostic accuracy literature) and studies that did not use nasal provocation as the reference standard.

Search strategy
We performed a literature search with the help of medical librarians on April 24, 2015, using All Ovid MEDLINE (from 1946 to present), Embase (from 1980 to present), Cochrane Database of Systematic Reviews (from 2005 to present), Database of Abstracts of Reviews of Effects (from 1991 to present), CRD Health Technology Assessment Database (from 2001 to present), Cochrane Central Register of Controlled Trials (from 1991 to present), and NHS Economic Evaluation Database (from 1995 to present). The search strategy combined key words and MeSH terms and was adapted for each database to account for differences in indexing. We limited our search to English-language publications. We also searched gray literature sources and conference abstracts. Appendix 1 provides details on the search strategies used. We also examined reference lists for any additional relevant studies not identified through the search.

Study selection, data abstraction and analysis
We screened titles and abstracts (CK, IN) and obtained full texts for studies that met the eligibility criteria. We extracted estimates of sensitivity, specificity, and sample size from all eligible studies. We also computed sensitivity and/or specificity for studies that did not report these estimates but provided sufficient information for their derivation. We constructed forest plots to assess heterogeneity in test accuracy across studies. In case of substantial heterogeneity, we proceeded with a subgroup analysis to determine the reason for inconsistency. When the homogeneity assumption was deemed appropriate, we pooled studies using the bivariate approach [17]. The pooled results were presented on a summary receiver operating characteristic (sROC) curve with a 95 % confidence ellipse. When the homogeneity assumption failed to hold, we presented sensitivity and specificity separately for each study. We used the logit transformation to calculate study-specific confidence intervals, to account for asymmetry in the distribution of sensitivity and specificity. When estimates were on, or too close to, the boundary of the parameter space (i.e., when sensitivity or specificity was equal or approximately equal to 0 or 100 %), a continuity correction factor of 1 % was applied. All analyses were performed using the mada package in R version 3.0.2.
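As an illustration of the study-level calculations described above, the Python sketch below computes sensitivity and specificity with logit-scale 95 % confidence intervals from one study's 2×2 table (skin-prick result against nasal provocation). It is a minimal sketch, not the mada implementation, and the function names are hypothetical. Because the text does not specify how the 1 % continuity correction was applied, the sketch assumes a constant `cc` is added to every cell whenever any cell is zero, with 0.5 used only as an illustrative default.

```python
import math

Z_95 = 1.959964  # two-sided 95 % standard normal quantile


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def logit_ci(events: float, total: float, z: float = Z_95):
    """Proportion with a CI built on the logit scale, then back-transformed.

    The standard error of logit(p) is sqrt(1/events + 1/non-events), so the
    interval is symmetric on the logit scale but asymmetric on [0, 1].
    """
    p = events / total
    se = math.sqrt(1.0 / events + 1.0 / (total - events))
    return p, inv_logit(logit(p) - z * se), inv_logit(logit(p) + z * se)


def study_accuracy(tp: int, fn: int, fp: int, tn: int, cc: float = 0.5):
    """Sensitivity and specificity, each with a logit CI, for one 2x2 table.

    ASSUMPTION: if any cell is zero, the hypothetical constant `cc` is added
    to all four cells; the review's exact 1 % correction may differ.
    """
    cells = (tp, fn, fp, tn)
    if 0 in cells:
        tp, fn, fp, tn = (cell + cc for cell in cells)
    # Sensitivity: positive skin tests among nasal-provocation-positive subjects.
    sensitivity = logit_ci(tp, tp + fn)
    # Specificity: negative skin tests among nasal-provocation-negative subjects.
    specificity = logit_ci(tn, tn + fp)
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical counts, not data from any included study.
    sens, spec = study_accuracy(tp=30, fn=5, fp=8, tn=27)
    print("Sensitivity %.3f (95 %% CI %.3f-%.3f)" % sens)
    print("Specificity %.3f (95 %% CI %.3f-%.3f)" % spec)
```

Working on the logit scale keeps both interval limits inside (0, 1) even when an estimate lies near 100 %, which is the asymmetry the text refers to.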

Quality of evidence
The quality of evidence for each bivariate outcome within studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies tool (QUADAS-2) [18]. This tool covers four key domains: patient selection, index test, reference standard, and flow and timing.
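QUADAS-2 judgements are recorded separately for each domain. The sketch below shows one simple way to roll the four domain-level risk-of-bias judgements up into a single study-level label using a "worst domain" rule. This aggregation rule is an illustrative assumption; it is neither stated in this review nor prescribed by QUADAS-2, and the function name is hypothetical.

```python
# Ordered risk-of-bias levels: "low" < "unclear" < "high".
LEVELS = {"low": 0, "unclear": 1, "high": 2}

# The four QUADAS-2 domains named in the text.
DOMAINS = ("patient selection", "index test", "reference standard", "flow and timing")


def overall_risk_of_bias(judgements: dict) -> str:
    """Roll per-domain judgements up to one study-level label.

    ASSUMPTION: a hypothetical "worst domain" rule (any "high" -> "high",
    otherwise any "unclear" -> "unclear", otherwise "low").
    """
    missing = [d for d in DOMAINS if d not in judgements]
    if missing:
        raise ValueError(f"missing QUADAS-2 domains: {missing}")
    unknown = {judgements[d] for d in DOMAINS} - set(LEVELS)
    if unknown:
        raise ValueError(f"unrecognised judgements: {sorted(unknown)}")
    return max((judgements[d] for d in DOMAINS), key=LEVELS.__getitem__)


if __name__ == "__main__":
    # Hypothetical study: unclear index-test domain, all others low.
    study = {d: "low" for d in DOMAINS}
    study["index test"] = "unclear"
    print(overall_risk_of_bias(study))  # unclear
```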

Results
Study selection
One reviewer (CK) screened and evaluated 2360 citations and assessed 56 full-text articles for eligibility. A second reviewer (IN) screened an unbiased sample of 374 citations using the method of Nevis et al. [19]. Chance-corrected agreement for titles and abstracts was good (estimated kappa = 75 %; 95 % CI 50-100 %). We resolved disagreements by consensus. Of the 56 full-text articles, we excluded 42 as not relevant, three because they had insufficient information on outcomes, and three because they were case-control studies. Figure 1 summarizes the selection process. Eight articles were eligible for inclusion in the systematic review [15, 20-25]. Only seven of the eight were included in the meta-analysis, because one study restricted testing to Alternaria, an allergen not evaluated by any other eligible study in this review, and its findings deviated substantially from those of the remaining studies [14].
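The chance-corrected agreement reported above is Cohen's kappa. The sketch below computes kappa from a 2×2 table of include/exclude decisions made by two reviewers. The counts in the usage line are hypothetical, and the sketch reproduces neither the sampling method of Nevis et al. [19] nor its confidence interval.

```python
def cohens_kappa(both_include: int, only_a: int, only_b: int, both_exclude: int) -> float:
    """Cohen's kappa for two reviewers making include/exclude decisions.

    only_a: citations reviewer A included but reviewer B excluded;
    only_b: citations reviewer B included but reviewer A excluded.
    """
    n = both_include + only_a + only_b + both_exclude
    observed = (both_include + both_exclude) / n  # raw proportion of agreement
    a_include = (both_include + only_a) / n
    b_include = (both_include + only_b) / n
    # Agreement expected by chance if the reviewers decided independently.
    expected = a_include * b_include + (1 - a_include) * (1 - b_include)
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical screening counts for 100 citations (not the review's data).
    print(round(cohens_kappa(20, 5, 5, 70), 3))  # 0.733
```

In this hypothetical example the reviewers agree on 90 % of citations, but because most citations are excluded by both, chance alone would produce 62.5 % agreement, so kappa is noticeably lower than raw agreement.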

Primary analysis: diagnostic accuracy of skin-prick testing
We conducted a meta-analysis of studies reporting the sensitivity and specificity of skin-prick testing. The pooled estimates of sensitivity and specificity for this test were 88.4 and 77.1 %, respectively (Fig. 2). We also conducted a sensitivity analysis by adding the study that tested for Alternaria [14] to the meta-analysis. Inclusion of this study did not materially alter the accuracy estimates: the pooled sensitivity and specificity changed to 85.0 and 77.3 %, respectively (Fig. 3). The forest plots for heterogeneity are presented in Figs. 4 and 5.
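For reference, one standard formulation of the bivariate random-effects model (the Reitsma et al. form, assumed here to correspond to the bivariate approach cited in the Methods [17]) treats each study's logit-transformed sensitivity and specificity as jointly normal. The between-study covariance matrix carries the correlation ρ discussed in the risk-of-bias section, the within-study variances come from each study's 2×2 counts, and the pooled estimates are the back-transformed means.

```latex
\begin{aligned}
\begin{pmatrix} \operatorname{logit}\widehat{\mathrm{Se}}_i \\ \operatorname{logit}\widehat{\mathrm{Sp}}_i \end{pmatrix}
&\sim \mathcal{N}\!\left(\begin{pmatrix} \mu_A \\ \mu_B \end{pmatrix},\ \Sigma + C_i\right), \\
\Sigma &= \begin{pmatrix} \sigma_A^2 & \rho\,\sigma_A\sigma_B \\ \rho\,\sigma_A\sigma_B & \sigma_B^2 \end{pmatrix},
\qquad
C_i = \begin{pmatrix} \dfrac{1}{\mathrm{TP}_i} + \dfrac{1}{\mathrm{FN}_i} & 0 \\ 0 & \dfrac{1}{\mathrm{TN}_i} + \dfrac{1}{\mathrm{FP}_i} \end{pmatrix}, \\
\widehat{\mathrm{Se}} &= \operatorname{logit}^{-1}(\hat{\mu}_A), \qquad \widehat{\mathrm{Sp}} = \operatorname{logit}^{-1}(\hat{\mu}_B).
\end{aligned}
```

The off-diagonal entries of $C_i$ are zero because sensitivity and specificity are estimated from different subjects (nasal-provocation-positive and nasal-provocation-negative, respectively); the sROC curve and its 95 % confidence ellipse are derived from the fitted $\hat{\mu}_A$, $\hat{\mu}_B$ and $\Sigma$.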

Risk of bias and applicability concerns
We summarize the assessment of risk of bias in Figs. 6, 7 and 8. For skin-prick testing, the risk of bias was "unclear" in five studies [15, 22-25]. For intradermal testing, the risk of bias was "high" in one study [14] and "unclear" in two studies [15, 25]. Applicability concerns were "high" in two studies [14, 20].
We used Fig. 8 to evaluate the potential for heterogeneity in estimates of the accuracy of skin-prick testing. The inclusion of Krouse et al. [14] introduced discernible heterogeneity across studies. Specifically, the 95 % confidence interval (CI) for its sensitivity barely overlapped with the CIs of the other studies, and its inclusion swayed the correlation between sensitivity and specificity toward a positive value, violating a requirement for meta-analysis of diagnostic accuracy studies that the correlation be non-positive for the homogeneity assumption to hold. When this study was removed from the analysis, a negative correlation was observed (Fig. 5).
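As a rough descriptive check of the sign of the correlation discussed above, the sketch below computes the unweighted Pearson correlation between logit sensitivity and logit specificity across studies. This is a deliberate simplification: the bivariate model estimates the between-study correlation while accounting for within-study sampling error and study size, which a naive correlation of observed estimates ignores. The function names and input values are hypothetical.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def pearson(xs, ys) -> float:
    """Plain (unweighted) Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def logit_accuracy_correlation(sensitivities, specificities) -> float:
    """Correlation between logit sensitivity and logit specificity across studies.

    Descriptive only: inputs must be strictly between 0 and 1 (apply a
    continuity correction first), and no weighting by study size is used.
    """
    return pearson([logit(s) for s in sensitivities],
                   [logit(c) for c in specificities])


if __name__ == "__main__":
    # Hypothetical study-level estimates, not data from the included studies.
    r = logit_accuracy_correlation([0.70, 0.80, 0.90], [0.90, 0.80, 0.70])
    print("negative" if r < 0 else "non-negative", round(r, 3))
```

A single outlying study can flip the sign of such a correlation when only a handful of studies are available, which is consistent with the influence of one study described above.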
Five studies either did not report [15] or did not use [15, 22-24] the 3 mm cut-off value for wheal diameter recommended by the American Academy of Allergy, Asthma and Immunology (AAAAI) and the American College of Allergy, Asthma, and Immunology (ACAAI) [9]. Given the relation between the cut-off value and sensitivity and specificity, and because a 3 mm cut-off value might not be optimal in all settings [26], we classified these studies as having an "unclear" risk of bias. Moreover, the sample sizes of two studies evaluating the accuracy of intradermal testing [14, 20] were too small, calling into question whether their findings apply to the majority of patients with suspected allergic rhinitis presenting in clinics. We classified both studies as "high" risk of bias.
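To make the dependence of accuracy on the wheal cut-off concrete, the sketch below computes the sensitivity and specificity of skin-prick testing at several cut-offs, using nasal provocation results as the reference standard. The wheal diameters are hypothetical, the function name is illustrative, and treating a wheal of at least the cut-off diameter as positive is an assumption about how the threshold is applied.

```python
def accuracy_at_cutoff(wheals_npt_pos, wheals_npt_neg, cutoff_mm=3.0):
    """Sensitivity and specificity of skin-prick testing at one wheal cut-off.

    ASSUMPTION: a wheal diameter >= cutoff_mm counts as a positive test.
    wheals_npt_pos / wheals_npt_neg: wheal diameters (mm) in subjects with a
    positive / negative nasal provocation test (the reference standard).
    """
    true_pos = sum(w >= cutoff_mm for w in wheals_npt_pos)
    true_neg = sum(w < cutoff_mm for w in wheals_npt_neg)
    return true_pos / len(wheals_npt_pos), true_neg / len(wheals_npt_neg)


if __name__ == "__main__":
    # Hypothetical wheal diameters in mm, not data from any included study.
    npt_pos = [2.0, 3.0, 4.5, 6.0, 8.0]
    npt_neg = [0.0, 1.0, 2.5, 3.5, 0.0]
    for cutoff in (2.0, 3.0, 4.0):
        print(cutoff, accuracy_at_cutoff(npt_pos, npt_neg, cutoff))
    # 2.0 (1.0, 0.6)
    # 3.0 (0.8, 0.8)
    # 4.0 (0.6, 1.0)
```

Lowering the cut-off trades specificity for sensitivity and raising it does the reverse, which is why accuracy estimates from studies using different or unreported cut-offs are not directly comparable.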

Discussion
Findings from this review suggest that skin-prick testing is reasonably accurate in identifying allergic rhinitis among patients with suspected symptoms. Among studies eligible for meta-analysis, sensitivity ranged from 68 to 100 % and specificity from 70 to 91 %. Although we could not establish the source of heterogeneity in test accuracy across studies, several factors that influence the accuracy of skin-prick testing have been reported in the literature [9,27]. These include the skill of the tester, the testing device, skin colour, skin reactivity on the day of testing, and the potency and stability of test reagents.
To our knowledge, this is the first systematic review and meta-analysis to evaluate the accuracy of skin-prick testing for allergic rhinitis. Given the lack of consensus among researchers and health practitioners on the performance of this test, findings from this review broaden our knowledge of its accuracy across the available body of evidence. This is especially important because the effectiveness of interventions such as immunotherapy, allergen avoidance, or pharmacotherapy largely depends on a correct diagnosis. Proper diagnosis can therefore help reduce the financial burden and loss in quality of life for the millions of patients affected by allergic rhinitis.
Nevis et al. Allergy Asthma Clin Immunol (2016) 12:20
Although there are no age restrictions for skin-prick testing, the literature suggests that skin reactivity is diminished in young children [28]. That is, a 3 mm threshold for wheal diameter is likely to yield a high rate of false negatives in this group of patients. However, we were unable to assess the accuracy of skin-prick testing in children younger than 9 years, because the minimum age of participants in the eligible studies was 9 years.
It should be noted that the 3 mm cut-off criterion recommended in guidelines is based mainly on reproducibility in relation to nasal provocation rather than on clinical relevance [29]. That is, larger wheal sizes may predict a positive response to nasal provocation but not necessarily the severity of clinical symptoms. The extent of agreement between wheal size and clinical symptoms may depend on population characteristics and allergen extracts.
We note the following limitations. First, we were unable to determine the degree of accuracy of intradermal testing because of limitations in the four included studies; well-designed, methodologically rigorous studies are required to firmly establish its accuracy. Second, we used nasal provocation as the reference standard, which may not always represent natural exposure to allergens. Despite this limitation, several guidelines still consider nasal provocation the best available "gold standard". Finally, there was substantial variation in allergen extracts among studies. Nonetheless, skin-prick testing remained fairly accurate regardless of the type of extract.

Conclusions
In conclusion, this review supports findings from several studies that skin-prick testing is accurate for diagnosing allergic rhinitis. Several factors have been reported to influence the accuracy of prick testing, including the skill of the tester, the testing device, skin colour, skin reactivity on the day of testing, and the potency and stability of test reagents. We were unable to determine the degree of accuracy of intradermal testing because of limitations in the four included studies. Well-designed, methodologically rigorous studies are required to firmly establish the accuracy of allergy skin testing, especially intradermal testing.

Abbreviations
CRD Database: Centre for Reviews and Dissemination Database; NHS: National Health Service; MeSH: Medical Subject Headings; sROC: summary receiver operating characteristic curve; QUADAS-2: Quality Assessment of Diagnostic Accuracy Studies; 95 % CI: 95 % confidence interval; ACAAI: American College of Allergy, Asthma, and Immunology; AAAAI: American Academy of Allergy, Asthma and Immunology.