
Epigenome-wide association study of asthma and wheeze characterizes loci within HK1



To identify novel epigenetic markers of adolescent asthma and replicate findings in an independent cohort, then explore whether such markers are detectable at birth, predictive of early-life wheeze, and associated with gene expression in cord blood.


We performed epigenome-wide screening with recursive random forest feature selection and internal validation in the IOW birth cohort. We then tested whether these findings could be replicated in the independent ALSPAC cohort and followed up our top finding in children of the IOW cohort.


We identified 10 CpG sites associated with adolescent asthma at a 5% false discovery rate (IOW, n = 370), five of which exhibited evidence of associations in the replication study (ALSPAC, n = 720). One site, cg16658191, within HK1 displayed particularly strong associations after cellular heterogeneity adjustments in both cohorts (IOW: OR = 0.17, 95% CI 0.04–0.57; ALSPAC: OR = 0.57, 95% CI 0.38–0.87). Additionally, higher expression of HK1 (OR = 3.81, 95% CI 1.41–11.77) in cord blood was predictive of wheezing in infancy (n = 82).


We identified novel associations of asthma and wheeze with methylation at cg16658191 and with the expression of HK1, which may serve as markers of, predictors of, and potentially etiologic factors involved in asthma and early-life wheeze.


Asthma is a common chronic respiratory disease, particularly among children [1], and causes substantial health care costs [2]. Asthma has complex pathophysiology [3] and phenotypic variability [4], and is polygenic with a high heritable component, yet GWAS discoveries can only explain a small fraction of asthma variance [5]. Epigenetic mechanisms, which regulate gene expression potential [6], have received attention in studies of asthma because these mechanisms may be influenced by environmental exposures, particularly exposures that occur in utero [7, 8]. One of the most studied epigenetic mechanisms is DNA methylation (DNAm), the covalent addition of a methyl group to the DNA at a cytosine residue that is followed by a guanine (CpG site), which acts as an important regulator of gene transcription [6].

Recent epigenetic epidemiology studies have associated DNAm in blood with current asthma and/or wheeze in childhood and adolescence [9, 10], in cord blood with childhood asthma [11], and as a possible mediator between maternal and offspring asthma [12]. DNA methylation has also been implicated as a possible mediator of the relationships between both environmental and genetic factors with asthma. For instance, environmental exposure to air pollution is a risk factor for asthma exacerbations as well as asthma onset [13]. Recent studies have shown that air pollution is associated with differential DNAm of TET1 [14] and FOXP3 [15], and that differential methylation of these genes associates with asthma, suggesting that epigenetic regulation has a potential mediating role. Additionally, GSDMB and ORMDL3 are two well-recognized asthma susceptibility genes [16], and recent studies have shown that DNAm may be a mediator between genetic variation and the expression of these genes [17]. These studies provide supportive evidence that epigenetic mechanisms may be involved in the etiology of asthma, potentially as intermediates between recognized risk factors and the development of symptoms. A recent epigenome-wide meta-analysis of multiple European cohorts identified robust associations between asthma and blood DNA methylation throughout childhood (4–8 years of age), which retained strong associations with asthma status among isolated eosinophils and these epigenetic signatures were indicative of eosinophil and cytotoxic cell activation [18]. The above studies highlight the mounting evidence that differential epigenetic regulation of specific genes contributes to asthma etiology, and that epigenome-wide approaches have led to the identification of novel asthma-associated epigenomic loci. 
Performing additional EWAS in independent populations with different ages and different asthmatic phenotypes can improve our understanding of which loci are informative across multiple populations, how these epigenetic variations relate to asthma throughout the life course, and whether their methylation levels correlate with specific phenotypic characteristics, such as inflammation or lung function.

In the current study, we performed an EWAS using both a standard CpG-by-CpG approach and an innovative feature selection method to identify novel epigenetic markers of prevalent asthma in 18-year-olds, and investigated whether the identified loci were predictive of early-life wheeze and whether DNA methylation at these loci was related to gene expression. We first conducted an exploratory epigenome-wide screening study of DNAm in whole blood within the Isle of Wight (IOW) birth cohort, followed by a replication study within the Avon Longitudinal Study of Parents and Children (ALSPAC). Then, with data from the offspring of the IOW birth cohort participants, we tested whether the same associations exist between cord blood DNAm and wheeze without upper respiratory viral infection (cold) within the 1st year of life, followed by testing for associations of gene expression with DNAm and with infant wheeze.


The Isle of Wight birth cohort

The Isle of Wight (IOW) birth cohort is an unselected birth cohort of children born between January 1, 1989 and February 28, 1990 on the Isle of Wight, UK. The birth cohort has been described in detail elsewhere [19]. After exclusion of adoptions and prenatal deaths, 1456 children were enrolled and followed up through to 18 years of age (n = 1313; 90.2% retention). At each follow-up, participants were evaluated for manifestations of allergic disease and administered detailed questionnaires, including study-specific questions as well as questions derived from the International Study of Asthma and Allergies in Childhood (ISAAC), the most extensive international study of asthma, which led to the development and validation of questions about asthma and wheeze symptoms [20]. Ethical approval was obtained from the National Research Ethics Service, NRES Committee South Central - Southampton B for the 18-year follow-up (06/Q1701/34) and NRES Committee South Central - Hampshire B (09/H0504/129) for the follow-up of IOW participants’ offspring; written informed consent was provided by the infants’ parents.

At the 18-year follow-up, a subset of participants (n = 370) was selected to take part in an epigenetic screening; this sample is referred to as the ‘IOW F1 sample’ herein. The primary outcome for this study was current asthma, defined as having an asthma diagnosis and self-reported wheeze and/or use of asthma medications in the previous 12 months. Those attending the 18-year follow-up in person also underwent spirometry and fractional exhaled nitric oxide (FeNO) measurement, both performed in accordance with American Thoracic Society (ATS) guidelines [21, 22], as well as allergen sensitization testing via skin prick tests (SPTs). Lung function assessments were performed using a Koko Spirometer and software with a desktop portable device (PDS Instrumentation, Louisville, USA). FeNO measurements (Niox mino, Aerocrine AB, Solna, Sweden) were obtained prior to spirometric assessments. Atopy was defined as having at least one positive SPT among 11 allergens (cows’ milk, hens’ egg, peanut, cod, house dust mite, cat, dog, Alternaria alternata, Cladosporium herbarum, grass pollen mix, and tree pollen mix). DNA was extracted from peripheral blood collected at the 18-year follow-up using a salting out procedure.

The IOW offspring (IOW F2 sample) are being enrolled in the IOW 3rd Generation study through ongoing recruitment since 2010. To date, 390 newborns have been enrolled; cord blood samples were collected at birth and have been processed on 111 newborns for DNAm and 82 newborns for gene expression. Questionnaires about allergy and wheeze symptoms were administered to the parents at follow-up visits 3, 6, and 12 months after birth. The primary dependent variable for this sample was parent reported wheeze occurring when the infant had no symptoms of a cold. We also investigated any reported wheeze as an alternate outcome.

The ALSPAC cohort

The Avon Longitudinal Study of Parents and Children (ALSPAC) is a large, prospective cohort study based in the South West of England. In total, 14,541 pregnant women resident in Avon, UK with expected delivery dates between 1st April 1991 and 31st December 1992 were initially enrolled; 13,988 children were alive at 1 year [23, 24]. Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees; written informed consent was provided by all participants. Self-completed questionnaires were administered during pregnancy and then at regular intervals. Current asthma status was obtained around the age of 17 years, defined as a reported doctor’s diagnosis of asthma in addition to reported wheezing, asthma or the use of asthma medication in the previous 12 months. Of the 5036 adolescents with complete phenotype data, 720 also had DNA methylation data from whole blood collected at an average age of 17 years. Genome-wide methylation measurements were conducted at the University of Bristol as part of the Accessible Resource for Integrated Epigenomic Studies (ARIES) project [25]. For the purposes of this study, multiple births and children of non-white ethnicity were excluded due to small numbers.

DNA methylation arrays

In the IOW F1 sample (18 years of age), IOW F2 sample (cord blood), and ALSPAC sample (17 years of age) DNAm was assessed genome-wide using the Illumina Infinium® HumanMethylation450k BeadChip (Illumina, Inc., CA, USA). The details of data processing steps are provided in Additional file 1: Method S1. Briefly, quality control and preprocessing methods for both cohorts included background correction, probe-type standardization, batch effect adjustments, and exclusion of potentially problematic probes. Methylation levels were calculated as beta (β) values, which can be interpreted as percent methylation. Because β-values can suffer from severe heteroscedasticity, M-values were calculated via log2(β/(1 − β)) which better approximate a normal distribution. Cellular heterogeneity of blood samples was assessed by estimating the proportions of CD8+ T-cells, CD4+ T-cells, natural killer cells, B-cells, monocytes, eosinophils and other granulocytes [26, 27] via the estimateCellCounts function in R. These proportions were included in our regression models as potential confounders of the relationship between DNA methylation and current asthma.
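As a minimal illustration of the β-to-M transformation described above, the following Python sketch applies log2(β/(1 − β)) to a single value and inverts it. The function names and the clipping offset `eps` are assumptions made for this example; they are not part of the study's R pipeline.

```python
import math

def beta_to_m(beta, eps=1e-6):
    """Convert a methylation beta-value (between 0 and 1) to an M-value."""
    # Clip away from 0 and 1 so the logit stays finite (eps is an assumed guard).
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))

def m_to_beta(m):
    """Inverse transform: M-value back to a beta-value."""
    return 2.0 ** m / (2.0 ** m + 1.0)
```

Under this transformation a β-value of 0.5 corresponds to M = 0, and β = 0.8 corresponds to M = 2, so equal steps in M are not equal steps in percent methylation.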

Gene expression array

At birth, IOW F2 cord blood samples were collected into PAXgene Bone Marrow RNA Tubes and RNA extracted using PAXgene RNA kits (PreAnalytiX GmbH, Switzerland). RNA integrity was verified with the Agilent 2100 Bioanalyzer system. Genome-wide mRNA expression was assessed via one color (Cy3) experiments with the Agilent (Agilent Technologies, Santa Clara, CA) SurePrint G3 Human Gene Expression 8 × 60 k v2 microarray kits. Array content was sourced from RefSeq, Ensembl, UniGene, and GenBank databases and provides full coverage of the human transcriptome in 50,599 biological features (including replicate probes and control probes). The oligos were 60mer in length and each transcript was tagged at least once and some had multiple tagging oligos for genes with documented splice variants. Data QC indices and analyses were performed with Agilent GeneSpring software. These data were then percent shift normalized and log2-transformed.

Statistical analyses (discovery—IOW F1)

We randomly divided the IOW F1 (18 years of age) sample into two independent sub-samples. The stage-1 data (nS1 = 91) were used for random forest (RF) feature selection because RFs rely on few statistical assumptions, are efficient with high-dimensional data, are robust to outliers and noise, and produce measures of variable importance [28, 29]. This feature selection technique was utilized in a recent epigenetic study of atopy that yielded many replicable loci [30]. The RF algorithm has a tendency to produce a predictor that is overfit to the supplied data; however, in our study RF was applied to select features based on variable importance rather than prediction. In addition, the RF algorithm was only applied to a subset of the IOW data to further diminish the possibility of overfitting, allowing us to examine the associations between DNAm and asthma in a statistically independent dataset. The stage-2 sample (nS2 = 279) was larger to retain greater power, which was necessary for hypothesis testing and multiple testing adjustment.

Recursive RF feature selection was implemented on the stage-1 sample (ns1 = 91) to select the CpGs most informative for asthma. We utilized balanced sampling, tested 10% of predictors per node (mtry = 0.10) and grew forests with 7500 trees (ntree = 7500). We implemented the RF recursively: (1) ran the RF algorithm on all available predictors (248,336 CpGs) via the randomForest package in R, (2) extracted out-of-bag (OOB) misclassification rates and variable importance measures (VIMs), (3) sorted the predictors by their VIMs, (4) excluded half of the predictors with the smallest VIMs, and (5) repeated the sequence until the asthma-specific misclassification levelled off. Predictors from the final iteration were selected for stage-2 analyses.
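The recursive procedure in steps (1) to (5) can be sketched in Python as follows. This is not the authors' R code: the `fit` callable is a hypothetical stand-in for a random forest run that returns an error rate and per-feature importances, and the stopping rule (stop when the error no longer improves on the previous iteration by more than `tol`) is an assumed reading of "levelled off".

```python
def recursive_halving(features, fit, tol=0.0):
    """Recursively drop the less important half of the features.

    features: list of feature names (e.g. CpG identifiers).
    fit: hypothetical callable taking a feature list and returning
         (error_rate, {feature: importance}); in the study this role is
         played by a random forest's OOB error and VIMs.
    tol: assumed tolerance for deciding that the error stopped improving.
    Returns the history as a list of (feature_list, error_rate) pairs.
    """
    history = []
    current = list(features)
    while current:
        error, importance = fit(current)
        history.append((current, error))
        # Stop once the error no longer improves on the previous iteration.
        if len(history) > 1 and error >= history[-2][1] - tol:
            break
        if len(current) == 1:
            break
        # Sort by importance (largest first) and keep the top half.
        ranked = sorted(current, key=lambda f: importance[f], reverse=True)
        current = ranked[: len(ranked) // 2]
    return history
```

The feature list in the last entry of the returned history corresponds to the "final iteration" whose predictors would be carried forward. In practice the error from a random forest fluctuates between runs, so a non-zero `tol` or a visual inspection of the error curve may be more appropriate than this strict rule.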

M-values for the selected CpGs were tested for their associations with asthma status with logistic regression after trimming potential strong outliers identified with adjusted boxplots. We generated false discovery rate (FDR) adjusted p-values [31] via the qvalue package in R; CpGs within a 5% FDR (q-values < 0.05) were considered ‘discovered’ and were candidates for the replication study.
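To make the 5% FDR criterion concrete, the sketch below computes Benjamini-Hochberg adjusted p-values in plain Python and flags those below a threshold. The choice of the Benjamini-Hochberg step-up procedure is an assumption for illustration; the R package named in the text may use a different estimator, and the function names here are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from largest to smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # rank of this p-value in ascending order
        # Enforce monotonicity by carrying the minimum down from larger ranks.
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def discovered(pvalues, fdr=0.05):
    """Indices whose adjusted p-value falls below the FDR threshold."""
    return [i for i, q in enumerate(bh_adjust(pvalues)) if q < fdr]
```

Because the adjustment depends on the number of tests, reducing the candidate set before testing (here, to the CpGs retained by feature selection) lowers the multiple-testing burden relative to testing every CpG on the array.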

Finally, we also performed a traditional EWAS, regressing the beta-values for each individual locus on asthma status in unadjusted linear models and in models adjusted for sex, CD4+ T-cells, CD8+ T-cells, monocytes, eosinophils, natural killer cells, and granulocytes. Models that produced associations within a 5% FDR (q-values < 0.05) were considered statistically significant.

Statistical analyses (independent replication—ALSPAC)

Candidate CpGs were tested for their associations with asthma in the independent cohort, ALSPAC (N = 720). To assess consistency of associations, results from the ALSPAC cohort were compared to results from the full IOW F1 sample (N = 370) using two logistic regression models for each CpG site: a crude model between M-values and asthma, and a second model adjusting for sex and estimated cell-type proportions of CD8+ T-cells, CD4+ T-cells, natural killer cells, B-cells, monocytes, eosinophils and other granulocytes, which were estimated from the methylation array data [26, 27]. ALSPAC also included batch variables (Additional file 1: Method S2), to adjust for technical variations across the DNAm arrays. Statistical significance was determined at α of 0.05.
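For readers less familiar with how logistic regression output maps onto the odds ratios reported in this study, the following Python sketch converts a coefficient and its standard error into an OR with a 95% confidence interval. The Wald interval used here is an assumption for illustration; the text does not state which interval method was used, and the function name is hypothetical.

```python
import math

def odds_ratio_ci(coef, se, z=1.96):
    """Odds ratio and Wald confidence interval from a logistic coefficient.

    coef: log-odds change per one-unit increase in the predictor (e.g. M-value).
    se:   standard error of that coefficient.
    z:    normal quantile for the interval (1.96 gives roughly 95% coverage).
    """
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)
```

An OR below 1 for a methylation predictor means that higher methylation is associated with lower odds of asthma, and an interval that excludes 1 corresponds to significance at roughly α = 0.05.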

Statistical analyses (functional validation—IOW F2)

Among the successfully replicated loci, wheeze without cold and any wheeze were modeled with logistic regression. This included all newborns for which at least one infant follow-up visit had been completed (n = 111 for DNAm models and n = 82 for expression models). Cord blood proportions of CD8+ T-cells, CD4+ T-cells, natural killer cells, B-cells, monocytes, granulocytes, and nucleated red blood cells (nRBCs) were estimated via the estimateCellCounts function [26] using a cord blood reference panel [32]. We adjusted for season of birth, infant sex, and cell-type proportions. Statistical significance was determined at α of 0.05.


Sample characteristics and study flow chart

A flowchart of all analyses is provided in Fig. 1. The subjects in the IOW F1 discovery sample were all 18 years old and predominantly female (66.2%); 13.9% (n = 51) of participants were asthmatic. Asthmatics were more likely to be atopic (66.0% vs 29.5%), had a lower FEV1/FVC ratio (means: 0.83 vs 0.88) and greater FeNO (medians: 21.0 vs 14.0), and had higher proportions of B-cells (0.046 vs 0.039) and eosinophils (0.045 vs 0.021) (Table 1). The average age of subjects in the ALSPAC sample was 17 years; 16.7% of the ALSPAC sample had asthma and 56.3% of participants were female. The IOW stage-1 and stage-2 samples, utilized for feature selection and internal validation respectively, had similar distributions of all covariates (Additional file 2: Table S1).

Fig. 1

Flowchart of analyses and results for each stage of the study. IOW Isle of Wight, F1 first generation sample, F2 second generation sample, ALSPAC Avon Longitudinal Study of Parents and Children, DNAm DNA methylation

Table 1 Baseline characteristics among those with and without asthma in the IOW F1 sample (18 years of age)

Discovery phase (stage-1 feature selection)

Recursive RF feature selection was implemented on the stage-1 sub-sample (ns1 = 91), with a starting set of 248,336 CpG sites. The asthma-specific misclassification rates levelled off at the 12th iteration of the recursive RF algorithm, meaning that further reductions in the number of features would result in loss of information about asthma-associated loci. Thus, the 121 features (CpG sites) included in the 12th iteration (Additional file 3: Figure S1) were selected for stage-2 analysis.
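The feature counts reported above are consistent with repeated halving of the starting set. The short Python check below assumes that each iteration keeps floor(n / 2) of the previous iteration's features; this rounding rule is an assumption, and the function name is hypothetical.

```python
def halving_schedule(n_features, iterations):
    """Number of features entering each iteration when half are kept."""
    sizes = [n_features]
    for _ in range(iterations - 1):
        sizes.append(sizes[-1] // 2)  # assumed rounding: keep floor(n / 2)
    return sizes

sizes = halving_schedule(248336, 12)
print(sizes[-1])  # 121 features enter the 12th iteration
```

Starting from 248,336 CpGs, eleven successive halvings under this rule leave 121 features at the 12th iteration, matching the number carried forward to stage-2.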

Discovery phase (stage-2 logistic regression)

The stage-2 analysis was performed in an independent sub-sample (ns2 = 279), to test the associations between DNA methylation and asthma at the 121 selected CpG sites with logistic regression. Of the 121 CpGs, 10 were associated with asthma at a 5% FDR (q-values < 0.05) (Additional file 2: Table S2). For all 10 sites, lower methylation was associated with greater odds of asthma. Adjustment for cellular heterogeneity substantially attenuated many of the parameter estimates and none of the adjusted models retained 5% FDR-significant q-values. However, the parameter estimates for the top five hits were mostly unperturbed and retained at least nominally significant p-values (< 0.05); in the case of cg16658191 and cg25578728, the magnitude of the associations became stronger after cell-mixture adjustment.

Replication analysis in ALSPAC

We then tested whether the associations observed in IOW could be replicated in an independent cohort, ALSPAC. To compare associations between the IOW and ALSPAC cohorts, we produced odds ratios (ORs) and 95% confidence intervals (CIs) using the pooled IOW samples from stage-1 and stage-2 (IOW F1 n = 370) (Table 2) and ALSPAC (n = 720) (Table 3) for the 10 FDR-significant CpG sites. All 10 CpGs exhibited the same direction of association, while 3 of these associations were statistically significant after cell-mix adjustment (p-values < 0.05) (Table 3; Additional file 2: Table S3): cg04359558 (LITAF), cg13753183 (APTX), and cg16658191 (HK1). These CpGs have been annotated with genomic information and function (Table 4). Differences in the distributions of cell-types are presented in Additional file 2: Table S4.

Table 2 Crude and adjusted associations between M-values and asthma in the IOW F1 (18 years of age, n = 370) sample via logistic regression
Table 3 Replication of crude and adjusted associations between M-values and asthma in the ALSPAC (17 years of age, n = 720) sample via logistic regression
Table 4 Annotations and biological functions of genes associated with CpG sites associated with asthma in the replication study via either the adjusted or unadjusted models

Adjusting for estimated cell mixtures attenuated most ORs, and led to some discordance between IOW and ALSPAC, with only cg04359558 and cg16658191 exhibiting significant associations with asthma in both cohorts after cell-type adjustments. Only our top-hit (cg16658191) was significantly associated with asthma in all models across both cohorts.

Some of the tested CpGs were observed to have moderate-to-strong Spearman correlations (cg06866208, cg07948085, cg09241885, cg11310939, cg13753183, cg16658191) with the proportions of estimated eosinophils (range of rho values: − 0.51 to − 0.59, p-values < 0.0001) and were also moderately correlated with each other (range of rho values: 0.26 to 0.49, p-values < 0.0001) (Additional file 3: Figure S2), suggesting that methylation levels at these CpGs may be partial markers of eosinophils.

Given the inconsistent confounding effects of cell-type, we considered cg16658191 within the hexokinase-1 (HK1) gene as the finding with the most consistent evidence for an association with asthma and carried this CpG forward for cross-sectional analyses with allergy, inflammation and lung-function, as well as prospective analyses with infant respiratory outcomes.

HK1 DNA methylation is associated with allergy, inflammation and lung function

We found that DNAm at cg16658191 was lower among those with atopy (T-test: HK1 p-value < 0.001) and had an inverse non-linear association with logFeNO (rho = − 0.22, p-value < 0.0001), suggesting that it is involved in allergic sensitization and airway inflammation. Additionally, those with lower DNAm at this locus tended to have lower FEV1/FVC (rho = 0.10, p-value = 0.057) and FEF25–75% (rho = 0.095, p-value = 0.075) though these correlations were not statistically significant (Fig. 2).

Fig. 2

Variation in DNAm (beta-values) at cg16658191 by (a) adolescent atopy, (b) log(FeNO), (c) FEV1/FVC, and (d) FEF25–75%, within the IOW F1 sample. HK1, hexokinase-1; FeNO, fractional exhaled nitric oxide; FEV1/FVC, forced expiratory volume in one second divided by the forced vital capacity; FEF25–75%, forced expiratory flow at 25–75% of forced vital capacity

Prospective follow-up for HK1 associations in infants

We then performed follow-up analyses for associations of our top locus, cg16658191, with wheeze during infancy and variations in gene-expression in the IOW F2 sample. Infants with lower levels of cord blood DNAm at cg16658191 had greater odds of wheeze without cold within the 1st year of life (Table 5), though adjustments for estimated cell-types confounded this association, particularly due to a strong correlation with nRBCs (rho = − 0.84, p-value < 0.0001) and moderate correlation with granulocytes (rho = 0.59, p-value < 0.0001). Additionally, DNAm at cg16658191 was inversely associated with the expression of HK1 (rho = − 0.22, p-value = 0.039) and increased expression of HK1 was associated with increased odds of wheeze without cold and odds of any wheeze, during the 1st year of life. Interestingly, these associations became stronger after adjusting for cellular heterogeneity (Table 5).

Table 5 Associations between cord blood DNAm at cg16658191 and expression of HK1 with infant wheeze without cold and any wheeze within the IOW F2 sample

Discovery of epigenomic loci associated with asthma (traditional EWAS approach)

Finally, we examined DNAm-asthma associations using a standard EWAS approach, regressing methylation levels for all CpGs on asthma status in unadjusted models and in models adjusted for sex, CD4+ T-cells, CD8+ T-cells, monocytes, eosinophils, natural killer cells, and granulocytes. In the unadjusted models, 148 CpGs were significantly associated (FDR 5%) with asthma status. However, adjusting for sex and cell mixture attenuated most of these results, and none of the adjusted models produced FDR-significant associations. We compared the results from our unadjusted (Additional file 2: Table S5) and adjusted (Additional file 2: Table S6) models that yielded p-values < 0.001 to the results from a prior EWAS in ALSPAC for current asthma and current wheeze at ages 7.5 and 16.5 years that yielded p-values < 0.001 [10]. Of the 674 CpGs that were associated with asthma (p-value < 0.001) in IOW prior to cell-type adjustment, 20 CpGs yielded p-values < 0.001 for all four models in ALSPAC (Additional file 2: Table S7). However, only one CpG, cg16658191, yielded a p-value < 0.001 in both IOW and ALSPAC when adjusting for cell mixture. We also compared our results to the asthma-associated CpGs identified in a meta-analysis of children between the ages of 4 and 8 years [18] at the 11 (out of 14) sites that passed QC in our study. Although all 11 sites yielded nominally significant inverse associations with asthma in IOW (Additional file 2: Table S8), only cg10142874 retained even a nominal association with asthma in IOW after cell-mix adjustment (p-value = 0.013).


We performed an epigenome-wide association study of current asthma in the IOW cohort utilizing two statistical approaches and a replication analysis in an independent population. We identified lower DNAm at cg16658191 within the 1st exon of HK1 as a marker of current asthma. This CpG was identified via random forest feature selection, confirmed using standard EWAS, and replicated within an independent cohort (ALSPAC). We then observed similar associations of cord blood DNAm at cg16658191 and HK1 expression with infant wheeze in the children of the IOW cohort. We also observed functional evidence of HK1's involvement in infant wheeze using gene expression data that exhibited the expected associations with infant wheeze, given that promoter and first exon methylation are most commonly associated with repression of gene expression [6, 33]. DNAm at cg16658191, which is within the body and/or first exon, was inversely associated with HK1 expression and we showed that higher expression of HK1 was predictive of wheezing without a cold during infancy. The HK1 gene resides in 10q22.1 and encodes a protein that is integral in the first step of glycolysis [34] and in apoptotic resistance [35]. The consistency of these associations across different ages, with different respiratory outcomes, and utilizing both DNAm and gene expression as predictors, suggests that this gene may play an important role in the predisposition for wheezing and/or asthma.

We found that many of our RF-identified hits, including cg16658191, were inversely correlated with eosinophil counts in adult blood, similar to what was observed by Arathimos et al. [10]. However, confounding by cell-mixture may not be limited to eosinophil proportions. For instance, because of its crucial role in glucose metabolism, HK1 is highly expressed by erythrocytes [36]. This is consistent with the strong inverse correlation we observed between cg16658191 and estimated nRBC proportions in cord blood, which could indicate prematurity, restricted growth, or pregnancy complications [37]. Additionally, premature and low birth weight neonates are predisposed to early-life respiratory morbidity [38, 39]. However, adjustments for weeks of gestation did not appreciably alter our results (data not shown).

Though DNAm at cg16658191 may, in part, be a marker of high eosinophil counts in adult blood and nRBCs in cord blood, we found that, in addition to the relationship between DNAm and asthma, HK1 expression was strongly and significantly associated with infant wheeze even after cell-type adjustments. These findings suggest a role for DNAm of the HK1 gene in asthma and wheeze etiology that is independent of cell-type proportions, possibly through differential epigenetic regulation within a subset of asthma-associated cell-types. However, it is difficult to disentangle such relationships in studies that utilize tissues composed of mixed cell populations, such as blood. HK1 is involved in apoptotic resistance via binding to and stabilizing the mitochondrial membrane, whereas the dissociation of HK1 from the membrane makes those cells more susceptible to apoptosis [40]. Up-regulation of HK1 resulting in increased apoptotic resistance has been observed in cancerous cells [41] and HIV-1 infected macrophages [42]. Apoptotic-resistant pro-inflammatory cells are known to lead to prolonged inflammation [43] and apoptosis appears to be delayed in neutrophils [44] and T-lymphocytes [45] of asthmatics. This provides a possible mechanism through which HK1 epigenetic regulation and expression by immune cells may be involved in asthma and wheeze etiology.

We also examined relationships between DNAm and asthma using a more traditional EWAS approach, regressing the methylation beta-values for each locus on asthma status, and compared our findings to two recent EWAS, one performed by ALSPAC [10] and a meta-analysis of childhood asthma from multiple European cohorts [18]. This approach identified 148 CpGs that were significantly associated (5% FDR) with asthma prior to cell-type adjustment, and no FDR-significant findings after adjustment. We also found very little consistency in observed associations between our study and the ALSPAC study after adjusting for cell mixture. Only cg16658191 demonstrated an association with current asthma and wheeze in the fully adjusted models (p < 0.001) in both IOW and ALSPAC. When comparing our traditional EWAS results to a meta-analysis of childhood asthma, only cg10142874 from that meta-analysis yielded even a nominally significant association with asthma in IOW after cell-mixture adjustment.

Strengths of our study included the use of multiple samples to discover and replicate our findings, supported by gene expression studies, and the use of a validated tool, the ISAAC core questionnaire, to define current asthma status. However, it is also important to recognize this study’s limitations. One limitation is that detection of CpGs in the IOW birth cohort and the replication study in ALSPAC investigated concurrent associations. Hence, reverse causation in which asthma may result in differential methylation of HK1 cannot be excluded. Differences between the two cohorts, confounding by cell-mixture, errors in cell-mixture estimates, and asthma heterogeneity may have limited replicability of more loci after additional adjustments. The discovery and replication samples were similar in sex-distribution, prevalence of asthma, and age, but differed in estimated cell-type distributions (Additional file 2: Table S4). Additionally, though the estimated cell-proportions are imperfect, we utilized the gold standard for predicting cell mixtures from DNAm arrays [27] and comprehensively evaluated the impact of cellular heterogeneity on our findings. There is also the possibility of residual confounding, perhaps by genotype. Numerous SNPs have been implicated as asthma susceptibility loci, and some SNPs have been shown to influence the methylation status of CpG sites. We cannot rule out the possibility that our findings are markers of upstream genetic effects on both DNA methylation and asthma susceptibility. It is also important to point out that the relationships we observed between DNAm and expression of HK1 in cord blood with infant wheeze, cannot be directly extrapolated to asthma. It is unclear whether DNAm patterns of HK1 in cord blood are informative for the later development of asthma, although our findings provide evidence that lower DNAm and increased expression of HK1 in cord blood are associated with wheezing in the 1st year of life. 
Finally, asthma is a heterogeneous condition, in which different phenotypes may arise via different underlying physiological mechanisms [3]. We showed that HK1 DNAm levels were also strongly associated with atopy and FeNO, which may indicate that the regulation of this gene is particularly important in allergic-asthma. This also raises the possibility that some of our other discovered, but not replicated, loci may be associated with specific asthma-phenotypes. If the prevalences of these phenotypes differ between IOW and ALSPAC, this may have contributed to the discordant results. Interestingly, some of the CpGs with discordant results between the two cohorts were within genes or genomic regions that have previously been associated with asthma or are involved in apoptotic signaling, like HK1. For instance, cg04359558 is within the body of LITAF, a gene that encodes a DNA-binding protein that promotes the expression of TNF-α and other cytokines known to be involved in pro-inflammatory and apoptotic signaling [46, 47]. UNC45B, annotated to cg00100703, lies within the asthma susceptibility region 17q12–21 [48], though this particular gene has not previously been linked to asthma.


In summary, we discovered a novel epigenetic association with adolescent asthma at cg16658191 within HK1, whose DNAm and expression levels in cord blood were also associated with infant wheeze without cold. In addition, the association of cg16658191 with asthma was replicated in an independent cohort. However, these associations may be affected, at least in part, by heterogeneous cell mixtures. Further research is required to determine whether the observed associations are reproducible in other populations, particularly those with different racial and ethnic characteristics, and whether some of these loci are differentially regulated between those with and without asthma in specific cell-type populations such as eosinophils.

Availability of data and materials

The minimal data sets analyzed in the current study are available from the corresponding author upon reasonable request. For access to the full Isle of Wight Cohort data please see: cohort-data-use; for access to ALSPAC data please see:



Abbreviations

ALSPAC: Avon Longitudinal Study of Parents and Children

DNAm: DNA methylation

IOW: Isle of Wight

OOB: out of bag

RF: random forest

VIM: variable importance measure


  1. Martinez FD, Vercelli D. Asthma. Lancet. 2013;382:1360–72.

  2. Akinbami LJ, Moorman JE, Liu X. Asthma Prevalence, Health Care Use, and Mortality: United States, 2005–2009. National Health Statistics Reports. Hyattsville, MD; 2011.

  3. Lötvall J, Akdis CA, Bacharier LB, Bjermer L, Casale TB, Custovic A, et al. Asthma endotypes: a new approach to classification of disease entities within the asthma syndrome. J Allergy Clin Immunol. 2011;127(2):355–60.

  4. DeVries A, Vercelli D. Early predictors of asthma and allergy in children: the role of epigenetics. Curr Opin Allergy Clin Immunol. 2015;15(5):435–9.

  5. Durham AL, Wiegman C, Adcock IM. Epigenetics of asthma. Biochim Biophys Acta. 2011;1810(11):1103–9.

  6. Jones PA. Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nat Rev Genet. 2012;13(7):484–92.

  7. Ho S. Environmental epigenetics of asthma: an update. J Allergy Clin Immunol. 2010;126(3):453–65.

  8. Yang IV, Tomfohr J, Singh J, Foss CM, Marshall HE, Que LG, et al. The clinical and environmental determinants of airway transcriptional profiles in allergic asthma. Am J Respir Crit Care Med. 2012;185(6):620–7.

  9. Yang IV, Pedersen BS, Liu A, O’Connor GT, Teach SJ, Kattan M, et al. DNA methylation and childhood asthma in the inner city. J Allergy Clin Immunol. 2015;136(1):69–80.

  10. Arathimos R, Suderman M, Sharp GC, Burrows K, Granell R, Tilling K, et al. Epigenome-wide association study of asthma and wheeze in childhood and adolescence. Clin Epigenetics. 2017;9:112.

  11. Barton SJ, Ngo S, Costello P, Garratt E, El-Heis S, Antoun E, et al. DNA methylation of Th2 lineage determination genes at birth is associated with allergic outcomes in childhood. Clin Exp Allergy. 2017;47(12):1599–608.

  12. DeVries A, Wlasiuk G, Miller SJ, Bosco A, Stern DA, Lohman IC, et al. Epigenome-wide analysis links SMAD3 methylation at birth to asthma in children of asthmatic mothers. J Allergy Clin Immunol. 2017;140(2):534–42.

  13. Guarnieri M, Balmes JR. Outdoor air pollution and asthma. Lancet. 2014;383(9928):1581–92.

  14. Somineni HK, Zhang K, Biagini Myers JM, Kovacic MB, Ulm A, Jurcak N, et al. TET1 methylation is associated with childhood asthma and traffic-related air pollution. J Allergy Clin Immunol. 2016;137(3):797–805.

  15. Prunicki M, Stell L, Dinakarpandian D, de Planell-Saguer M, Lucas RW, Hammond SK, et al. Exposure to NO2, CO, and PM2.5 is linked to regional DNA methylation differences in asthma. Clin Epigenetics. 2018;10(1):2.

  16. Verlaan DJ, Berlivet S, Hunninghake GM, Madore AM, Larivière M, Moussette S, et al. Allele-specific chromatin remodeling in the ZPBP2/GSDMB/ORMDL3 locus associated with the risk of asthma and autoimmune disease. Am J Hum Genet. 2009;85(3):377–93.

  17. Kothari PH, Qiu W, Croteau-Chonka DC, Martinez FD, Liu AH, Lemanske RF, et al. The role of local CpG DNA methylation in mediating the 17q21 asthma-susceptibility GSDMB/ORMDL3 expression quantitative trait locus. J Allergy Clin Immunol. 2018;141(6):2282–2286.e6.

  18. Xu CJ, Söderhäll C, Bustamante M, Baïz N, Gruzieva O, Gehring U, et al. DNA methylation in childhood asthma: an epigenome-wide meta-analysis. Lancet Respir Med. 2018;6:379–88.

  19. Arshad SH, Holloway JW, Karmaus W, Zhang H, Ewart S, Mansfield L, et al. Cohort profile: the Isle of Wight whole population birth cohort (IOWBC). Int J Epidemiol. 2018;47(4):1043–1044i.

  20. Asher MI, Keil U, Anderson HR, Beasley R, Crane J, Martinez F, et al. International study of asthma and allergies in childhood (ISAAC): rationale and methods. Eur Respir J. 1995;8(3):483–91.

  21. Miller MR, Hankinson J, Brusasco V, Burgos F, Casaburi R, Coates A, et al. Standardisation of spirometry. Eur Respir J. 2005;26(2):319–38.

  22. Scott M, Raza A, Karmaus W, Mitchell F, Grundy J, Kurukulaaratchy RJ, et al. Influence of atopy and asthma on exhaled nitric oxide in an unselected birth cohort study. Thorax. 2010;65:258–63.

  23. Fraser A, Macdonald-Wallis C, Tilling K, Boyd A, Golding J, Smith GD, et al. Cohort profile: the Avon Longitudinal Study of Parents and Children: ALSPAC mothers cohort. Int J Epidemiol. 2013;42:97–110.

  24. Boyd A, Golding J, Macleod J, Lawlor DA, Fraser A, Henderson J, et al. Cohort profile: the ‘Children of the 90s’—the index offspring of the Avon Longitudinal Study of Parents and Children. Int J Epidemiol. 2013;42:111–27.

  25. Relton CL, Gaunt T, McArdle W, Ho K, Duggirala A, Shihab H, et al. Data resource profile: accessible resource for integrated epigenomic studies (ARIES). Int J Epidemiol. 2015;44:1181–90.

  26. Reinius LE, Acevedo N, Joerink M, Pershagen G, Dahlén SE, Greco D, et al. Differential DNA methylation in purified human blood cells: Implications for cell lineage and studies on disease susceptibility. PLoS ONE. 2012;7(7):e41361.

  27. Houseman EA, Accomando WP, Koestler DC, Christensen BC, Marsit CJ, Nelson HH, et al. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics. 2012;13:86.

  28. Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.

  29. Goldstein BA, Hubbard AE, Cutler A, Barcellos LF. An application of Random Forests to a genome-wide association dataset: Methodological considerations & new findings. BMC Genet. 2010;11:49.

  30. Everson TM, Lyons G, Zhang H, Soto-Ramírez N, Lockett GA, Patil V, et al. DNA methylation loci associated with atopy and high serum IgE: a genome-wide application of recursive Random Forest feature selection. Genome Med. 2015;7:89.

  31. Storey JD, Taylor JE, Siegmund D. Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. J R Stat Soc Ser B Stat Methodol. 2004;66(1):187–205.

  32. Bakulski KM, Feinberg JI, Andrews SV, Yang J, Mckenney S, Witter F, et al. DNA methylation of cord blood cell types: applications for mixed cell birth studies. Epigenetics. 2016;11(5):354–62.

  33. Brenet F, Moh M, Funk P, Feierstein E, Viale AJ, Socci ND, et al. DNA methylation of the first exon is tightly linked to transcriptional silencing. PLoS ONE. 2011;6(1):e14524.

  34. John S, Weiss JN, Ribalet B. Subcellular localization of hexokinases I and II directs the metabolic fate of glucose. PLoS ONE. 2011;6(3):e17674.

  35. Abu-Hamad S, Zaid H, Israelson A, Nahon E, Shoshan-Barmatz V. Hexokinase-I protection against apoptotic cell death is mediated via interaction with the voltage-dependent anion channel-1. J Biol Chem. 2008;283(19):13482–90.

  36. Van Wijk R, Van Solinge WW. The energy-less red blood cell is lost: erythrocyte enzyme abnormalities of glycolysis. Blood. 2005;106(13):4034–42.

  37. Hermansen MC. Nucleated red blood cells in the fetus and newborn. Arch Dis Child Fetal Neonatal Ed. 2001;84(3):F211–5.

  38. Rusconi F, Galassi C, Forastiere F, Bellasio M, De Sario M, Ciccone G, et al. Maternal complications and procedures in pregnancy and at birth and wheezing phenotypes in children. Am J Respir Crit Care Med. 2007;175:16–21.

  39. Edwards MO, Kotecha SJ, Lowe J, Richards L, Watkins WJ, Kotecha S. Management of prematurity-associated wheeze and its association with atopy. PLoS ONE. 2016;11(5):e0155695.

  40. Schindler A, Foley E. Hexokinase 1 blocks apoptotic signals at the mitochondria. Cell Signal. 2013;25(12):2685–92.

  41. Kroemer G, Pouyssegur J. Tumor cell metabolism: cancer’s Achilles’ heel. Cancer Cell. 2008;13:472–82.

  42. Sen S, Kaminiski R, Deshmane S, Langford D, Khalili K, Amini S, et al. Role of hexokinase-1 in the survival of HIV-1-infected macrophages. Cell Cycle. 2015;14(7):980–9.

  43. Luo HR, Loison F. Constitutive neutrophil apoptosis: mechanisms and regulation. Am J Hematol. 2008;83(4):288–95.

  44. Yang EJ, Choi E, Ko J, Kim D, Lee J-S, Kim IS. Differential effect of CCL2 on constitutive neutrophil apoptosis between normal and asthmatic subjects. J Cell Physiol. 2011;227:2567–77.

  45. Potapinska O, Demkow U. T lymphocyte apoptosis in asthma. Eur J Med Res. 2009;14:192–5.

  46. Min J, Zhang W, Gu Y, Hong L, Yao L, Li F, et al. CIDE-3 interacts with lipopolysaccharide-induced tumor necrosis factor, and overexpression increases apoptosis in hepatocellular carcinoma. Med Oncol. 2011;28:S219–27.

  47. Tang X, Molina M, Amar S. p53 short peptide (p53pep164) regulates lipopolysaccharide-induced tumor necrosis factor-alpha factor/cytokine expression. Cancer Res. 2007;67(3):1308–16.

  48. Naumova AK, Al Tuwaijri A, Morin A, Vaillancourt VT, Madore A-M, Berlivet S, et al. Sex- and age-dependent DNA methylation at the 17q12-q21 locus associated with childhood asthma. Hum Genet. 2013;132:811–22.

  49. Sano Y, Date H, Igarashi S, Onodera O, Oyake M, Takahashi T, et al. Aprataxin, the causative protein for EAOH is a nuclear protein with a potential role as a DNA repair protein. Ann Neurol. 2004;55:241–9.



This publication is the work of the authors, who will serve as guarantors for the contents of this paper. The authors gratefully acknowledge the cooperation of the children and parents who participated in the IOW birth cohort and appreciate the hard work of Mrs. Sharon Matthews and the Isle of Wight research team in collecting data and Nikki Graham for technical support. We are extremely grateful to all the families who took part in the ALSPAC study, the midwives for their help in recruiting them, and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists and nurses.


The IOW portion of this work was supported by the National Institutes of Health under award numbers R01 AI091905 and R01HL132321 (PI: Wilfried Karmaus) and R01AI121226 (MPI: Hongmei Zhang, John Holloway). The 10-year follow-up of this study was funded by the National Asthma Campaign, UK (Grant No 364) and the 18-year follow-up by NIH/NHLBI R01 HL082925-01 (PI: S. Hasan Arshad). We thank the High-Throughput Genomics Group at the Wellcome Trust Centre for Human Genetics (funded by Wellcome Trust grant reference 090532/Z/09/Z) for the generation of the IOW methylation data. The UK Medical Research Council and the Wellcome Trust (Grant ref: 102215/2/13/2) and the University of Bristol provide core support for ALSPAC. The Accessible Resource for Integrated Epigenomics Studies (ARIES) was funded by the UK Biotechnology and Biological Sciences Research Council (BB/I025751/1 and BB/I025263/1). This work was supported by the Medical Research Council Integrative Epidemiology Unit and the University of Bristol (MC_UU_12013_2).

Author information

Authors and Affiliations



WK, JWH and SHA coordinated the study. SLE generated gene expression data for IOW. TME and HZ designed the analysis plan, and TME carried out the statistical analysis for IOW. AK, FIR and GAL performed data processing, QA, and QC. MF, SLE and VKP assisted in interpreting results for the IOW. KB performed statistical analyses for ALSPAC. AJH, CLR, GCS and KB assisted in interpreting ALSPAC results. TME, HZ, WK, JWH and KB were major contributors in writing the manuscript. All authors revised the final manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Wilfried Karmaus.

Ethics declarations

Ethics approval and consent to participate

Ethical approval was obtained from National Research Ethics Service, NRES Committee South Central—Southampton B for the 18-year follow-up (06/Q1701/34) and NRES Committee South Central—Hampshire B (09/H0504/129) for the follow-up of IOW participants’ offspring; written informed consent was provided by the infants’ parents. Ethical approval for the ALSPAC study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees; written informed consent was provided by all participants.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1:

Method S1. Cohort-specific DNAm preprocessing steps. Method S2. SVA to account for technical variation in ALSPAC.

Additional file 2:

This file includes supplemental tables (S1–S8)—Table S1. Comparison of cell proportions and lung function variables across the Stage-1 (n = 91) and Stage-2 (n = 279) samples from the IOW F1 sample. Table S2. Parameter estimates from logistic regression models performed in the Stage-2 sample (ns2 = 279), regressing current asthma status on DNA methylation M-values for all CpGs selected from the Stage-1 analysis. Table S3. Parameter estimates, standard errors, and p-values to compare the IOW F1 results to the ALSPAC replication results for CpGs that yielded an association within a 5% FDR in the Stage-2 analysis. Table S4. Comparison of adjustment covariates between the IOW F1 sample and the ALSPAC sample; the within-cohort comparisons test for differences in these variables between those with and without asthma. Table S5. Results from linear asthma EWAS in IOW F1 (n = 370), in which methylation beta values were regressed on asthma status, unadjusted for possible confounders. Table S6. Results from linear asthma EWAS in IOW F1 (n = 370), in which methylation beta values were regressed on asthma status, adjusted for sex, CD4T cells, CD8T cells, monocytes, natural killer cells, eosinophils, and other granulocytes. Table S7. CpGs that yielded unadjusted associations (p < 0.001) with current asthma in IOW F1 (unadjusted models) and that were also identified as having unadjusted associations with current asthma and wheeze at age 7.5 and 16.5 years in the previously published ALSPAC EWAS. Table S8. Associations between asthma and DNA methylation within the IOW cohort (unadjusted models) at the 14 CpGs that were identified as being asthma-associated in a meta-analysis of European children.

Additional file 3:

This file includes supplemental figures (S1, S2)—Figure S1. Tracking of the misclassification rates (y-axis) across iterations (x-axis) of the recursive RF feature selection. Figure S2. Correlations (Spearman) between estimated cell proportions in blood and DNAm M-values for the 10 CpGs that were identified as candidates for the replication study. Statistically significant correlations are designated at p-values < 0.05 (*), p-values < 0.001 (**), and p-values < 0.001 (***).

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Everson, T.M., Zhang, H., Lockett, G.A. et al. Epigenome-wide association study of asthma and wheeze characterizes loci within HK1. Allergy Asthma Clin Immunol 15, 43 (2019).

  • Asthma
  • Expression
  • Hexokinase-1
  • HK1
  • Infant wheeze
  • Isle of Wight
  • Methylation