
Genetic Susceptibility to Respiratory Syncytial Virus Bronchiolitis Is Predominantly Associated with Innate Immune Genes

  1. Riny Janssen1,a,
  2. Louis Bont4,a,
  3. Christine L. E. Siezen1,a,
  4. Hennie M. Hodemaekers1,
  5. Marieke J. Ermers4,
  6. Gerda Doornbos2,
  7. Ruben van 't Slot5,
  8. Ciska Wijmenga5,
  9. Jelle J. Goeman6,
  10. Jan L. L. Kimpen4,
  11. Hans C. van Houwelingen2,6,
  12. Tjeerd G. Kimman3 and
  13. Barbara Hoebee1
  1. 1Laboratory for Toxicology, Pathology, and Genetics, Bilthoven, The Netherlands
  2. 2Expertise Centre for Methodology and Information Services, Bilthoven, The Netherlands
  3. 3Laboratory for Vaccine-Preventable Diseases, National Institute for Public Health and the Environment, Bilthoven, The Netherlands
  4. 4University Medical Center, Wilhelmina Children's Hospital, Utrecht, The Netherlands
  5. 5Complex Genetics Section, Department of Biomedical Genetics, University Medical Center Utrecht, Utrecht, The Netherlands
  6. 6Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, The Netherlands
  1. Reprints or correspondence: Dr. Riny Janssen, Laboratory for Health Protection Research, National Institute for Public Health and the Environment, PO Box 1, 3720 BA, Bilthoven, The Netherlands (riny.janssen{at}rivm.nl).
  1. a R.J., L.B., and C.L.E.S. contributed equally to this work.

Abstract

Background. Respiratory syncytial virus (RSV) is a common cause of severe lower respiratory tract infection in infants. Only a proportion of children infected with RSV require hospitalization. Because known risk factors for severe disease, such as premature birth, cannot fully explain differences in disease severity, genetic factors have been implicated.

Methods. To study the complexity of RSV susceptibility and to identify the genes and biological pathways involved in its development, we performed a genetic association study involving 470 children hospitalized for RSV bronchiolitis, their parents, and 1008 random, population controls. We analyzed 384 single-nucleotide polymorphisms (SNPs) in 220 candidate genes involved in airway mucosal responses, innate immunity, chemotaxis, adaptive immunity, and allergic asthma.

Results. SNPs in the innate immune genes VDR (rs10735810; P = .0017), JUN (rs11688; P = .0093), IFNA5 (rs10757212; P = .0093), and NOS2 (rs1060826; P = .0031) demonstrated the strongest association with bronchiolitis. Apart from association at the allele level, these 4 SNPs also demonstrated association at the genotype level (P = .0056, P = .0285, P = .0372, and P = .0117 for the SNPs in VDR, JUN, IFNA5, and NOS2, respectively). The role of innate immunity as a process was reinforced by association of the whole group of innate immune SNPs when the global test for groups of genes was applied (P = .046).

Conclusion. SNPs in innate immune genes are important in determining susceptibility to RSV bronchiolitis.

The severity of respiratory syncytial virus (RSV) infections in young children varies from subclinical or mild symptomatic upper respiratory tract infection to severe lower respiratory tract disease leading to hospitalization and, occasionally, to death. Some children are more prone to experience a severe course of disease, such as premature infants and infants <6 weeks of age [1, 2]. However, RSV infection also can result in serious disease in children without any of these risk factors.

In animal models, Th2 responses have been implicated in the disease process. In humans, genetic association studies have been performed to study susceptibility to RSV infection, and such studies in children have shown that polymorphisms in interleukin (IL)–4 and the IL-4 receptor are associated with severe RSV disease [3–5], confirming a role for Th2 responses in severe RSV bronchiolitis in humans. However, genes involved in various other immune processes have also been implicated in determining susceptibility to or severity of RSV infection. For example, polymorphisms in TLR4, which is involved in innate immunity; IL10, which is involved in regulation of adaptive immunity; IL8, which is involved in chemotaxis; and SPD, which is involved in airway mucosal responses, have been associated with the disease [6–8]. A recent review of the literature revealed that, altogether, only 13 genes have been studied to determine their association with susceptibility to severe RSV infection [9]. Of these genes, 9 demonstrated association with severe RSV bronchiolitis in at least one study. Apparently, susceptibility to RSV infection is a complex trait, and a broad range of immune-mediated processes play a role.

Both direct virus-induced airway damage and RSV-induced inflammation may contribute to severe disease in RSV-infected children, indicating that, at the time of RSV infection, a complex series of events takes place in which both the virus and the host play a role. Because correlation of disease severity with viral load is controversial [10–12], the common belief is that RSV-induced inflammation makes a major contribution to disease severity [1, 2]. Consistent with this belief, ribavirin treatment was shown to be of little clinical benefit [13–15], underlining that factors other than viral load play a role. To identify novel genes and biological pathways involved in determining susceptibility to severe RSV infection and to shed light on the complexity of genetic susceptibility to severe RSV bronchiolitis, we performed a large-scale genotyping study by use of a candidate gene approach. On the basis of data available in the literature and the results of our recent gene-expression analyses in a murine model of RSV infection [16], 384 single-nucleotide polymorphisms (SNPs) in 220 candidate genes involved in a broad array of immunological processes were analyzed.

Methods

Study design. The 480 children (median age, 70 days; 10 children were >12 months of age) who were included in the present study were hospitalized for RSV bronchiolitis during 1992–2006. Part of the cohort (i.e., 207 children and their parents) had participated in our previous analyses [4, 8]. RSV infection was confirmed by direct immunofluorescent assay of nasopharyngeal cells. Children with a history of airway morbidity, airway medication, or wheeze were excluded. Blood samples or buccal swab specimens were collected from these 480 children and from both of their parents for DNA isolation. In 13 cases, samples were obtained from the child and from 1 parent only. All parents completed a questionnaire collecting data on medical history, pregnancy, and ethnic origin. An unselected control population [4, 8] of 1030 persons born in The Netherlands (447 of whom had participated in our previous studies [4, 8]) was randomly taken from the Regenboog study [17], a large Dutch population health examination survey. All parents provided written, informed consent, and the study was approved by the local ethics committee.

Selection of genes and SNPs. A total of 220 genes were selected based on searches of studies in the literature in which the genes were found in the context of RSV infection, or because they were up-regulated in our murine model of RSV infection by use of microarray analysis [16]. The 220 genes were categorized into 5 processes: (1) the airway mucosal response, (2) innate immunity, (3) chemotaxis, (4) adaptive immunity, and (5) allergic asthma. These 5 processes were selected because, in each process, genetic associations have previously been found. Some genes were categorized in >1 process. The genes, their role in a process, and the 5 processes are presented in table 1. Genes in the innate immunity category included a group of interferon (IFN)–regulated genes, because these genes were highly up-regulated in the murine lung on RSV infection [16]. Genes involved in chemotaxis were included as a separate category, because a large number of chemokines were induced on RSV infection [16]. All chemokines, their receptors, and several other adhesion factors were included in this category.

Table 1

Candidate genes studied in our single-nucleotide polymorphism (SNP) analysis and genes associated with susceptibility to respiratory syncytial virus bronchiolitis.

Where possible, SNPs were selected on the basis of published associations with any disease or functional parameter. Alternatively, promoter or coding SNPs were selected. This approach was used to increase our chances of selecting SNPs with functional consequences. If such SNPs were not present, HapMap-validated SNPs with a frequency of >5% were selected. Our initial list of candidate genes contained >220 genes. However, some genes demonstrated linkage to neighboring candidate genes. In these cases, 1 SNP was used to tag >1 gene—for instance, in the case of chemokine genes CXCL9, CXCL10, and CXCL11. Genes for which no suitable SNPs could be found were excluded. Finally, because of the various constraints of the Illumina procedure, specific primer sets could not be developed for some selected SNPs.

DNA isolation and genotyping. DNA was isolated as described elsewhere [4]. For all children, parents, and control subjects, SNPs were genotyped using Illumina's Beadarray technology on a 384 Sentrix array matrix, according to Illumina's Goldengate protocol. Results for 37 SNPs were excluded because of a low signal, overlapping or multiple clusters, or scattering of the clusters. As a result, genotypes could not be accurately identified from the data. Genotyping failed for 22 control subjects, 10 children, and 15 parents, probably because of poor DNA quality. In our analysis, identical twins were counted as 1 case. Nonidentical twins (n = 9) and 1 sibpair were counted as separate cases. As a result, genotype data for 470 children, 459 mothers, 448 fathers, and 1008 control subjects could be used for analysis. Of the 470 children, 349 were native Dutch (with parents and grandparents born in The Netherlands).

Statistical analysis. The 347 SNPs were all in Hardy-Weinberg equilibrium (P > .01). We performed a novel statistical test using Gauss software (Aptech Systems). This test takes into account both the case-cohort data, using only the Dutch case patients and the control subjects, and the transmission disequilibrium test (TDT) data (i.e., data on the transmission of alleles from parents to children, culled from data for all case-parent trios). Because case-cohort data and TDT data are only partly independent, the test also considers the correlation between the 2 data sets. The standard TDT analysis, which counts transmitted and nontransmitted alleles, yields only the relative risk for the allele. To obtain relative risks for the genotype within the TDT analyses, we used the pseudo-control methodology of Cordell et al. [18], which constructs matched case-cohort data sets consisting of the case and 3 pseudo controls. Together, the case and 3 pseudo controls form the 4 equally likely genotypes that can arise given the parental genotypes. Under an additive model on the ln(relative risk) scale, pseudo-control methodology yields exactly the same relative risk as does classical TDT analysis. However, there is a slight difference in the way that the standard errors are computed. Pseudo-control methodology takes the trio structure into account, whereas classical TDT analysis ignores that information and counts only transmitted and nontransmitted alleles.
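The pseudo-control construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the Gauss implementation used in the study: genotypes are coded as pairs of alleles (0 = reference allele, 1 = variant allele), and both this coding and the function name `pseudo_controls` are assumptions made for the example. For one trio, the function enumerates the 4 equally likely offspring genotypes given the parental genotypes and removes the genotype of the case, leaving 3 pseudo controls.

```python
from itertools import product


def pseudo_controls(father, mother, child):
    """Return the 3 pseudo-control genotypes for one case-parent trio.

    Each genotype is a pair of alleles, e.g. (0, 1); the allele coding
    (0 = reference, 1 = variant) is an assumption of this sketch.
    Genotypes are returned as variant-allele counts (0, 1, or 2).
    """
    # The 4 equally likely transmissions: one allele from each parent.
    offspring = [f + m for f, m in product(father, mother)]
    case = sum(child)
    if case not in offspring:
        raise ValueError("child genotype is not compatible with the parents")
    # Remove the case's own genotype once; the remaining 3 are pseudo controls.
    offspring.remove(case)
    return sorted(offspring)


# Father heterozygous, mother homozygous for the reference allele,
# child inherited the variant allele from the father.
print(pseudo_controls(father=(0, 1), mother=(0, 0), child=(1, 0)))
```

Each case and its 3 pseudo controls would then form one stratum in a matched (conditional) analysis.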

New methodology was developed to combine the relative risk estimates from the case-cohort and the case-parent analyses. The methodology that we previously employed for combined analysis used all genotypic information, under the assumption of random mating and Hardy-Weinberg equilibrium [19]. These assumptions may not always be valid and have been criticized by Epstein et al. [20]. In our new approach, we do not attempt to create a joint model of all data; instead, we estimate the correlation between the 2 estimates of the same relative risk (i.e., the relative risks from the TDT and case-cohort analyses) and use that correlation in our combination procedure to obtain 1 estimate with a minimal SE, as is done in meta-analysis (for more details, see the Appendix). This methodology enables analysis of all data at the allele level (df = 1), under the assumption of additive effects on the relative risk, and analysis at the genotype level (df = 2), without any such assumptions.

The “global test” for groups of genes was used to evaluate which processes are associated with RSV bronchiolitis. This test was originally developed for analysis of microarray data, and it generates 1 P value for the association between a group of genes involved in 1 process and disease [21], by use of the R-package globaltest (see the Bioconductor Web page, available at http://www.bioconductor.org/packages/2.0/bioc/html/globaltest.html). The global test was performed only on case-cohort data at the allele level, because it compares distribution of all alleles in a process between unmatched cases and controls. In TDT analysis, pseudo controls are generated based on the parents' genotype, and these are directly compared with their respective cases in a matched case-control analysis. At the moment, there is no version of the global test available for matched case-control data.

Results

SNPs associated with RSV bronchiolitis. Using the Illumina platform, 384 SNPs were identified in 470 children hospitalized for RSV bronchiolitis, their parents, and 1008 Dutch population controls. Genotype determination was successful for 347 of these 384 SNPs. These SNPs are located in 210 of the 220 genes initially analyzed. The complete list of genes for which genotype analysis was successful is presented in table 1. In total, 22 SNPs in 21 genes were associated with severe RSV disease either at the allele or genotype level (P < .05) (table 2). The P value calculated using this test is a measure of the variance in RSV susceptibility in the population that can be explained by this association, and it is, therefore, a measure of the strength of the association.

Table 2

Single-nucleotide polymorphisms (SNPs) significantly associated with severe respiratory syncytial virus bronchiolitis at the allele and genotype level.

Associations with RSV bronchiolitis were found for genes in all processes: 2 of 47 tested SNPs in genes involved in the local lung response demonstrated such an association, and 11 of 122 innate immunity SNPs, 3 of 70 chemotaxis SNPs, 7 of 102 adaptive immunity SNPs, and 6 of 51 allergic asthma SNPs demonstrated the association (table 1). In one gene (IL4R), 2 SNPs were associated with disease. These SNPs were not in strong linkage disequilibrium, suggesting independent effects. The odds ratio (OR) for all risk alleles was between 1.2 and 1.7, and, for all protective alleles, it was between 0.5 and 0.8, indicating that all individual SNPs have small effects, which is commonly found for associations in complex genetic diseases. ORs and P values for all SNPs are presented in table 3, which shows associations at the allele level, and table 4, which shows associations at the genotype level.

Table 3

Odds ratios and P values for all single-nucleotide polymorphisms at the allele level, by type of analysis.

Table 4

Odds ratios and P values for all single-nucleotide polymorphisms at the genotype level, by type of analysis.

Closer examination of associated SNPs revealed that they could be divided into 3 subgroups. The first subgroup comprised 5 of the 22 SNPs. These SNPs were associated with severe RSV disease both at the allele and genotype level (i.e., SNPs in VDR, JUN, IFNA5, NOS2A, and FCER1A) (table 2). In accordance with this finding, these associations were among those with the lowest P values and the strongest effects as measured by the OR. The first 4 genes are involved in innate immunity and the last gene in allergic asthma. The second subgroup comprised 12 of the 22 associated SNPs. For this group, associations were found only at the allele level (table 2), indicating a codominant effect. For all these associations, the P value was between .01 and .05. These genes are involved in innate immunity (IFNA13, IL15, STAT1, and TLR8), chemotaxis (CCL8, ITGB2, and VCAM1), adaptive immunity (CD28 and STAT1), and allergic asthma (MS4A2, ADAM33, IL4R, and IL9R). TLR8 is located on the X chromosome, and association was found only in males. The last subgroup comprised 5 of the 22 SNPs. These SNPs showed an association with RSV disease at the genotype level only and were present in genes involved in innate immunity (TNF and NCF2) and adaptive immunity (IL10, IL4R, and IL17). Closer examination of the data revealed that, for all 5 SNPs, heterozygosity was associated with reduced susceptibility to RSV infection. Only for the association with the IL10 SNP was P < .01 reached. For the other SNPs, the P values were between .01 and .05. For these 5 SNPs, there was no evidence of a trend in relative risks for the genotypes, indicating that these effects were not due to codominance of the allele but, rather, to a specific effect of the heterozygous genotype.
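The difference between an allele-level trend and a genotype-specific effect can be illustrated with a small Python sketch. The genotype counts below are hypothetical and chosen only to show the pattern; they are not data from this study, and the helper name `genotype_odds_ratios` is an assumption. ORs are computed relative to the major homozygous genotype in a simple unmatched case-control table.

```python
def genotype_odds_ratios(cases, controls):
    """Odds ratios for heterozygous and minor homozygous genotypes.

    cases, controls: genotype counts ordered as
    (major homozygous, heterozygous, minor homozygous).
    The major homozygous genotype is the reference category.
    """
    ref_odds = cases[0] / controls[0]
    or_het = (cases[1] / controls[1]) / ref_odds
    or_hom = (cases[2] / controls[2]) / ref_odds
    return or_het, or_hom


# Hypothetical counts (NOT study data): heterozygotes are underrepresented
# among cases, minor homozygotes are not.
cases = (200, 100, 50)
controls = (400, 400, 100)
or_het, or_hom = genotype_odds_ratios(cases, controls)
print(f"OR heterozygous = {or_het:.2f}, OR minor homozygous = {or_hom:.2f}")
```

In this made-up table, the heterozygous OR is below 1 while the minor homozygous OR equals 1, so the risk does not change monotonically with the number of copies of the allele. An additive (df = 1) model describes such a pattern poorly, whereas a genotype-level (df = 2) analysis can capture it.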

Association between the group of genes involved in innate immunity and RSV bronchiolitis. As in many genetic studies, the strength of the association, as indicated by the P value, was not very high for the SNPs. Therefore, and because of possible spurious association due to multiple testing, our list of associated genes might contain false-positive associations. However, on the basis of the number of associated SNPs in a process, 2 processes (i.e., innate immunity and allergic asthma) were clearly overrepresented. This suggests that, of the 5 processes tested, these 2 processes are most important and are more likely to contain true associations. Interestingly, this also implies that the SNPs in genes involved in chemotaxis, including all tested chemokines, as well as SNPs in genes involved in local lung responses, are less important. However, overrepresentation of associations in a certain process does not take into account the strength of the association. Therefore, an independent statistical test was used to evaluate the importance of the 5 selected processes in susceptibility to RSV bronchiolitis. For this purpose, the global test for groups of genes, which was developed for the analysis of microarray data, was used. This test calculates 1 P value for the association of a group of SNPs. The fact that only 5 groups of genes were tested reduces the problem of multiple testing and may enable the ranking of pathways and biological processes on the basis of their importance in susceptibility to RSV bronchiolitis. The 5 immunological processes are partly overlapping, and, therefore, certain SNPs were included in >1 immunological process. Results, as presented in table 5, reveal that only the group of SNPs in genes involved in innate immunity was associated with susceptibility to RSV bronchiolitis (P < .05).

Table 5

Association of the group of genes involved in innate immunity with susceptibility to respiratory syncytial virus bronchiolitis.

Discussion

To identify genes and biological pathways important in determining genetic susceptibility to RSV bronchiolitis, we performed a large-scale genetic study that used a candidate gene approach. In total, 22 SNPs in 21 genes demonstrated a significant association with severe RSV bronchiolitis either at the allele or genotype level, or at both levels. Associated genes were found in all pathways tested. However, the 4 SNPs with the strongest association at both the allele and genotype level are located in genes involved in innate immunity (i.e., the VDR, JUN, NOS2A, and IFNA5 genes), highlighting the importance of this pathway. Indeed, the global test evaluating the association of groups of genes also indicated that the group of innate immune SNPs, as a whole, was associated with susceptibility to RSV bronchiolitis. SNPs in allergic asthma genes were over-represented, but this category did not reach significance in the global test. This could be a reflection of a possible higher degree of linkage between the SNPs in this category, because more SNPs were tested per gene involved in allergic asthma. In addition, the associations of individual SNPs in this category were clearly less strong than those in the category of innate immune genes.

The SNP in VDR (vitamin D receptor) was previously associated with susceptibility to diabetes [22]. Other SNPs in this gene have been associated with susceptibility to tuberculosis and allergic asthma [23, 24], and the VDR has been implicated in down-regulating interleukin (IL)–12 and IFN-γ production [25]. The SNP in NOS2A was previously associated with Parkinson's disease [26], and inducible nitric oxide synthase (iNOS) has a role in various airway diseases [27]. To our knowledge, the SNPs in the innate immune genes JUN and IFNA5 have never been previously associated with susceptibility to disease. The importance of IFN-α in RSV infection is, however, highlighted by the fact that RSV can interfere with IFN-α production [28, 29]. JUN is part of transcription factor AP-1, which is one of the mediators of proinflammatory cytokine production [30]. Apparently, JUN plays an important role in the host response to RSV infection. The association with FCER1A was also found at both the allele and genotype level, but it was less strong (P = .01 to P = .05). This SNP has been previously associated with altered FcεRI expression levels and allergic disease [31].

The second category of SNPs demonstrated association at the allele level only. These SNPs are present in the genes involved in innate immunity (IFNA13, IL15, STAT1, and TLR8), chemotaxis (CCL8, ITGB2, and VCAM1), adaptive immunity (CD28 and STAT1), and allergic asthma (MS4A2, ADAM33, IL4R, and IL9R). To our knowledge, the SNPs in genes encoding IL4R, IFNA13, CD28, IL9R, and TLR8 have not been previously associated with disease, although another SNP in IL4R has previously been associated with RSV bronchiolitis [4]. For the IL15 SNP, an association with asthma has been found for a haplotype containing the SNP but not for the SNP alone [32]. For the other SNPs, associations have been found with various diseases, including the severity of fibrosis in hepatitis C virus infection and susceptibility to stroke [33, 34]. Interestingly, this category of SNPs comprises most allergic asthma genes, although the associations found are less strong than those found in the first category of SNPs. Using this larger cohort of children, we could not confirm our previously described association with an SNP in IL4 [4].

The last category of SNPs demonstrates association at the genotype level only. These SNPs were present in the genes IL10, IL4R, TNF, NCF2, and IL17. In all 5 genes, it appears that the heterozygous genotype has a protective effect on RSV disease, as compared with the major and minor homozygous genotypes. Interestingly, a heterozygous advantage is seen for SNPs in 3 cytokine genes and 1 cytokine receptor gene (IL10 [thereby confirming our finding elsewhere [8]], TNF, IL17, and IL4R), which are all involved in determining the extent of inflammation: tumor necrosis factor (TNF)–α and IL-17 are inducers of inflammation, IL-10 is an inhibitor of Th2 responses and inflammation [35], and IL-4R is involved in Th2 responses and associated pathological lesions. The apparent heterozygous advantage is not easy to interpret. First, it could be related to the level of local cytokine production or response, which may determine the balance between clearing the virus and the extent of inflammation that occurs. Indeed, the IL4R SNP has been shown to affect the levels of secreted IL-4R that may affect IL-4 responses [36], and the TNF SNP has been associated with levels of TNF-α production [37]. Interestingly, the latter SNP has also been associated with development of asthma in the Japanese population [38]. Second, cell type–specific regulation of expression may be involved in the heterozygous advantage. Because the TNF, IL10, and IL4R SNPs are promoter SNPs, and because the IL17 SNP is present in the 3′ untranslated region, they may all affect gene expression. IL-10 and TNF-α can be produced both by monocytes/macrophages and by T cells, and it is therefore tempting to speculate that regulation of expression of these cytokines differs in these 2 cell types. The TNF SNP has previously been shown to affect binding of the transcription factor Octamer-binding transcription factor–1 [39]. Tissue-specific effects on expression have previously been described for a variable number of tandem repeats in the INS gene that affects expression in the thymus and in the pancreas in an opposite manner [40]. Third, the results could be caused by linkage. The effects of the TNF SNP could, for instance, result from linkage of the TNF region with the HLA region, for which a heterozygous advantage has been shown [41]. The NCF2 SNP results in an amino acid change in this component of the NADPH oxidase, and, to our knowledge, it has never been associated with any disease. Only the IL10 SNP demonstrates a stronger association with RSV bronchiolitis (P < .01). Thus, although we have not established the mechanism, it is clear that heterozygosity for these alleles is associated with reduced susceptibility to RSV infection.

From this work, it is clear that RSV bronchiolitis is a genetically complex disease influenced by many host genes—in particular, by innate immune genes. One of the limitations of genetic association studies is that, for all associations found, it is unclear whether the associated SNPs are causative or whether associations are found because of linkage of selected SNPs to other causative variants. In addition, our list of associated SNPs might contain some false-positive findings. Further genetic analysis, haplotype determination, and functional studies are therefore needed to elucidate the complex pathobiological aspects and genetic nature of RSV disease.

In conclusion, our data show that genetic susceptibility to RSV bronchiolitis is a complex trait. Association of several SNPs in allergic asthma genes may support a model in which alleles that confer susceptibility to allergic asthma also confer susceptibility to RSV infection, although the associations with allergic asthma genes were clearly not the strongest in our study. The genes that demonstrate the strongest association with RSV bronchiolitis are involved in innate immunity, indicating that this process may play a decisive role in determining disease susceptibility and suggesting that early responses to the virus may not only lead to viral clearance but may also be involved in the development of excessive pathology and disease.

Acknowledgments

We thank S. Imholz, C. Strien, and F. van der Horst, for expert technical assistance. We also thank C. Lindemans and J. Heidema, for inclusion of patients, and H. van Loveren, for critically reading the manuscript.

Appendix

Statistical Analysis

With our data, we can perform a case-control analysis involving the Dutch cases and controls and a transmission disequilibrium test (TDT)–like analysis involving all case-parent trios. The case-control analysis yields either relative risks for genotypes, comparing heterozygote and homozygote mutations with the homozygote wild types without any assumption regarding the disease model, or relative risks for alleles, under the assumption of an additive effect of the haplotypes on the ln(relative risk) scale.

To explain the procedure, we first consider a 1-dimensional parameter, such as the ln(relative risk for the allele). Let $\hat\beta_1$ and $\hat\beta_2$ be the estimated coefficients in 2 different analyses with respective SEs $s_1$ and $s_2$. The z statistics for testing $\beta = 0$ in either analysis are given by $Z_1 = \hat\beta_1/s_1$ and $Z_2 = \hat\beta_2/s_2$, respectively. Let $\rho$ be the estimated correlation coefficient between the 2 estimates. Then, the SE of the difference $\Delta = \hat\beta_1 - \hat\beta_2$ is given by $s_\Delta = \sqrt{s_1^2 + s_2^2 - 2\rho s_1 s_2}$.

This can be used to test the null hypothesis of no systematic difference ($\beta_1 = \beta_2$) between the 2 analyses. Under the assumption of no systematic difference, a weighted mean $\hat\beta_{com} = w\hat\beta_1 + (1 - w)\hat\beta_2$ can be computed with SE $s_{com} = \sqrt{w^2 s_1^2 + (1 - w)^2 s_2^2 + 2w(1 - w)\rho s_1 s_2}$.

The optimal weight is given by $w = (s_2^2 - \rho s_1 s_2)/(s_1^2 + s_2^2 - 2\rho s_1 s_2)$. Using these optimal weights, we obtain the z statistic for a combined test $Z_{com} = \hat\beta_{com}/s_{com}$.

Some insight into the procedure can be obtained if the 2 analyses give answers with approximately the same precision: $s_1 = s_2 = s$, which happens to be the case in our data. In that case, $\hat\beta_{com} = (\hat\beta_1 + \hat\beta_2)/2$ with SE $s_{com} = s\sqrt{(1 + \rho)/2}$ and $Z_{com} = (Z_1 + Z_2)/\sqrt{2(1 + \rho)}$.

In our data, ρ≈0.5. The implication is that if the 2 analyses are both borderline significant (Z1 = Z2 = 1.96; P1 = P2 = .05), the combined test has Zcom = 3.92/√3 ≈ 2.26 with Pcom ≈ .024; therefore, the combined analysis was performed on all SNPs for which P < .1 in 1 of the 2 individual tests.
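The 1-dimensional combination procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Gauss code used for the study: the formulas are those of a minimum-variance weighted mean of 2 correlated estimates, the function name `combine_estimates` is hypothetical, and the input values are invented for the example.

```python
import math


def combine_estimates(b1, s1, b2, s2, rho):
    """Minimum-variance weighted mean of two correlated estimates.

    b1, b2: estimates of the same ln(relative risk)
    s1, s2: their standard errors
    rho:    estimated correlation between the two estimates
    Returns (weight on b1, combined estimate, combined SE, combined z).
    """
    var_diff = s1**2 + s2**2 - 2 * rho * s1 * s2   # variance of b1 - b2
    w = (s2**2 - rho * s1 * s2) / var_diff          # optimal weight on b1
    b_com = w * b1 + (1 - w) * b2
    var_com = (w**2 * s1**2 + (1 - w)**2 * s2**2
               + 2 * w * (1 - w) * rho * s1 * s2)
    s_com = math.sqrt(var_com)
    return w, b_com, s_com, b_com / s_com


# Hypothetical inputs (not study data): equal precision, modest correlation.
w, b_com, s_com, z_com = combine_estimates(0.30, 0.15, 0.24, 0.15, rho=0.25)
print(w, b_com, s_com, z_com)
```

With equal SEs, the optimal weight is 0.5 regardless of the correlation, and the combined z statistic grows as the correlation between the 2 analyses shrinks.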

Obtaining the Correlation Between the Estimates

Consider 2 overlapping data sets, DS1 and DS2, with $n_1$ and $n_2$ independent observations, respectively. In each data set, we can fit a multivariate statistical model with the multidimensional parameters $\theta_1 = (\alpha_1, \beta)$ and $\theta_2 = (\alpha_2, \beta)$, respectively. The 2 models have the $k_0$-dimensional $\beta$ parameter in common and nonshared parameters $\alpha_1$ and $\alpha_2$ of dimension $k_1$ and $k_2$, respectively. When fitting these models by maximum likelihood, we obtain the estimated parameters $\hat\theta_1 = (\hat\alpha_1, \hat\beta_1)$ and $\hat\theta_2 = (\hat\alpha_2, \hat\beta_2)$, the Fisher information matrices $I_1$ and $I_2$, and the score matrices $U_1$ and $U_2$, where, generally, $U_{1,ij} = \partial \ell_{1,i}/\partial \theta_{1,j}$ and $U_{2,ij} = \partial \ell_{2,i}/\partial \theta_{2,j}$ are the derivatives of the log-likelihood contribution of individual i with respect to parameter $\theta_j$.

The theory of maximum likelihood estimation holds that the asymptotic covariance of the estimator is given by $\operatorname{cov}(\hat\theta) = I^{-1}$. The robust sandwich estimator, which is also valid if the model is misspecified, is given by $\operatorname{cov}(\hat\theta) = I^{-1} U^{T} U I^{-1}$.

Because of the overlap, the estimated parameters $\hat\theta_1 = (\hat\alpha_1, \hat\beta_1)$ and $\hat\theta_2 = (\hat\alpha_2, \hat\beta_2)$ are dependent. Their covariance matrix can be estimated by $\operatorname{cov}(\hat\theta_1, \hat\theta_2) = I_1^{-1} U_1^{T} U_2 I_2^{-1}$ by use of only the rows of $U_1$ and $U_2$ that correspond to the overlapping observations. We used this estimator for the covariance matrix $\operatorname{cov}(\hat\theta_1, \hat\theta_2)$, and we used the theoretical one for the separate covariance matrices $\operatorname{cov}(\hat\theta_1)$ and $\operatorname{cov}(\hat\theta_2)$, to stay closer to the standard output for each analysis separately.
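The overlap-based covariance estimate can be illustrated for the simplest possible case, a single parameter per model. The sketch below is a toy example and not the models fitted in this study: it assumes a normal model with unknown mean and known variance, for which the score of observation i is (x_i − θ)/σ² and the Fisher information is n/σ². The function name `overlap_covariance` and the data are hypothetical.

```python
def overlap_covariance(x1, x2, shared_ids, sigma2=1.0):
    """Covariance between two sample means fitted on overlapping data.

    x1, x2: dicts mapping observation id -> value for data sets 1 and 2.
    shared_ids: ids of the observations present in both data sets.
    Toy model (an assumption of this sketch): normal with unknown mean and
    known variance sigma2, so each model has a single parameter.
    """
    theta1 = sum(x1.values()) / len(x1)   # maximum likelihood estimates
    theta2 = sum(x2.values()) / len(x2)
    info1 = len(x1) / sigma2              # Fisher information of each model
    info2 = len(x2) / sigma2
    # Cross-product of scores, using the overlapping observations only.
    cross = sum(((x1[i] - theta1) / sigma2) * ((x2[i] - theta2) / sigma2)
                for i in shared_ids)
    return cross / (info1 * info2)


# Hypothetical data: observations 3 and 4 appear in both data sets.
x1 = {1: 2.0, 2: 4.0, 3: 1.0, 4: 5.0}
x2 = {3: 1.0, 4: 5.0, 5: 3.0, 6: 3.0}
print(overlap_covariance(x1, x2, shared_ids=[3, 4]))
```

Dividing this covariance by the product of the 2 SEs gives the correlation between the estimates. With only a handful of shared observations the estimate is noisy, so the toy numbers show the mechanics rather than a realistic value.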

In our genetic problem, the first analysis is the logistic regression performed on the case-control data with genotype as the explanatory variable (either taken linearly as the number of mutations within a SNP [df = 1] or categorically [df = 2]), and the second analysis is the matched case-control analysis comparing cases with pseudo controls, with separate strata used for each case-parent trio. The score and the information matrix for these models are well known. We do not provide the details here. From the estimated covariance matrix $\operatorname{cov}(\hat\theta_1, \hat\theta_2)$, we can obtain the covariance matrix of the common part $\operatorname{cov}(\hat\beta_1, \hat\beta_2)$. Together with the covariance matrices $\operatorname{cov}(\hat\beta_1)$ and $\operatorname{cov}(\hat\beta_2)$, we now have the tools to obtain the optimal combination.

Combined Estimate

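Let $\hat\beta_1$ and $\hat\beta_2$ be 2 correlated estimates of the same k-dimensional parameter vector $\beta$ with covariance matrices $\operatorname{cov}(\hat\beta_1) = C_1$, $\operatorname{cov}(\hat\beta_2) = C_2$, and $\operatorname{cov}(\hat\beta_1, \hat\beta_2) = C_{12}$. Let $\Delta = \hat\beta_1 - \hat\beta_2$; then $\operatorname{cov}(\Delta) = C_1 + C_2 - C_{12} - C_{21}$ with $C_{21} = C_{12}^{T}$. The hypothesis of no systematic difference can be tested by Hotelling's $T^2 = \Delta^{T} \operatorname{cov}(\Delta)^{-1} \Delta$.

The most efficient estimate of $\beta$ is given by $\hat\beta_{com} = \hat\beta_2 + (C_2 - C_{21}) \operatorname{cov}(\Delta)^{-1} \Delta$ with covariance matrix $C_{com} = C_2 - (C_2 - C_{21}) \operatorname{cov}(\Delta)^{-1} (C_2 - C_{12})$.

The optimal combined test for $\beta = 0$ is based on another application of Hotelling's $T^2$, namely, $T^2_{com} = \hat\beta_{com}^{T} C_{com}^{-1} \hat\beta_{com}$.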
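As a consistency check, the matrix expressions can be reduced to the 1-dimensional case. The block below is a sketch that assumes the minimum-variance weight-matrix form of the combined estimator, $\hat\beta_{com} = W\hat\beta_1 + (I - W)\hat\beta_2$; the symbol $W$ is notation introduced here for illustration. For $k = 1$ it recovers a scalar weight and SE expressed in $s_1$, $s_2$, and $\rho$.

```latex
% Assumed weight-matrix form (W is notation introduced for this sketch)
W = (C_2 - C_{21})\,(C_1 + C_2 - C_{12} - C_{21})^{-1},
\qquad
\operatorname{cov}(\hat\beta_{com}) = C_2 - W\,(C_2 - C_{12}).

% Scalar case k = 1: C_1 = s_1^2, C_2 = s_2^2, C_{12} = C_{21} = \rho s_1 s_2
w = \frac{s_2^2 - \rho s_1 s_2}{s_1^2 + s_2^2 - 2\rho s_1 s_2},
\qquad
s_{com}^2 = s_2^2 - \frac{(s_2^2 - \rho s_1 s_2)^2}{s_1^2 + s_2^2 - 2\rho s_1 s_2}
          = \frac{s_1^2 s_2^2\,(1 - \rho^2)}{s_1^2 + s_2^2 - 2\rho s_1 s_2}.

% Equal precision s_1 = s_2 = s:
w = \tfrac{1}{2},
\qquad
s_{com}^2 = \frac{s^2\,(1 + \rho)}{2}.
```

For a single parameter, Hotelling's $T^2$ for the combined test is the square of the z statistic $\hat\beta_{com}/s_{com}$.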

Simulations (to be reported elsewhere) show that this approach loses very little information when compared with using the full statistical model.

Footnotes

  • Potential conflicts of interest: none reported.

  • Financial support: Dutch Asthma Foundation (grants 32.96.08 and 32.03.22).

  • Received March 19, 2007.
  • Accepted April 11, 2007.

References
