Background. The proportion of cases of tuberculosis due to recent infection can be estimated in long-term population-based studies using molecular techniques. Here, we present what is, to our knowledge, the first such study in an area with high human immunodeficiency virus (HIV) prevalence.
Methods. All patients with tuberculosis in Karonga District, Malawi, were interviewed. Isolates were genotyped using restriction-fragment–length polymorphism (RFLP) patterns. Strains were considered to be “clustered” if at least 1 other patient had an isolate with an identical pattern.
Results. RFLP results were available from 83% of culture-positive patients from late 1995 to early 2003. When strains with <5 bands were excluded, 72% (682/948) were clustered. Maximum clustering was reached using a 4-year window, with an estimated two-thirds of cases due to recent transmission. The proportion clustered decreased with age and varied by area of residence. In older adults, clustering was less common in men and more common in patients who were HIV positive (adjusted odds ratio, 5.1 [95% confidence interval, 2.1–12.6]).
Conclusions. The proportion clustered found in the present study was among the highest in the world, suggesting high rates of recent transmission. The association with HIV infection in older adults may suggest that HIV has a greater impact on disease caused by recent transmission than on that caused by reactivation.
DNA fingerprinting of Mycobacterium tuberculosis has been used in conjunction with conventional epidemiological approaches to elucidate transmission patterns in different settings. M. tuberculosis strains with identical DNA fingerprints are said to be “clustered,” and the proportion of clustering in a population is thought to reflect the amount of recent transmission [1–4]. Using this argument, researchers have estimated the proportion of tuberculosis attributable to recent transmission [1] and how it changes over time [5], as well as factors associated with clustering and, hence, with recent transmission [1, 6].
Most information has been gained from studies that attempt to include all tuberculosis cases in a defined population over several years. Including all cases is important, since incomplete sampling will lead to a failure to identify clusters and, therefore, to an underestimation of clustering and of the importance of recent transmission [7]. Short studies will also miss clusters [4]. Several population-based studies spanning several years now exist [5, 8–11], but none have been conducted in a general population setting in an area of Africa with a high HIV burden. As part of the Karonga Prevention Study in northern Malawi, where the HIV infection prevalence in adults is currently ∼13% [12], we have had the opportunity to conduct the first long-term population-based molecular epidemiological study of tuberculosis in an area with a high prevalence of HIV infection.
In Karonga District, Malawi, suspected tuberculosis cases were identified in clinics and the district hospital by screening patients who had had a cough for at least 3 weeks and by examining patients with enlarged lymph nodes. Sputum samples were taken, and, since 1997, aspiration and culture of lymph nodes have been performed. Tuberculosis treatment is in accordance with Malawi National TB Control Programme guidelines. Patients with tuberculosis were tested for HIV, after counseling and if consent was obtained [13, 14]. Permission for the study was received from the Malawi National Health Sciences Research Committee and from the ethics committee of the London School of Hygiene and Tropical Medicine.
Sputum smear microscopy and bacteriological culture on Lowenstein-Jensen medium were performed in the project laboratory. Cultures that morphologically resembled M. tuberculosis were sent to the Health Protection Agency National Mycobacterium Reference Laboratory in London, England, for species identification and drug sensitivity testing. All cultures from patients with tuberculosis in Karonga District since late 1995 have been stored for DNA fingerprinting. Since 1997, all patients with tuberculosis have been interviewed and asked about their area of residence during the previous 5 years for any period ⩾3 months, as well as about known tuberculosis contacts
Isolates cultured from specimens collected between late 1995 and early 2003 were included in this analysis. Specimens were fingerprinted using standard methods based on restriction-fragment–length polymorphism (RFLP) patterns of IS6110 [15]. They were compared using computer-assisted (GelCompar version 4.1; Applied Maths) visual comparison. The possibility of laboratory error was considered when identical fingerprints were obtained from specimens from different patients processed on the same day [14]. These specimens were excluded if there was no other laboratory evidence of tuberculosis (isolated positive cultures), if they were the only 2 examples of this RFLP pattern, or if the patients had other isolates with different patterns
Some patients had >1 specimen available. To define whether an RFLP pattern was unique or clustered, patients were included only once unless they had >1 fingerprint pattern (after likely errors were excluded). Strains were classified as clustered if at least 2 patients had identical RFLP patterns. This was defined both overall and by time period. After clustering was defined, patients were only included once, for their first episode of illness for which an RFLP result was available within the time period of the study
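The clustering definition above can be illustrated with a minimal Python sketch. The patient IDs, pattern labels, and the function name `classify_clustered` are hypothetical and are not taken from the study; the sketch assumes that each patient contributes exactly 1 RFLP pattern.

```python
from collections import Counter

def classify_clustered(patient_patterns):
    """Flag each patient whose RFLP pattern is shared by at least 1 other patient.

    patient_patterns: dict mapping a (hypothetical) patient ID to a pattern label.
    Returns a dict mapping patient ID to True (clustered) or False (unique).
    """
    counts = Counter(patient_patterns.values())
    return {pid: counts[pattern] >= 2 for pid, pattern in patient_patterns.items()}

# Toy data for illustration only (not study data).
toy = {"p1": "A", "p2": "A", "p3": "B", "p4": "C", "p5": "C", "p6": "C"}
flags = classify_clustered(toy)
proportion_clustered = sum(flags.values()) / len(flags)
```

In the toy data, 5 of 6 patients share a pattern with someone else, so the proportion clustered is 5/6.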
Recent transmission was estimated using the “n-1” method [1] ([number of patients in clusters − number of clusters]/total number of patients), to allow for an index case in each cluster. Because of the long duration of the study, this statistic was reestimated to investigate clustering within given time periods. Each patient was considered to have a clustered strain in a given time period if another patient had previously had the same strain within that time period. This “retrospective” proportion clustered is equivalent to the n-1 estimate for that period (since the first patient with the strain in any cluster is not clustered)
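The equivalence between the n-1 estimate and the “retrospective” proportion clustered can be checked with a short Python sketch. The function names and the toy sequence are assumptions made for illustration; the sketch assumes 1 pattern per patient and patients listed in date order.

```python
from collections import Counter

def n_minus_1(patterns):
    """(number of patients in clusters - number of clusters) / total number of patients."""
    counts = Counter(patterns)
    patients_in_clusters = sum(c for c in counts.values() if c >= 2)
    number_of_clusters = sum(1 for c in counts.values() if c >= 2)
    return (patients_in_clusters - number_of_clusters) / len(patterns)

def retrospective_proportion(patterns_in_date_order):
    """Proportion of patients whose pattern was already seen in an earlier patient."""
    seen = set()
    clustered = 0
    for pattern in patterns_in_date_order:
        if pattern in seen:
            clustered += 1
        seen.add(pattern)
    return clustered / len(patterns_in_date_order)

# Toy sequence of patterns in date order (not study data).
toy = ["A", "B", "A", "C", "C", "A", "D"]
estimate = n_minus_1(toy)
retrospective = retrospective_proportion(toy)
```

Both functions return 3/7 for the toy sequence, because the first patient in each cluster is the only cluster member without a predecessor.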
Risk factors for clustering were examined using overall and retrospective clustering in different time periods. If 1 index case per cluster is assumed, then the larger the cluster, the higher the proportion of patients with recently acquired tuberculosis. Analyses of risk factors for clustering were therefore repeated after smaller clusters (2–4 patients and 2–9 patients) were excluded, to approximate an analysis of risk factors for recent transmission
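The reasoning about cluster size can be made concrete with a small Python sketch. It assumes exactly 1 index case per cluster, and it assumes that patients with unique strains are kept as the comparison group when smaller clusters are excluded; the function names and toy data are hypothetical.

```python
from collections import Counter

def recent_fraction(cluster_size):
    """Fraction of a cluster attributed to recent transmission, assuming 1 index case."""
    return (cluster_size - 1) / cluster_size

def exclude_small_clusters(patient_patterns, min_cluster_size):
    """Keep patients with unique strains and patients in clusters of at least
    min_cluster_size; drop patients in clusters of 2 to min_cluster_size - 1."""
    counts = Counter(patient_patterns.values())
    return {
        pid: pattern
        for pid, pattern in patient_patterns.items()
        if counts[pattern] == 1 or counts[pattern] >= min_cluster_size
    }

# Toy data (not study data): a cluster of 2, a unique strain, and a cluster of 5.
toy = {"p1": "A", "p2": "A", "p3": "B",
       "p4": "C", "p5": "C", "p6": "C", "p7": "C", "p8": "C"}
kept = exclude_small_clusters(toy, min_cluster_size=5)
```

Under the 1-index-case assumption, a cluster of 2 has half its patients attributed to recent transmission, whereas a cluster of 10 has nine-tenths, which is why restricting to larger clusters approximates an analysis of recent transmission.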
Over the period of the study, 1194 specimens from 1044 patients (84% of 1248 culture-positive patients seen during this period) were fingerprinted. The remaining isolates were contaminated, not viable, or missing. Twenty-five fingerprints were likely to have been laboratory errors and were excluded. After multiple isolates per person were excluded, there were 1029 patients with 406 different RFLP patterns. All but 73 had pulmonary tuberculosis. The number of bands in the strains from all 1029 patients is shown in figure 1
Figure 1. No. of bands in the restriction-fragment–length polymorphism patterns of Mycobacterium tuberculosis strains from 1029 patients in northern Malawi.
The proportion clustered was 73.6% (757/1029) overall, or 71.9% (682/948) after the 81 isolates with <5 bands on the RFLP pattern were excluded. When 1 case per cluster was assumed to be an index case, the proportion of cases apparently due to recent transmission was 60.8% by use of the n-1 formula, or 59.4% after isolates with <5 bands were excluded
Clustering was also examined as clustering with isolates obtained previously, using different time windows. The results for patients with at least 5 bands are summarized in figure 2. The overall proportion clustered with any previous isolates is equivalent to the cumulative n-1 estimate. It increased with time, as expected, as the data set grew, but reached a plateau after 3–4 years. The other lines in figure 2 show the proportion clustered with previous isolates within a given number of years. This proportion increased as the time window lengthened, but the results for the 4- and 5-year time windows were similar to each other and to the overall estimate. Of the total retrospective clustering detected using a 4-year window (recorded from 2000 onward, to ensure that all patients had 4 years of previous data available), 72% was detected using a 1-year window, 88% using a 2-year window, and 96% using a 3-year window. For any given window, the proportion clustered was stable over the period of the study
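Retrospective clustering within a fixed time window could be computed along the following lines. This is a sketch under stated assumptions, not the study's actual procedure: dates are represented as decimal years, the window boundary is treated as inclusive, and the function name and toy cases are hypothetical.

```python
def clustered_within_window(cases, window_years):
    """Flag each case that shares a pattern with an earlier case within window_years.

    cases: list of (decimal_year, pattern) tuples, 1 per patient.
    Returns a list of booleans in date order.
    """
    last_seen = {}  # pattern -> date of the most recent earlier case
    flags = []
    for when, pattern in sorted(cases):
        previous = last_seen.get(pattern)
        # Only the most recent earlier case matters: if it falls outside
        # the window, every older case with the same pattern does too.
        flags.append(previous is not None and when - previous <= window_years)
        last_seen[pattern] = when
    return flags

# Toy cases (not study data).
toy = [(1996.0, "A"), (1996.5, "B"), (1997.2, "A"), (2000.0, "B"), (2001.5, "A")]
clustered_by_window = {w: sum(clustered_within_window(toy, w)) for w in (1, 2, 4, 5)}
```

In the toy data, the number of cases flagged rises from 0 with a 1-year window to 3 with a 5-year window. To compare windows fairly, as in the text, the flags would be summarized only for patients with a full window of previous data available.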
Figure 2. Proportion of Mycobacterium tuberculosis strains that clustered with previous isolates. Isolates with restriction-fragment–length polymorphism patterns with <5 bands were excluded. The gray line shows the cumulative proportion overall; the other lines show the results when fixed time windows were used.
The plateau in the proportion clustered can also be seen by examining the proportion of new strains (unique or the first in a cluster) in each year. This fell sharply, from 70% in 1996 to ∼30% each year since 1999. Of the new strains from 1999 on, 17% went on to form clusters
The cluster size distribution is shown in figure 3. The largest cluster contained 37 patients, and 24 contained at least 10 patients. Risk factors for clustering were examined using both overall clustering and retrospective clustering for different time periods. To avoid incomplete time periods, for retrospective clustering with a 3-year window, data from 1999 onward were used; for a 1-year window, data from 1997 onward were used. Analyses were performed both including and excluding isolates with <5 bands. The results were similar; results from analyses excluding isolates with <5 bands are presented in table 1
Figure 3. Cluster size distribution. Patients with unique Mycobacterium tuberculosis strains are not shown.
The proportion clustered decreased with age, from >75% among young adults to <60% among patients >55 years of age, and this decrease was statistically significant (P=.001, test for trend). The decrease was more marked for men than women (P=.1, test for interaction). Women were more likely than men to have clustered strains. This association persisted after age was adjusted for but was stronger among older adults. The proportion clustered was higher among patients who were HIV positive. Again, this was apparent only among the older adults (P=.001, test for interaction). The decrease in clustering with age was seen only among HIV-negative patients: 77% (106/138) of those <45 years of age were clustered, compared with 53% (35/66) of those ⩾45 years of age (P=.001). Among HIV-positive patients, the proportion clustered increased slightly with age: 75% (238/317) of those <45 years of age, compared with 82% (50/61) of those ⩾45 years of age (P=.2)
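The age-by-HIV comparisons above can be reproduced approximately from the reported counts. The sketch below uses a Pearson chi-square test without continuity correction and a crude (unadjusted) odds ratio; the text does not state which test was used, so the choice of test is an assumption, and the P values it gives need not match the reported ones exactly.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper tail probability is erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

def odds_ratio(a, b, c, d):
    """Crude odds ratio for the table [[a, b], [c, d]]."""
    return (a * d) / (b * c)

# Counts from the text: (clustered, not clustered) by age group.
hiv_negative = {"<45": (106, 138 - 106), ">=45": (35, 66 - 35)}
hiv_positive = {"<45": (238, 317 - 238), ">=45": (50, 61 - 50)}

_, p_negative = chi_square_2x2(*hiv_negative["<45"], *hiv_negative[">=45"])
_, p_positive = chi_square_2x2(*hiv_positive["<45"], *hiv_positive[">=45"])

# Crude odds ratio for clustering, HIV positive vs. HIV negative, age >=45 years.
crude_or_older = odds_ratio(*hiv_positive[">=45"], *hiv_negative[">=45"])
```

The crude odds ratio among older adults is about 4, which is in the same direction as, but not identical to, the adjusted estimate, since the adjusted analysis accounts for other factors.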
The proportion clustered varied in different areas of the district, being lower in the far north and far south of the district than in the central areas. There was lower clustering among patients who had lived outside the district during the past 5 years. There was no association with any of the other factors studied: site of tuberculosis, previous tuberculosis, known family or other contacts with tuberculosis, or drug resistance
Because of the interactions with age, the multivariate analysis was conducted for 2 different age groups (table 2). Age 45 years was used as a cutoff, since the proportion clustered appeared to decrease after this age. Among patients <45 years of age, the only factors significantly associated with clustering were area of residence at the onset of illness and during the previous 5 years. There was a weak association with sex and no association with HIV-infection status. Among patients ⩾45 years of age, clustering was more common in women and in patients who were HIV positive. Clustering was uncommon in patients living outside the district, but there was no other association with area of residence or any of the other factors
When the retrospective clustering approach was used, the results were similar. There were significant interactions (P=.001–.009) between age and HIV-infection status and between age and sex when both the 3-year and 1-year windows were used. When the 1-year window (after 1996) was used, the only significant risk factors for clustering in patients <45 years of age were area of residence and previous tuberculosis, with lower clustering in those with previous tuberculosis (odds ratio [OR], 0.34 [95% confidence interval {CI}, 0.17–0.68], after adjusting for area). For patients ⩾45 years of age, the effects of HIV-infection status and sex were similar to those seen with overall clustering, and there was more clustering in patients with previous tuberculosis (OR, 2.2 [95% CI, 0.87–5.6], after adjusting for HIV-infection status and sex). When the 3-year window (after 1998) was used, only area of residence was a significant risk factor for clustering in patients <45 years of age, and only HIV-infection status and sex were significant risk factors in patients ⩾45 years of age, with ORs similar to those found using overall clustering
Risk factors for clustering were also examined after smaller clusters (2–4 patients and 2–9 patients) were excluded. The results were very similar to those overall; associations were found with area of current and previous residence and with age group, and associations with HIV-infection status and sex were found only in patients ⩾45 years of age.
In this rural African population, the proportion of strains that were clustered was among the highest recorded in the world. Clustering is related to the proportion of tuberculosis due to recent transmission, and, as expected, the proportion clustered found in our study was much higher than that found in large studies in low-incidence settings in the West [1, 2, 8, 16]. The proportion clustered was similar to that found in 2 very high–incidence settings in South Africa. In a 6-year study in Cape Town, the clustering proportion was 72%, and the n-1 clustering proportion was 58% [11]; in a 1-year study in the South African gold mines, the clustering proportion was 50% [17]. This similarity is, at first, surprising, given that the annual risk of infection with M. tuberculosis in Cape Town is estimated to be 3.5%, compared with 1% in Karonga District [18]. However, it has been shown theoretically that, when the annual risk of infection is constant over time, the clustering proportion will be high: ∼75% for a wide range of infection risks [19].
Measured clustering depends on several factors, including completeness of sampling, immigration, and time period [4]. In the present study, 84% of culture-positive patients were included. Although this is relatively high, some tuberculosis cases will have remained undiagnosed, as in all populations, and the true proportion clustered is likely to have been even higher. The case detection rate is not known, but the screening at peripheral clinics and ongoing case-control studies involving household visits in the district should have increased it. The extent of the underestimation of clustering is limited by the large cluster sizes, since strains in large clusters are likely to be recognized as clustered, even if some patients in the cluster are missed [7]
Our study included a complete district. This will have maximized the chance of observing transmissions occurring within the study area and, therefore, of recognizing clusters. The district has a population of ∼220,000 and measures ∼150 × 20 km. It is bounded to the east by Lake Malawi and to the west by the sparsely populated Nyika Plateau, so population movement occurs only to the north (into Tanzania) and south. The proportion clustered was lower in the far north and especially low in the south, suggesting that more transmissions may be missed in those border areas. The district is largely rural, and the majority of the population are subsistence farmers, so little migration might be expected; it is notable, however, that more than one-third of the patients with tuberculosis had lived outside the district during the previous 5 years for periods of ⩾3 months and that this was associated with reduced clustering, indicating importation of infections that occurred elsewhere
A high proportion of clustered cases could also reflect few introductions of M. tuberculosis into the population in the past and, hence, a lack of diversity of strains. This has been suggested as an explanation for high clustering in Greenland [9]. Karonga District had been quite isolated in the past, before the main road was built in 1980. The 1029 patients had 406 RFLP-defined strains, giving a “diversity” of 39% (406/1029). Diversity in other long-term studies can be calculated similarly, as, for example, 49% in Cape Town [11] and ∼70% in long-term population-based studies in the West [8, 16, 20]
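The “diversity” figure quoted above is a simple ratio of distinct RFLP patterns to patients, as the following Python sketch shows. The function name is hypothetical; the counts are those given in the text.

```python
def strain_diversity(n_patterns, n_patients):
    """Number of distinct RFLP-defined strains divided by the number of patients."""
    return n_patterns / n_patients

# Counts reported in the text for Karonga District.
karonga_diversity = strain_diversity(406, 1029)
karonga_percent = round(100 * karonga_diversity)
```

A lower ratio means that the same patterns recur across more patients, which is consistent with either intense recent transmission or a historically limited pool of strains.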
The effect of time period was explored using different time windows. The proportion clustered when time windows were used was higher than the overall n-1 clustering proportion (59%), because the overall figure includes truncated time windows in the earlier years. This suggests that the proportion of tuberculosis due to recent transmission is >65% (defining “recent” as within 5 years and using the 5-year time window). When time windows were used, there was no evidence that the proportion clustered changed during the period of this study
The way clustering increases over time reflects the incubation period of tuberculosis, the rate of change of the RFLP pattern, and population movement. The half-life for changes in RFLP patterns has been estimated to be 2–3.2 years or longer [21–23] and may be shorter for transmission between patients than within patients [14]. Modeling suggests that the time taken to reach a plateau in clustering increases only slightly with increases in the half-life of the RFLP pattern between 2 and 10 years [24]. Interestingly, the patterns of change in clustering over time seen in Karonga were similar to those seen in populations with lower tuberculosis incidences and lower overall clustering. Thus, clustering increased over time to reach a plateau at ∼3–4 years, as in the Netherlands [8], and 72% of the clustering was found within 1 year, as in San Francisco [5]. The consistency in time taken to reach the plateau in different settings suggests that the rate of change in clustering over time reflects the incubation period of tuberculosis and the rate of change of the RFLP pattern rather than local transmission patterns.
It was hoped that using retrospective clustering with limited time periods or excluding smaller clusters would give clearer associations between risk factors and clustering. The results were similar in all analyses. The proportion clustered decreased slightly with age. This is expected, since it reflects an increasing proportion of reactivation disease with age and has been found in other studies [8, 9]. However, the trend was seen only in patients who were HIV negative. Similarly, an association between HIV-infection status and clustering was found only in older adults, in whom HIV positivity was associated with a 5-fold increase in the odds of clustering. HIV-infection status was not known for all patients, but this is unlikely to have influenced the association between HIV-infection status and clustering. No association of clustering with HIV-infection status was found in South Africa, in a general population or in the gold mines [17, 25]. Where associations between clustering and HIV-infection status have been found, nosocomial transmission may have been an important factor [1]. In Karonga, there was no evidence that the HIV-related clustering was due to an outbreak: the 50 HIV-positive older adults with clustered strains were in 35 different clusters.
Whether HIV infection is likely to increase clustering depends on its relative ability to increase the incidence of tuberculous disease after past or recent infection, on any influence of HIV infection on risk of M. tuberculosis infection, and on any difference in infectiousness between HIV-infected and -uninfected patients with tuberculosis. The finding of an association between HIV-infection status and clustering may suggest that HIV infection has a greater effect on reinfection than on reactivation disease. The absence of this association in patients <45 years of age may be explained by the very high proportion of disease attributable to recent transmission in the HIV-negative patients in the younger age group
We also found a strong association between clustering and sex in older adults, with women being much more likely than men to have clustered strains. This pattern is not usually found, and some studies have found higher clustering in men [8, 9, 16]. The pattern is likely to depend on the local epidemiological aspects of tuberculosis. We have previously shown in this population that a higher proportion of tuberculosis in women than in men is attributable to transmission from a known contact, so higher clustering might be expected [26]
The present study is, to our knowledge, the largest population-based molecular epidemiological study of tuberculosis yet reported from a high-incidence setting and the first large study in an area with a high prevalence of HIV infection. We have shown that most of the tuberculosis in all age groups is attributable to recent transmission and that HIV infection appears to increase the risk of tuberculosis after recent infection more than that associated with reactivation of past infection, at least in older adults
We thank the Government of the Republic of Malawi for their interest in and support of the project and the National Health Sciences Research Committee of Malawi for permission to publish the article. We thank Emilia Vynnycky for helpful comments on an earlier draft
Financial support: Until 1996, the Karonga Prevention Study was funded primarily by LEPRA (The British Leprosy Relief Association) and ILEP (The International Federation of Anti-Leprosy Organizations), with contributions from the World Health Organisation/United Nations Development Programme/World Bank Special Programme for Research and Training in Tropical Diseases. Since 1996, the Wellcome Trust has been the principal funder. J.R.G. was supported in part by the UK Department for International Development and is now funded by the UK Department of Health (Public Health Career Scientist award).
Potential conflicts of interest: none reported