
Identification of Networks of Sexually Transmitted Infection: A Molecular, Geographic, and Social Network Analysis

  1. John L. Wylie (1, 2),
  2. Teresa Cabral (1), and
  3. Ann M. Jolly (3, 4)

  1. Department of Medical Microbiology, University of Manitoba, and
  2. Cadham Provincial Laboratory, Manitoba Health, Winnipeg, Manitoba, and
  3. Centre for Infectious Disease Prevention and Control, Population and Public Health Branch, Health Canada, and
  4. Department of Epidemiology and Community Medicine, University of Ottawa, Ottawa, Ontario, Canada

Reprints or correspondence: John L. Wylie, Cadham Provincial Laboratory, PO Box 8450, Winnipeg, Manitoba, Canada R3C 3Y1 (JWylie@gov.mb.ca)

Abstract

Background. Despite widespread efforts to control it, Chlamydia trachomatis remains the most frequently diagnosed bacterial sexually transmitted infection (STI). Analysis of sexual networks has been proposed as a novel tool for STI control and research. In the present study, we combine molecular genotype data, analysis of geographic clusters, and sociodemographic descriptors to facilitate the analysis of large sexual networks.

Methods. Individual chlamydia genotypes found in Manitoba, Canada, were analyzed to identify geographic clusters, and the identified clusters were further characterized by statistical analysis of sociodemographic variables.

Results. A total of 10 geographic clusters of chlamydia-genotype infection were identified. Clusters in Winnipeg showed little or no geographic overlap and could be further differentiated on the basis of the sociodemographic characteristics of the individuals within each cluster. Several clusters in northern Manitoba overlapped geographically but could nonetheless be differentiated on the basis of the sociodemographic characteristics of the infected individuals.

Conclusions. On the basis of the combined analyses, each geographic cluster appeared to represent a relatively distinct transmission network within the larger sexual network. The geographic analysis of the molecular data provided a basis for establishing potential epidemiological connections between small groups of unlinked individuals. Analytic approaches of the type described here could help to decipher the patterns within large social network data sets and would be applicable to many types of infectious agents.

Sexually transmitted infections (STIs) are behavior-based diseases; their progression through a population is not uniform and random but is instead a function of intimate and complex social behaviors between 2 or more people. Despite the role that interactive behaviors play in infectious-disease transmission, the analytic unit of most epidemiological research into STIs is the individual. Such an approach cannot be expected to fully capture the role that group interactions play in disease transmission.

In contrast to an individual-based approach, social network analysis is a technique for analyzing groups, or networks, of people and their interactions [1, 2]. This approach has provided additional explanatory and predictive power for understanding the transmission patterns of both bloodborne and sexually transmitted diseases [3–5]. The technique focuses on the interactions between individuals within a group and on how those interactive behaviors affect an individual’s risk of acquiring or transmitting an infectious agent.

Our past social network research used routinely collected STI contact-tracing data to construct sexual-contact networks [6, 7]. Although we have used molecular genotype data to demonstrate that networks constructed from contact-tracing data do reflect the transmission routes of individual chlamydia genotypes [7], the many small components that result from contact-tracing data, that is, components containing only 2 or 3 individuals, can present an analytic challenge. (“Component” is a social network term designating a group of persons with known direct or indirect sexual links to each other [6].) For transmission to have occurred, many of these dyads and triads must be connected, either to each other or to other, larger components, through undisclosed or unidentified links. When we attempt to identify large-scale STI-transmission networks and the network-specific risk behaviors associated with transmission, the information from these small components is difficult to analyze, because we do not know which larger components they may be linked to.

One approach is to employ additional, independent techniques, such as molecular genotyping [8, 9], to establish potential epidemiological links between unconnected components. The underlying assumption of genotyping is that individuals infected by the same strain of an infectious agent are more likely to be epidemiologically linked to each other than to individuals infected by a different strain. For some infectious agents, such as chlamydia, the difficulty with this assumption is that there are few distinct genotypes, and each can persist for long periods within a given geographic area. In this situation, therefore, analytic approaches in addition to molecular genotyping are needed to help identify which individuals may be linked to each other.

In the present study, we used a geographic approach to supplement genotypic analyses. We hypothesized that genotype-specific geographic clusters of disease would represent distinct transmission networks and could serve as the basis for linking small components. We therefore conducted a geographic analysis of molecular genotype data for Chlamydia trachomatis in Manitoba, Canada, and used sociodemographic data to show that these clusters do represent relatively homogeneous groups of people, thus allowing a large sexual network to be divided into smaller subunits.

Materials and Methods

Sources of specimens and data. The results presented here are part of an ongoing analysis of sexual network data from Manitoba. The source of the contact-tracing data, their aggregation into a sexual network by use of the software program PAJEK [10], and both the collection of clinical specimens [6] and the identification of chlamydia genotypes have been described elsewhere [7]. The specimens and data associated with this study were collected during 7 November 1997–12 May 1998. Ethics approval for the study was obtained from the Health Research Ethics Board at the University of Manitoba.

Geographic analysis. The study area was the province of Manitoba, divided into 44 geographic units (figure 1A). Except in the case of Winnipeg (the provincial capital), the geographic units were based on the boundaries of the local regional health authorities (RHAs); although Winnipeg lies within a single RHA, we subdivided it into 35 geographic areas based on the forward-sortation areas developed for the postal system. This approach is appropriate in our setting, because a large percentage of the provincial population resides in Winnipeg, and several independent transmission networks are likely to exist within the city limits. The latitude and longitude chosen for each of the 44 geographic units represent its approximate geographic center. Each patient's residence was aggregated to the RHA level outside Winnipeg and to the appropriate forward-sortation area within Winnipeg.
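The aggregation rule can be illustrated with a minimal Python sketch. The record layout, the helper name `geographic_unit`, and the small community-to-RHA lookup are hypothetical; the only rules taken from the text are that Winnipeg residents are assigned to their forward-sortation area (the first 3 characters of a Canadian postal code) and all other residents to their RHA.

```python
from collections import Counter


def geographic_unit(residence, postal_code, rha_of):
    """Assign a case to a geographic unit (hypothetical helper).

    Winnipeg residents go to their forward-sortation area (FSA), i.e. the
    first 3 characters of the postal code; all other residents go to the
    regional health authority (RHA) of their community.
    """
    if residence == "Winnipeg":
        return postal_code.replace(" ", "").upper()[:3]
    return rha_of[residence]


# Hypothetical community-to-RHA lookup, using the RHA codes from figure 1A.
rha_of = {"Thompson": "BRW", "Brandon": "BRD"}

# Hypothetical case records: (community of residence, postal code).
cases = [
    ("Winnipeg", "R3C 0A1"),
    ("Winnipeg", "r2w 1a1"),
    ("Thompson", "R8N 0A1"),
    ("Brandon", "R7A 0A1"),
    ("Winnipeg", "R3C 0B2"),
]

cases_per_unit = Counter(geographic_unit(res, pc, rha_of) for res, pc in cases)
print(cases_per_unit)
```

Each unit would then enter the scan as a single point, its approximate geographic center.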

Figure 1

(A) Depiction of the 44 geographic areas used for SaTScan analysis. The figure identifies the 9 regional health authorities outside Winnipeg (Burntwood [BRW], Norman [NRM], North Eastman [NEM], Parkland [PLD], Interlake [INL], Brandon [BRD], Assiniboine [ASB], Central [CTL], and South Eastman [SEM]) and the 35 forward-sortation areas within Winnipeg, which are designated by their 3-character alphanumeric codes. (B) Results of the SaTScan analysis from table 1. The key identifies the chlamydia genotypes, and the no. of symbols within a given regional health authority or forward-sortation area indicates the no. of cases of chlamydia within it. Only cases associated with clusters are shown.

Geographic analysis was conducted using SaTScan software [11]. The underlying statistical methods for this software have been published by its developers [12, 13]. For geographic analysis, SaTScan offers 2 models, Bernoulli and Poisson, which approximate each other when a geographic (as opposed to geographic-temporal) analysis is conducted. The advantage of the Poisson model is that it can adjust for covariates, as was done in the present study (see below; SaTScan user guide). The SaTScan settings used in this analysis were as follows: type of analysis, spatial; probability model, Poisson; coordinates, latitude and longitude; maximum size of spatial cluster, 50%. The last of these settings governs the maximum size of identified geographic clusters: a setting of 50% means that a cluster could contain at most 50% of the provincial population. For comparison, sensitivity analyses using maximum geographic-cluster–size settings of 25% and 10% were also conducted.
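To make the Poisson model and the maximum-cluster-size setting concrete, here is a minimal Python sketch of a purely spatial scan in the style of Kulldorff's Poisson scan statistic. It is not SaTScan's implementation: it assumes planar coordinates instead of latitude and longitude, omits the Monte Carlo replications that SaTScan uses to obtain P values, and all unit names and counts are hypothetical.

```python
import math


def poisson_llr(c, e, total):
    """Poisson log-likelihood ratio for one window (0 unless the rate is elevated).

    c: observed cases inside the window
    e: expected cases inside the window under the null hypothesis
    total: observed cases on the whole map
    """
    if c <= e:
        return 0.0
    llr = c * math.log(c / e)
    if total > c:  # the outside term vanishes when every case is inside
        llr += (total - c) * math.log((total - c) / (total - e))
    return llr


def circular_windows(units, max_fraction):
    """Yield circular windows as frozensets of unit names.

    Each window is centred on a unit centroid and grows by adding the next
    nearest centroid, stopping once its population would exceed
    max_fraction of the total population (the maximum-cluster-size setting).
    """
    total_pop = sum(u["pop"] for u in units.values())
    for centre in units:
        cx = units[centre]["xy"]
        ordered = sorted(units, key=lambda n: math.dist(cx, units[n]["xy"]))
        window, pop = [], 0
        for name in ordered:
            pop += units[name]["pop"]
            if pop > max_fraction * total_pop:
                break
            window.append(name)
            yield frozenset(window)


def most_likely_cluster(units, max_fraction):
    """Return the window with the highest log-likelihood ratio, and that ratio."""
    total_cases = sum(u["cases"] for u in units.values())
    total_pop = sum(u["pop"] for u in units.values())
    best, best_llr = None, 0.0
    # dict.fromkeys drops duplicate windows while keeping a stable order.
    for window in dict.fromkeys(circular_windows(units, max_fraction)):
        c = sum(units[n]["cases"] for n in window)
        e = total_cases * sum(units[n]["pop"] for n in window) / total_pop
        llr = poisson_llr(c, e, total_cases)
        if llr > best_llr:
            best, best_llr = window, llr
    return best, best_llr


# Hypothetical units: planar centroid, population, and cases of one genotype.
units = {
    "A": {"xy": (0, 0), "pop": 100, "cases": 8},
    "B": {"xy": (1, 0), "pop": 100, "cases": 6},
    "C": {"xy": (5, 5), "pop": 100, "cases": 1},
    "D": {"xy": (6, 5), "pop": 100, "cases": 1},
}

for fraction in (0.50, 0.25):
    print(fraction, most_likely_cluster(units, fraction))
```

Lowering `max_fraction` removes the larger windows from consideration, so a cluster made of 2 adjacent high-rate units can only be reported as a single unit at the lower setting.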

In addition to population size, SaTScan can adjust for any number of additional covariates. In this analysis, age was included as a covariate, using 5 age groups: <15, 15–19, 20–24, 25–29, and ⩾30 years old. These age categories focus on teens and young adults, because STI transmission is more prevalent within these age groups.
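One common way to build a categorical covariate such as age into a Poisson model is indirect standardization: each unit's expected count applies the province-wide, age-specific case rates to that unit's own age structure. The sketch below illustrates that idea under this assumption; it is not taken from SaTScan's source, and the units, populations, and case counts are hypothetical (only 2 of the 5 age groups are shown).

```python
def expected_cases(pop, cases):
    """Age-adjusted expected case counts per unit by indirect standardization.

    pop[unit][age_group]   -> population in that unit and age group
    cases[unit][age_group] -> observed cases in that unit and age group
    """
    age_groups = {g for ages in pop.values() for g in ages}
    # Province-wide, age-specific case rates.
    rate = {}
    for g in age_groups:
        pop_g = sum(ages.get(g, 0) for ages in pop.values())
        cases_g = sum(ages.get(g, 0) for ages in cases.values())
        rate[g] = cases_g / pop_g if pop_g else 0.0
    # Apply those rates to each unit's own age structure.
    return {
        unit: sum(n * rate[g] for g, n in ages.items())
        for unit, ages in pop.items()
    }


# Hypothetical populations and cases for 2 units and 2 of the 5 age groups.
pop = {"U1": {"15-19": 100, "30+": 300}, "U2": {"15-19": 300, "30+": 100}}
cases = {"U1": {"15-19": 4, "30+": 2}, "U2": {"15-19": 12, "30+": 2}}

expected = expected_cases(pop, cases)
for unit, e in expected.items():
    observed = sum(cases[unit].values())
    print(unit, observed, round(e, 2), round(observed / e, 2))
```

Without age adjustment, U2 looks more than twice as affected as U1 (14 vs. 6 cases on equal populations); against age-adjusted expectations (13 vs. 7), the observed/expected ratios are much closer, because U2 simply has more teenagers.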

Statistical analysis. Differences in continuous, nonnormally distributed variables were evaluated by Kruskal-Wallis tests performed using JMP In software (SAS Institute). Categorical variables were analyzed either by the χ² test in Epi Info (version 6.04d; Centers for Disease Control and Prevention) or, if any expected value within a contingency table was <5, by Fisher’s exact test.
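The decision rule for categorical variables can be sketched as follows: compute the expected cell counts under independence and switch to Fisher's exact test whenever any of them falls below 5. The function names and the tables of aboriginal and nonaboriginal cases are hypothetical; Epi Info and JMP In were the tools actually used.

```python
def expected_counts(table):
    """Expected cell counts under independence for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def choose_test(table):
    """Chi-square test unless any expected count is below 5, then Fisher's exact."""
    if any(e < 5 for row in expected_counts(table) for e in row):
        return "Fisher's exact test"
    return "chi-square test"


# Hypothetical tables: rows are 2 genotype clusters,
# columns are (aboriginal, nonaboriginal) cases.
large = [[30, 10], [12, 28]]
small = [[3, 1], [2, 6]]
print(choose_test(large))
print(choose_test(small))
```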

Results

Geographic analysis. Using SaTScan, we began by undertaking a geographic analysis to identify statistically significant genotype-specific geographic clusters for each of the chlamydia genotypes that we had identified within Manitoba. Genotype data were available for 297 specimens that had originally been genotyped for a previous study [7] and that represent all of the components from which ⩾2 specimens were available. The location and age data necessary for geographic analysis were available for 261 of these 297 specimens. Within this subset of 261 specimens, the genotypes represented (and the respective numbers of cases) were as follows: Ba2 (9), D (44), D1 (36), E (64), F (34), G (3), G1 (10), I/H (8), J (32), and K1 (21).
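As a quick arithmetic check, the per-genotype counts listed above account for all 261 specimens with location and age data, leaving 36 of the 297 genotyped specimens out of the geographic analysis. The dictionary simply restates the counts from the text.

```python
# Per-genotype case counts for the 261 specimens with location and age data.
genotype_cases = {
    "Ba2": 9, "D": 44, "D1": 36, "E": 64, "F": 34,
    "G": 3, "G1": 10, "I/H": 8, "J": 32, "K1": 21,
}

analyzed = sum(genotype_cases.values())
genotyped = 297
print(analyzed, genotyped - analyzed)

# Share of the analyzed specimens contributed by each genotype, in percent.
shares = {g: round(100 * n / analyzed, 1) for g, n in genotype_cases.items()}
print(max(shares, key=shares.get), shares["E"])
```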

The results of the cluster analysis are shown in table 1 and figure 1B. Initially, we conducted a sensitivity analysis by altering the maximum geographic-cluster size; we chose 3 settings, 10%, 25%, and 50%, which constrain the program to look for clusters containing from 0% up to 10%, 25%, and 50%, respectively, of the total provincial population. Although a high percentage is recommended to ensure that the program looks for both small and large geographic clusters, we conducted the comparison to determine the extent to which our results depended on the size setting chosen.
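The effect of the maximum-size setting can be illustrated by how far a single circular window is allowed to grow. In this sketch, the populations of the units around one centre (ordered by distance) and the provincial total are hypothetical; the function counts how many of the nearest units fit under each cap.

```python
def units_within_cap(ordered_pops, total_pop, max_fraction):
    """Count the nearest units a circular window can include under the cap.

    ordered_pops: populations of units sorted by distance from the window's
    centre, starting with the centre unit itself.
    """
    cap = max_fraction * total_pop
    included, running = 0, 0
    for pop in ordered_pops:
        if running + pop > cap:
            break
        running += pop
        included += 1
    return included


# Hypothetical populations around one centre and a hypothetical total.
ordered_pops = [20_000, 35_000, 40_000, 60_000, 80_000, 120_000]
total_pop = 1_000_000

for fraction in (0.10, 0.25, 0.50):
    print(f"{fraction:.0%}: {units_within_cap(ordered_pops, total_pop, fraction)} units")
```

Because the larger windows are simply unavailable at a low setting, a cluster spanning several neighboring units can only be reported there in smaller pieces.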

Table 1

Results of SaTScan geographic-cluster analysis

For all size settings, genotype-specific clusters were concentrated in 2 areas of the province: (1) northern and central Manitoba (the RHAs of Burntwood, Norman, Parkland, Interlake, and North Eastman) and (2) different parts of Winnipeg. At the recommended 50% size setting, 2 genotypes (D and E) were each present in 2 clusters, one in northern Manitoba and one in parts of Winnipeg; 3 genotypes (K1, G1, and Ba2) were clustered within Winnipeg only; and 3 genotypes (D1, F, and G) were clustered within northern Manitoba only. A genotype-J cluster was present in most of northern and central Manitoba as well as in northern Winnipeg.

In general, the sensitivity analysis showed consistent results across size settings: for genotypes Ba2, D, D1, F, G, and G1 and for cluster 2 of genotype E, identical clusters with similar P values were seen at all 3 size settings. At the 50% and 25% size settings, cluster 1 of genotype E encompassed 5 RHAs in northern and central Manitoba; at the 10% size setting, however, it split into 2 clusters (table 1). The larger cluster, encompassing 5 RHAs, better reflects the underlying epidemiology of STI in northern Manitoba, because that region’s rural residents frequently travel, and have sexual contact, throughout the area [6].

Similarly, the larger cluster identified for genotype-K1 cases was chosen for analysis because it, too, appeared to better reflect reality. The large genotype-K1 cluster identified at the 50% size setting encompassed most of the central, southern, and western parts of Winnipeg, whereas at the 10% size setting a cluster was identified only in the central, core area. Of the 11 components containing genotype-K1 cases (the use of component data in the present study is explained in the “Network analysis” subsection below), 5 showed epidemiological links between either the southern or the western parts of Winnipeg and the core area of the city. In light of these demonstrated links, the large cluster identified at the 50% size setting appears to more accurately reflect the underlying epidemiology of this genotype.

Conversely, the genotype-J cluster identified at the 50% size setting is unlikely to reflect a real cluster: it encompassed parts of northern and central Manitoba and the northern parts of Winnipeg, and it was not significant at the 25% and 10% size settings. This cluster arises from several genotype-J cases located in RHAs near the northern part of Winnipeg; however, an examination of the components within our data set showed no epidemiological links between Winnipeg genotype-J cases and those in central Manitoba, in contrast to the links found for the genotype-K1 cluster. We therefore conducted no further analysis of the genotype-J cluster and excluded it from the demographic analyses of cases and networks.

The finding that the genotypes formed different geographic clusters suggests that they were circulating among different groups of people, but it also raises the following questions: (1) Is each of the genotypes found in northern Manitoba circulating within the same general demographic group in this area? (2) Do the clusters of genotypes D and E in northern Manitoba and Winnipeg represent the same, highly mobile, population? (3) Do other demographic or behavioral data further differentiate the geographically distinct clusters in Winnipeg? To address these questions, we drew on additional case and network data from our data set.

Case analysis. The demographic characteristics of the cases associated with each genotype cluster were analyzed first. Two demographic variables, age and aboriginal status (characterized as “aboriginal” or “nonaboriginal”), were analyzed. First, the case characteristics associated with the 5 genotype clusters in northern Manitoba were compared; second, those associated with the 5 genotype clusters in Winnipeg were compared; last, the case characteristics of the genotype-D and -E clusters in northern Manitoba were compared with those of the corresponding clusters in Winnipeg. The results of these comparisons are shown in table 2.

Table 2

Comparison of demographic and behavioral data for cases, in northern Manitoba (NM) and Winnipeg (WPG), associated with clusters shown in table 1

In northern Manitoba, significant overall differences were found for both median age (P=.0083) and percentage of aboriginal cases (P=.0095). The median ages and mean rank scores from the age analysis indicated that genotype-D cases were the youngest, whereas genotype-G cases were older than the cases in the other geographic clusters. The percentage of aboriginal cases was lowest in the genotype-E cluster.
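The mean rank scores come from the Kruskal-Wallis procedure, which ranks all ages together and compares the average rank of each cluster. A minimal Python sketch with hypothetical ages (not study data) follows; it computes the H statistic without the tie correction that statistical packages usually apply.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Return (H, mean rank per group); H is not corrected for ties."""
    labels, pooled = [], []
    for name, ages in groups.items():
        labels += [name] * len(ages)
        pooled += ages
    ranks = average_ranks(pooled)
    n = len(pooled)
    mean_rank, h = {}, 0.0
    for name in groups:
        r = [rk for lab, rk in zip(labels, ranks) if lab == name]
        mean_rank[name] = sum(r) / len(r)
        h += len(r) * (mean_rank[name] - (n + 1) / 2) ** 2
    h *= 12 / (n * (n + 1))
    return h, mean_rank


# Hypothetical ages of cases in 3 genotype clusters.
groups = {"D": [16, 17, 18], "E": [19, 21, 22], "G": [24, 26, 29]}
h, mean_rank = kruskal_wallis(groups)
print(round(h, 2), mean_rank)
```

H is then compared with a χ² distribution with k − 1 degrees of freedom (k clusters) to obtain an approximate P value.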

In Winnipeg, significant overall differences were identified for both median age (P=.0466) and percentage of aboriginal cases (P=.0002). Cases in the genotype-D, -Ba2, and -G1 geographic clusters in Winnipeg were younger than those in the genotype-E and -K1 clusters. Genotype-D cases were largely aboriginal, whereas genotype-G1, -K1, and -Ba2 cases were nonaboriginal (genotype-E cases were mostly nonaboriginal). Finally, genotype-D cases in northern Manitoba did not differ significantly from those in Winnipeg in either age or percentage of aboriginal cases; similarly, genotype-E cases in Winnipeg did not differ from those in northern Manitoba, although the difference in percentage of aboriginal cases approached significance (P=.0634).

Network analysis. The data set used in the present study had been collected as part of a sexual network analysis. These network data allowed us to extend the statistical analysis beyond individual cases to include other individuals epidemiologically linked to these cases by direct or indirect sexual contact. To do so, we identified all of the components in which the original cases were located. These components ranged in size from 2 individuals (i.e., the original case and 1 contact) to 41 individuals.
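Finding the components that contain the original cases is a connected-components search on the contact graph. The study assembled its network in PAJEK [10]; the Python sketch below is only an illustrative stand-in using breadth-first search, with hypothetical person identifiers and links.

```python
from collections import defaultdict, deque


def components(edges):
    """Connected components of an undirected contact graph, via breadth-first search."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, result = set(), []
    for start in list(graph):
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:
            person = queue.popleft()
            component.add(person)
            for partner in graph[person]:
                if partner not in seen:
                    seen.add(partner)
                    queue.append(partner)
        result.append(component)
    return result


def components_with_cases(edges, cases):
    """Keep only the components that contain at least one original case."""
    return [c for c in components(edges) if c & cases]


# Hypothetical contact-tracing links; c1 and c2 are genotyped original cases.
edges = [("c1", "x1"), ("x1", "x2"), ("c2", "x3"), ("x4", "x5")]
cases = {"c1", "c2"}

kept = components_with_cases(edges, cases)
print(sorted(len(c) for c in kept))
```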

The results of the expanded analysis of all component members (that is, cases and contacts) are shown in table 3. For the clusters in northern Manitoba, the differences in age remained significant in the expanded data set (P=.0095), whereas the differences in percentage of aboriginal individuals did not (P=.57). For age, mean rank scores indicated that genotype-D and -G individuals continued to fall primarily in the younger and older age groups, respectively, whereas genotype-E individuals now also had a low mean rank score for age, similar to that of genotype-D individuals. For ethnicity, the loss of significance in the expanded data set was associated with a general decrease in the percentage of aboriginal individuals with genotypes D, D1, G, and F, to levels closer to that for genotype E. (In the cases-only analysis [table 2], the significant P value is driven by the genotype-E data: removing them from that analysis changes the P value from .0095 to .5.)
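Whether one cluster drives an overall P value can be checked by recomputing the test with each cluster left out in turn. The sketch below does this for a Pearson χ² test of aboriginal status across clusters, using a closed-form χ² upper-tail probability for integer degrees of freedom. The counts are hypothetical, chosen only so that one group (labeled E) dominates, and the function names are illustrative.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for row, r_tot in zip(table, row_totals):
        for observed, c_tot in zip(row, col_totals):
            expected = r_tot * c_tot / n
            stat += (observed - expected) ** 2 / expected
    return stat, (len(row_totals) - 1) * (len(col_totals) - 1)


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable (integer df >= 1)."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in x.
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def leave_one_out(groups):
    """P value recomputed with each group removed in turn."""
    result = {}
    for dropped in groups:
        table = [list(counts) for name, counts in groups.items() if name != dropped]
        result[dropped] = chi2_sf(*chi_square(table))
    return result


# Hypothetical (aboriginal, nonaboriginal) counts per genotype cluster.
groups = {"D": (36, 4), "D1": (32, 8), "E": (20, 20), "F": (30, 10)}

p_all = chi2_sf(*chi_square([list(c) for c in groups.values()]))
loo = leave_one_out(groups)
print(f"all clusters: P = {p_all:.4f}")
for dropped, p in loo.items():
    print(f"without {dropped}: P = {p:.4f}")
```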

Table 3

Comparison of demographic and behavioral data for cases and other epidemiologically linked cases and/or contacts, in northern Manitoba (NM) and Winnipeg (WPG), associated with the clusters shown in table 1

For the Winnipeg clusters, both the analysis of age and the analysis of percentage of aboriginal individuals continued to show significant differences in the expanded data set (P=.0407 and P=.0003, respectively), similar to those seen in the cases-only data set. For both median age and percentage of aboriginal individuals, the relative ranks of the clusters in the cases-only data set (table 2) remained the same in the expanded data set (table 3), although the percentage of aboriginal individuals tended to be lower in the latter (except for genotype K1, which showed an increase). Finally, in contrast to the cases-only data set, the expanded data set showed that genotype-E individuals in Winnipeg were less likely to be aboriginal than were those in northern Manitoba; this trend was also seen in genotype-D individuals, although the difference in their median age approached significance.

Discussion

The introduction to this report outlined the analytic challenge presented by the large number of small, disconnected components typically found within a large network. We have conducted a geographic analysis of molecular genotype data for chlamydia, supplemented by demographic data at the individual and sexual-network levels, to link many of these separate components into population subgroups within the larger sexual network.

Many of the genotype clusters were clearly differentiated by their geographic patterns (e.g., in northern Manitoba, the genotype-E cluster encompassed 5 RHAs, whereas the genotype-D, -D1, -F, and -G clusters each encompassed only 1). Frequently, the clusters were further differentiated by the demographic characteristics of the individuals with these genotypes. When demographic differences were observed, they were generally most pronounced at the cases-only level. This pattern likely reflects the extensive interconnections within a large sexual network, which ensure frequent bridging events between different demographic groups (for a further illustration of this concept, see the study by Wylie and Jolly [6]). As an analysis extends beyond the original cases to epidemiologically linked individuals, the demographic data become more likely to converge toward a common value reflecting the average demographic profile of the overall population at risk. Even at the network level, however, the demographic characteristics of epidemiologically linked cases and contacts generally continued to reflect those seen at the cases-only level.

When the genotype-D and -E clusters in northern Manitoba were compared with those in Winnipeg, the demographic data did not confirm that these clusters represent distinct transmission networks (e.g., both the genotype-D cluster in northern Manitoba and that in Winnipeg consisted primarily of young aboriginal cases). In addition, examination of the epidemiological links within components revealed sexual contact between individuals in northern Manitoba and individuals in Winnipeg (e.g., 5 of the 18 genotype-E components contained individuals from both areas). These data suggest that the 2 genotype-D clusters may actually be 1 cluster connected geographically by groups of highly mobile individuals, and the same may apply to the genotype-E clusters. This points to a limitation of geographic analyses that consider only adjacent areas to be part of a cluster: geographically disconnected areas may in fact be linked by the social and behavioral characteristics of the population, so the predicted geographic clusters must be validated against local epidemiological patterns.
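Counting components that bridge the 2 regions, as in the genotype-E example above, only requires checking each component's members against a region label. The components, person identifiers, and region labels below are hypothetical, and `bridges_regions` is an illustrative helper.

```python
def bridges_regions(component, region_of, regions=("NM", "WPG")):
    """True if the component has members residing in every listed region."""
    present = {region_of[person] for person in component if person in region_of}
    return all(region in present for region in regions)


# Hypothetical components (sets of person IDs) and residence regions:
# "NM" = northern Manitoba, "WPG" = Winnipeg.
components = [{"a", "b"}, {"c", "d", "e"}, {"f", "g"}]
region_of = {"a": "NM", "b": "WPG", "c": "WPG", "d": "WPG",
             "e": "WPG", "f": "NM", "g": "NM"}

bridging = sum(bridges_regions(c, region_of) for c in components)
print(f"{bridging} of {len(components)} components span both regions")
```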

Another limitation of the approach used in the present study is the circular shape of the SaTScan window: SaTScan identifies clusters by imposing circular windows on the map and allowing the size of these windows to vary between zero and a preset upper limit. Although this approach may work well on maps that demarcate relatively large geographic units (such as those used in the present study), it may not work as well on a smaller scale, where neighborhood-level geographic barriers such as rivers or train tracks could create noncircular interaction patterns.

Furthermore, chlamydia transmission in Manitoba appears to be largely endemic [6]. This may not hold for STI transmission in other jurisdictions, where, for example, a contact of an original case may frequently reside outside the geographic area being investigated. SaTScan would be effective only for characterizing endemic-transmission clusters within the geographic area being studied.

In a previous report, we presented empirical evidence visually demonstrating the existence of large components of the type expected for STI core groups, and we proposed that sexual network analysis is a useful tool for potentially identifying these groups, so that they can be targeted by STI-control programs [6]. Combining molecular genotype data with the identification of clusters has modified that proposal somewhat: if clusters with higher-than-expected numbers of infections result from the heightened transmission expected to characterize core-group activity, then many small components identified within a network could be part of a core group.

Elucidating the relationship between clusters, components, and core groups will require additional research. In the present study, very limited demographic data were available to supplement the molecular data. A more complete understanding of cluster formation would come from data on the social behaviors of the individuals within a cluster, which would show the extent to which these behaviors are homogeneous within a cluster but heterogeneous when compared with those of other clusters and of the general population. Such information would give program planners a more useful set of parameters for designing targeted STI-control programs.

A second avenue of investigation is the potential for cluster-based prioritization of contact tracing. The analysis in the present study was retrospective, using data from 1997–1998, and was meant to illustrate an approach that combines molecular, geographic, and social data to identify distinct transmission networks within a jurisdiction, rather than to generate results relevant to the current STI situation in Manitoba. However, SaTScan can also be used for prospective surveillance and could assist in STI control. Targeted approaches to contact tracing based on geography have been proposed and successfully tested [14]. In our setting, cluster-based contact tracing would largely overlap a geographic approach, because many of the clusters in the present study occurred in areas already known to have high rates of infection [15, 16]. However, a cluster-based approach, especially one that incorporates genotype data as in the present study, could be more refined. At least 1 identified cluster (genotype K1 in Winnipeg) was located primarily in suburban areas and could have been missed by a purely geographic approach focused on the high STI rates in Winnipeg’s inner city.

Finally, a simple yet fundamental issue that must be addressed is the temporal dynamics of clusters. The data that we used were gathered during 1997–1998. Would the same analysis, if performed today, identify the same general distribution of genotype clusters but with a different cohort of individuals, or would a completely different set of clusters be identified?

Since it was first proposed that molecular data be combined with social and sexual network data [8, 9], several groups have applied this approach to bacterial STIs [7, 17, 18]. In the present study, we have used a geographic-clustering technique to identify areas with unusually high rates of specific chlamydia genotypes, thereby providing a basis for linking previously unconnected components within a sexual network. This approach is particularly useful for infections such as chlamydia, for which only a relatively small number of genotypes exist and for which individuals infected by a common genotype cannot necessarily be assumed to have a recent epidemiological link to all others infected by that genotype. In such cases, the identification of geographic clusters is an additional tool for deciphering the patterns within social and molecular data.

Footnotes

  • Presented in part: International Society for Sexually Transmitted Diseases Research Congress, Ottawa, Ontario, Canada, 27–30 July 2003

    Financial support: Manitoba Health Research Council; Manitoba Medical Services Foundation

  • Received July 21, 2004.
  • Accepted September 8, 2004.

References

  1.
  2.
  3.
  4.
  5.
  6.
  7.
  8.
  9.
  10.
  11.
  12.
  13.
  14.
  15.
  16.
  17.
  18.