We describe the genome sequence of a macrolide-resistant strain (MGAS10394) of serotype M6 group A Streptococcus (GAS). The genome is 1,900,156 bp in length, and 8 prophage-like elements or remnants compose 12.4% of the chromosome. An 8.3-kb prophage remnant encodes the SpeA4 variant of streptococcal pyrogenic exotoxin A. The genome of strain MGAS10394 contains a chimeric genetic element composed of prophage genes and a transposon encoding the mefA gene conferring macrolide resistance. This chimeric element also has a gene encoding a novel surface-exposed protein (designated “R6 protein”), with an LPKTG cell-anchor motif located at the carboxyterminus. Surface expression of this protein was confirmed by flow cytometry. Humans with GAS pharyngitis caused by serotype M6 strains had antibody against the R6 protein present in convalescent, but not acute, serum samples. Our studies add to the theme that GAS prophage-encoded extracellular proteins contribute to host-pathogen interactions in a strain-specific fashion.
Group A Streptococcus (GAS) is a gram-positive bacterial pathogen responsible for a wide range of human infections, including pharyngitis, impetigo, puerperal sepsis, necrotizing fasciitis (“flesh-eating disease”), scarlet fever, and the postinfection sequelae glomerulonephritis and rheumatic fever [1, 2]. The genomes of 4 GAS strains have been characterized, including serotype M1, M3 (2 strains), and M18 [3–6] organisms. Each genome is polylysogenic; that is, each contains 4–6 prophages or prophage-like elements, most of which encode 1 or 2 extracellular secreted proteins thought to enhance fitness or increase virulence. Prophage-related open-reading frames (ORFs) compose a small minority of the total GAS genome, but they are responsible for up to 74% of the variation in gene content among different strains. Hence, prophages and related elements have contributed significantly to the evolution of GAS and to diversification of the genome [7, 8].
Given the critical importance of prophage-like elements in GAS biology, it is possible that sequencing the genomes of additional GAS strains will reveal novel attributes bearing on prophages and their role in host-pathogen interactions. We chose to test this hypothesis by sequencing the genome of a serotype M6 GAS strain. The genome sequence was used to conduct further analysis of serotype M6 strains, resulting in a more detailed understanding of GAS biology.
Bacterial strains. Strain MGAS10394 is a serotype M6 organism cultured from a child in Pittsburgh, Pennsylvania, during a study of the epidemiology of pharyngitis in a private elementary school [9, 10]. This organism is resistant to erythromycin, contains the mefA gene encoding macrolide resistance, and has a chromosomal field-inversion gel-electrophoresis pattern characteristic of the serotype M6 clone causing disease in this population [10]. Strain MGAS10394 has been deposited at the American Type Culture Collection (accession no. BAA-946). Fifty-four additional serotype M6 strains recovered from patients in diverse localities were studied (table 1).
Genome sequencing. The genome of strain MGAS10394 was sequenced by methods used for several other bacteria [11–13]. Directed sequencing was performed to increase the minimum consensus base quality to Q40 for regions of low sequence quality in the assembled genome. The entire genome was tiled by use of polymerase chain reaction (PCR) after closure, to ascertain the validity of the assembly. ORFs were identified with proprietary software (Integrated Genomics) and entered into the ERGO bioinformatics suite for annotation [14]. The genome sequence has been deposited in the GenBank database (accession no. pending).
Detection of prophage-associated virulence genes in diverse serotype M6 strains. Serotype M6 GAS strains (n=54) obtained from diverse localities and at diverse times (table 1) were screened for the presence of 14 prophage-associated virulence genes by use of PCR [15].
PCR mapping of the speA4 gene region and DNA sequence analysis of the speA gene. PCR was used to determine the presence of the speA gene and flanking genes in 54 strains of GAS recovered from diverse localities and at diverse times (table 1).
Cloning of R6 protein gene segments. The ORF encoding a protein designated as the “R6 protein” recently was identified by GLIMMER2 [16] analysis of a chimeric genetic element discovered during preliminary genome sequencing of strain MGAS10394 [9]. Using primers GAAATAGCACCCATGGAAAAAGAATTATC (forward) and GCTGCTTTAGAACAGAATTCAATTTCTG (reverse), we cloned a region encoding amino acid residues 46–713 of the R6 protein. This construct was designed to encode an 80-kDa truncated R6 protein with a carboxyterminus 6X-His tag. However, purification of the recombinant protein with a nickel-nitrilotriacetic acid (NTA) resin column (Qiagen) yielded 2 major bands on SDS-PAGE gels, corresponding to 70-kDa and 50-kDa recombinant proteins (see below). Aminoterminal sequencing found that each of these proteins started with a methionine residue followed by amino acids that corresponded to distinct internal sequences of the inferred R6 protein. The result suggested that internal promoters within the predicted R6 ORF were mediating expression of the 70-kDa and 50-kDa recombinant proteins. This idea was supported by the presence of 2 distinct putative promoter sequences and ribosome binding sites located immediately upstream of the methionine codons corresponding to the Met residues identified by aminoterminal sequencing (available at: http://www.fruitfly.org/seq_tools/promoter.html).
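As a minimal sketch of how the cloning primers can be inspected, the following Python snippet reports the length and G+C fraction of each primer and looks for two common restriction-enzyme recognition sequences. The primer sequences are taken from the text. The enzyme table and the function names are illustrative assumptions: the text does not state which enzymes, if any, were used for cloning, so a match shows only that the motif is present in the primer.

```python
# Primer sequences are copied from the text. The enzyme table is an
# assumption for illustration: the recognition sequences are standard,
# but the text does not say which enzymes were used for cloning.
PRIMERS = {
    "R6_forward": "GAAATAGCACCCATGGAAAAAGAATTATC",
    "R6_reverse": "GCTGCTTTAGAACAGAATTCAATTTCTG",
}
SITES = {"NcoI": "CCATGG", "EcoRI": "GAATTC"}


def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_sites(seq: str, sites: dict) -> dict:
    """Map enzyme name to the 0-based index of the first match in seq."""
    seq = seq.upper()
    return {name: seq.find(motif) for name, motif in sites.items() if motif in seq}


for name, seq in PRIMERS.items():
    print(name, len(seq), round(gc_fraction(seq), 2), find_sites(seq, SITES))
```

Only the forward strand of each primer is scanned; both motifs used here are palindromic, so scanning the reverse complement would give the same hits.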
Northern blot analysis. Northern blot analysis was conducted with total RNA purified from strain MGAS10394 grown to an OD600 of 0.4, 0.6, and 0.8. RNA was extracted by use of a FastRNA kit (Qbiogene). Total RNA (10 µg) was probed with a 326-bp probe made with primers GACCAAGCAATTAAAGATCTTGAAGAAG (forward) and GCTGCTTTAGAACAGAATTCAATTTCTG (reverse), corresponding to nt 1816–2142.
Purification of recombinant R6 protein. All chromatography was conducted by use of an AKTA Explorer fast performance liquid chromatography instrument (Pharmacia). Overexpression of the presumed 80-kDa segment of the R6 protein (corresponding to amino acid residues 46–713, plus the 6XHis tag) was performed by use of Escherichia coli strain pLys DE3 (Novagen) containing plasmid pET-21d-80. Bacterial cell lysates were loaded onto a column (XK-16; Amersham) packed with Ni-NTA resin (Qiagen), washed, and eluted with a linear gradient of 50 mmol/L NaH2PO4, 300 mmol/L NaCl, 250 mmol/L imidazole, and 0.05% CHAPS (pH 8.8). Two major bands corresponding to 50-kDa and 70-kDa recombinant proteins were obtained, rather than the expected 80-kDa recombinant protein. The N-terminal sequences MIDELKKLDSASKQS and MKGLENTQKELEAQK were obtained from the 70-kDa and 50-kDa recombinant products, respectively. These results indicate that inferred internal promoters present in the cloned segment (authors' unpublished observations) were being used in E. coli.
Mouse immunization and Western immunoblot analysis of 70-kDa and 50-kDa recombinant R6 proteins with mouse serum. Mice were immunized subcutaneously with 70-kDa recombinant R6 protein (30 µg), and the resulting serum had specific reactivity with recombinant R6 protein (70-kDa and 50-kDa forms), as assessed by Western immunoblot analysis (data not shown).
Western immunoblot analysis of serum samples obtained from patients with pharyngitis infected with R6-positive serotype M6 strains. Serum samples were obtained from patients with GAS pharyngitis caused by serotype M6 strains containing the gene encoding the R6 protein. Acute serum samples were obtained on presentation, and convalescent serum samples were obtained ∼3 weeks after treatment. Serum samples were used at a dilution of 1:3000.
Flow-cytometric analysis. Flow cytometry was conducted with a FACSCalibur instrument (BD Biosciences), by methods described elsewhere [17], with strain MGAS10394 grown to an OD600 of 0.4.
Overview of the genome sequence of strain MGAS10394 and comparison to other published GAS genome sequences. The sequenced genome of strain MGAS10394 is a circular chromosome of 1,900,156 bp with a guanine plus cytosine (G+C) content of 38.7% (figure 1). This is the largest GAS genome described thus far [3–6]. The G+C content of the genome is essentially identical to those of GAS strains SF370 (38.5%), MGAS8232 (38.6%), MGAS315 (38.7%), and SSI-1 (38.7%). There are 1920 predicted coding sequences, which compose 1667 kb (87.7%) of the genome. The gene context and content of the “core” chromosome (i.e., the part of the genome that does not include prophage-like and obvious insertion elements) are very similar to those described for strains SF370, MGAS8232, and MGAS315 [3–5].
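The summary statistics above can be checked with simple arithmetic. The sketch below defines a G+C content function that could be applied to any nucleotide sequence and recomputes the coding density from the figures given in the text. The function and variable names are illustrative, and the coding length is the rounded value "1667 kb", so the results are approximate.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


# Values from the text; CODING_BP is the rounded figure "1667 kb".
GENOME_BP = 1_900_156
CODING_BP = 1_667_000
N_CDS = 1920

coding_pct = 100.0 * CODING_BP / GENOME_BP   # share of the genome that is coding
mean_cds_bp = CODING_BP / N_CDS              # average coding-sequence length

print(f"coding density: {coding_pct:.1f}%")
print(f"mean CDS length: {mean_cds_bp:.0f} bp")
print(f"toy G+C example: {gc_content('GGCCAATT'):.1f}%")
```

Applying `gc_content` to the full chromosome sequence would be expected to reproduce the reported 38.7%, but that requires the deposited sequence, which is not reproduced here.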
Atlas of the chromosome of serotype M6 strain MGAS10394. Arrowheads in the outermost ring depict the position and orientation of all transposase genes (light blue, clockwise; green, counterclockwise). The middle 3 rings show the position of the 6 rRNA operons (black) and open-reading frames (orange, clockwise; blue, counterclockwise) identified in the genome. The innermost ring shows the location and name of the prophage-element sequences (red) identified in the genome. MefA, macrolide efflux A; MF, mitogenic factor; Sda, streptodornase α; Sdn, streptodornase; Sla, phospholipase A2; Spe, streptococcal pyrogenic exotoxin.
Potentially mobile genetic elements. Eight regions of the genome contain prophage-like elements or apparent remnants of prophage-like elements (figure 2), which vary in size from 8.3 to 58.8 kb. (For simplicity of presentation, elements that appear to be prophages or prophage remnants usually will be referred to simply as “prophage elements,” with the full understanding of the inherent limitations of this nomenclature.) Similar to other GAS genomes [3–8], the distribution of prophage-element integration sites in the genome of strain MGAS10394 is apparently not random, with 6 of the 8 sites located in the half of the chromosome distal to the origin of replication (figure 2). Prophage element DNA composes 237 kb (12.4%) of the chromosome, which is the same percentage as in strains MGAS315 and SSI-1 but exceeds the percentages in strains SF370 (7.1%) and MGAS8232 (10.8%).
Schematic showing the group A Streptococcus (GAS) core chromosome, prophage-element insertion sites, and prophage element-encoded virulence factors. The circle represents the GAS chromosome. The prophage elements are indicated with triangles that are color coded to match the GAS source strain. Stacked triangles indicate that the prophage elements are inserted at the same site. The numbers in the triangles represent the clockwise order of prophage elements in the GAS strain. The 6 rRNA operons are shown as black bars on the chromosome. MefA, macrolide efflux A; MF, mitogenic factor; Sda, streptodornase α; Sdn, streptodornase; Sla, phospholipase A2; Spe, streptococcal pyrogenic exotoxin; SSA, streptococcal superantigen.
Prophage elements Φ10394.2 (8.3 kb) and Φ10394.8 (13.3 kb) are very small, relative to bona fide GAS prophages, and, hence, are likely to be prophage remnants. Prophage element Φ10394.8 is closely similar in size and nucleotide sequence to Φ370.4 in strain SF370 and is located at the same integration site. Both elements lack a proven or putative virulence factor. Taken together, these characteristics suggest that Φ10394.8 and Φ370.4 are related by descent. Prophage element Φ10394.2 was not present in the genomes of the 4 GAS strains that have been sequenced [3–6]. This element encodes the SpeA4 variant of streptococcal pyrogenic exotoxin (Spe) A, a protein that differs from SpeA1, SpeA2, and SpeA3 by ∼11% [18]. The presence of the speA4 gene in strain MGAS10394 is consistent with reports that many serotype M6 strains have this allele [18–21]. This prophage element is integrated between genes encoding a sulfurtransferase (SpyM3_0629) and ribosomal protein S1 (SpyM3_0628). In addition, Φ10394.2 contains ORFs that would encode several transposases and a truncated variant (distal half) of SpeI. The Φ10394.2 element also has an apparent pseudogene for SpeH, which contains 14 internal stop codons relative to the wild-type allele.
Prophage element Φ10394.1 (41.2 kb) is inserted close to the origin of replication and encodes an inferred DNase related to streptodornase encoded by Φ315.6 present in serotype M3 strain MGAS315. Φ10394.1 and Φ315.6 are closely related in overall nucleotide sequence (figure 3) but are integrated at different chromosomal sites.
Relationships among group A Streptococcus (GAS) prophage elements. Prophage element sequences present in the 4 sequenced GAS genomes were aligned with CLUSTAL W software, and an unrooted tree was generated with the DRAWTREE application in PHYLIP. All bootstrap values are 1000, unless indicated otherwise at nodes. Proven or putative virulence factors encoded by each prophage element are indicated. The prophage element designations are color coded to match the source strain. MefA, macrolide efflux A; MF, mitogenic factor; Sda, streptodornase α; Sdn, streptodornase; Sla, phospholipase A2; Spe, streptococcal pyrogenic exotoxin; SSA, streptococcal superantigen.
Prophage element Φ10394.3 (35 kb) is inserted at the predicted T12att site, where attP and attB share a region of identity that is 96 bp in length [22]. This location is analogous to the insertion site described for prophages Φ315.2 (serotype M3) and Φ8232.2 (serotype M18). However, each of these 3 prophage elements encodes different proven or putative extracellular virulence factors, undoubtedly reflecting recombinational evolution.
Prophage element Φ10394.4 (58.8 kb) is the largest foreign element found thus far in sequenced GAS genomes. A preliminary description of this prophage element was published recently [9]. This chimeric element contains a gene (mefA) encoding macrolide resistance located in a 7.4-kb transposon, 48.1 kb of prophage genes, and an ORF encoding an inferred protein (R6 protein) with an LPKTG amino acid motif located at the carboxyterminus. LPXTG motifs serve to covalently anchor extracellular proteins to the bacterial cell surface, and many of these proteins are proven virulence factors in GAS and other gram-positive pathogens (see below) [23, 24]. The R6 protein also contains a conventional gram-positive secretion signal sequence located at the aminoterminus, as expected for a potential extracellular protein.
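A simple way to screen an inferred protein for a carboxyterminal cell-wall anchor motif of this kind is a pattern search restricted to the end of the sequence. The following is a minimal sketch, not the method used in this study: the 50-residue window, the function name, and the toy sequence are assumptions chosen for illustration, and the toy sequence is not the R6 protein.

```python
import re

# LPXTG: leucine, proline, any residue, threonine, glycine.
LPXTG = re.compile(r"LP.TG")


def find_anchor_motifs(protein: str, tail: int = 50) -> list:
    """Return (0-based position, motif) for LPXTG matches that start
    within the last `tail` residues of the protein sequence."""
    protein = protein.upper()
    start = max(0, len(protein) - tail)
    return [(m.start(), m.group())
            for m in LPXTG.finditer(protein)
            if m.start() >= start]


# Hypothetical toy sequence (NOT the R6 protein): an LPKTG motif
# followed by a short carboxyterminal tail.
toy = "M" + "A" * 100 + "LPKTGE" + "A" * 20

print(find_anchor_motifs(toy))
```

A real analysis would also check for the hydrophobic segment and charged tail that typically follow the motif; this sketch tests only for the motif itself.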
Prophage element Φ10394.5 (31.2 kb) is virtually identical to 3 phages (Φ370.3, Φ315.3, and Φ8232.4) (figure 3) previously described in GAS [3–5], all of which are inserted at the analogous chromosomal sites (figure 2). However, unlike these 3 prophages, Φ10394.5 has contiguous genes encoding SpeC and mitogenic factor (MF) 2, rather than MF3 (Φ370.3 and Φ8232.4) or MF4 (Φ315.3). These findings also support the idea of recombinational evolution of GAS prophages.
Prophage element Φ10394.6 (25.4 kb) encodes a DNase related to streptodornase α (Sda) encoded by Φ8232.5. Φ10394.6 is inserted at a site analogous to where Φ315.5 (encoding SpeA3) is located in the chromosome of strain MGAS315. However, Φ10394.6 and Φ315.5 are not related. Rather, in overall nucleotide sequence, Φ10394.6 is related to Φ8232.5 (figure 3). Prophage element Φ10394.7 (23.8 kb) encodes a DNase related to MF3 encoded by Φ315.3 and is inserted at a unique site in the GAS genome, relative to all other described prophage-like elements.
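The individual element sizes given above can be summed to check the reported prophage total. The sketch below uses the sizes as stated in the text (rounded to 0.1 kb); the dictionary and variable names are illustrative. Because the inputs are rounded, the computed percentage is approximate and can differ slightly from the reported 12.4%.

```python
# Sizes in kb as stated in the text for each prophage element.
PROPHAGE_KB = {
    "10394.1": 41.2,
    "10394.2": 8.3,
    "10394.3": 35.0,
    "10394.4": 58.8,
    "10394.5": 31.2,
    "10394.6": 25.4,
    "10394.7": 23.8,
    "10394.8": 13.3,
}
GENOME_BP = 1_900_156  # chromosome length from the text

total_kb = sum(PROPHAGE_KB.values())
prophage_pct = 100.0 * total_kb * 1000 / GENOME_BP

largest = max(PROPHAGE_KB, key=PROPHAGE_KB.get)
smallest = min(PROPHAGE_KB, key=PROPHAGE_KB.get)

print(f"total prophage DNA: {total_kb:.1f} kb")
print(f"share of chromosome: {prophage_pct:.2f}%")
print(f"largest: {largest}, smallest: {smallest}")
```

The rounded sizes sum to about 237 kb, in agreement with the total stated for the chromosome.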
Diversity of prophage elements in natural populations of serotype M6 GAS from different localities and diseases. Recent studies have revealed that variation in prophage content is a fundamental contributor to genetic diversity among strains of 3 GAS M protein serotypes [4, 5, 7, 8]. However, it is not known whether this is the case for serotype M6 organisms. Inasmuch as 8 distinct prophage elements were present in the genome of strain MGAS10394, we sought to determine the pattern of distribution of the virulence-related proteins encoded by these elements, among 54 serotype M6 strains from diverse localities (table 1), by use of PCR [15]. These strains were cultured from patients with pharyngitis and invasive episodes. Nineteen distinct virulence gene profiles were identified among the 54 serotype M6 strains (table 1). Of note, virtually all strains had the prophage-element genes encoding SpeA, MF3, and Sda. In contrast, the other prophage-element genes were variably present among the isolates (table 1).
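Counting distinct virulence gene profiles from PCR presence/absence results amounts to grouping strains by the set of genes detected. The following sketch illustrates the idea. The strain names, the gene subset, and the PCR results are hypothetical placeholders, not data from table 1.

```python
from collections import Counter

# Hypothetical subset of screened genes and hypothetical PCR results;
# none of these values are taken from table 1.
GENES = ("speA", "speC", "mf3", "sda")
pcr_results = {
    "strain_A": {"speA", "mf3", "sda"},
    "strain_B": {"speA", "speC", "mf3", "sda"},
    "strain_C": {"speA", "mf3", "sda"},
    "strain_D": {"speA", "sda"},
}


def profile(present: set) -> tuple:
    """Presence/absence pattern across GENES, in a fixed gene order."""
    return tuple(gene in present for gene in GENES)


# Number of strains sharing each distinct profile.
profile_counts = Counter(profile(p) for p in pcr_results.values())

# Genes detected in every strain of this toy data set.
shared_genes = [g for g in GENES
                if all(g in p for p in pcr_results.values())]

print("distinct profiles:", len(profile_counts))
print("genes present in all strains:", shared_genes)
```

With real data, each key of `pcr_results` would be one of the 54 strains and `GENES` would list all 14 screened genes.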
Distribution and structure of prophage element Φ10394.2 in GAS. Prophage element Φ10394.2 encodes the SpeA4 variant of scarlet fever toxin (figure 4). The gene encoding this variant apparently has a relatively restricted phylogenetic distribution, having been reported only in serotype M6, M32, M67, and M77 GAS strains and human group G streptococci (GGS) [18–21, 25]. To test the hypothesis that prophage element Φ10394.2 was conserved in serotype M6 strains from diverse localities, 11 pairs of PCR primers were used to amplify a region of ∼15 kb that, in strain MGAS10394, represents Φ10394.2 and flanking chromosomal regions. The results indicated that prophage element Φ10394.2 was present in 49 of 53 serotype M6 strains studied and was integrated at the same chromosomal site (table 2). These organisms included strain D471, a strain that has been used extensively for GAS molecular genetic studies [26, 27]. Of note, several PCR size variants were identified among the strains studied, indicating that limited heterogeneity was present in this chromosomal region (table 2).
Diagram of the speA4 allele-containing prophage element Φ10394.2. MacVector was used to identify open-reading frames (ORFs) in Φ10394.2. Yellow, chromosome flanking ORFs; blue, inferred proteins encoded by the prophage element; red, inferred proven or putative virulence factors encoded by the prophage element. Note that the streptococcal pyrogenic exotoxin (Spe) I and SpeH variants are truncated forms of the molecules.
The R6 protein, gene, and gene transcripts. A, Schematic showing the R6 protein. There are a variable number of amino acid repeat regions. B, Polymerase chain reaction (PCR) analysis of chromosomal DNA isolated from strain MGAS10394 grown in vitro. PCR size variants have been reported for other group A Streptococcus (GAS) and group B Streptococcus surface-exposed, LPXTG-anchored adhesins that contain multiple carboxyterminus repeat regions [28, 29]. C, Northern blot analysis of R6 gene transcripts. Total RNA was isolated from bacteria harvested at different times. Ten micrograms of total RNA was used in each lane. To ensure that equivalent quantities of RNA were loaded, the RNA was visualized by ethidium bromide staining before transfer to a charged nylon membrane (data not shown). A radiolabeled probe specific for R6 was used. Lanes 1–3 have RNA prepared from GAS grown to an OD600 of 0.2, 0.4, and 0.8, respectively.
Assessment of cell-surface location of R6 protein, by flow cytometry. Strain MGAS10394 was grown to an OD600 of 0.4 and treated with R6-specific mouse polyclonal antibody or preimmune mouse polyclonal antibody, stained with a phycoerythrin-conjugated donkey anti-mouse IgG secondary antibody, and analyzed by flow cytometry.
Western immunoblot analysis of R6 protein with human serum. Acute and convalescent serum samples were obtained from patients with pharyngitis caused by serotype M6 group A Streptococcus strains with the gene encoding the R6 protein. Lane 1, SDS-PAGE- and Coomassie blue-stained 70-kDa and 50-kDa recombinant R6 protein expressed in Escherichia coli. Lanes 2, 3, and 4, Serum samples from 3 patients with pharyngitis. Serum samples were diluted 1:3000. A, acute serum; C, convalescent serum.
Prophage-encoded virulence factor genes in serotype M6 group A Streptococcus (GAS) strains.
Next, we used PCR tiling to determine whether the 8.3-kb element was present in 2 serotype M32 strains and 11 serotype M77 strains (no serotype M67 strains were available for testing). One of the 2 serotype M32 strains had this element, and, as assessed by PCR, it was integrated at the same chromosomal location as the serotype M6 strains. PCR analysis indicated that the 11 serotype M77 strains lacked genes encoding SpeA4 and the R6 protein (table 2).
Sequence analysis showed that 48 of 49 serotype M6 strains analyzed and the 1 serotype M32 strain had the speA4 allele (table 2). Strains D471 and 10RS101 each had a single nucleotide polymorphism in speA4. One M6 strain had the speA1 allele.
Transcript analysis of the gene encoding the R6 protein. Northern blot analysis was used to test the hypothesis that strain MGAS10394 cultured in vitro expressed the gene encoding the R6 protein. Two transcripts (∼2.9-kb and ∼2.4-kb) were identified in bacteria grown to mid- and late-logarithmic and early stationary phases (OD600 = 0.2, 0.4, and 0.8, respectively) (figure 5).
Analysis of bacterial cell-surface location of the R6 protein. Many surface-exposed virulence proteins made by gram-positive bacteria contain an LPXTG motif at the carboxyterminus that covalently anchors the molecule to the peptidoglycan layer [23]. Inasmuch as the R6 protein contained a typical gram-positive secretion signal sequence and a predicted LPKTG motif at the carboxyterminus and Northern blot analysis indicated that the gene was transcribed, we tested the hypothesis that the molecule is located on the cell surface of strain MGAS10394. Flow-cytometric analysis indicated that there was a substantial (3.6-fold) increase in mean fluorescence between strain MGAS10394 incubated with immune serum and the same strain incubated with preimmune serum, a result consistent with the hypothesis (figure 6).
Seroconversion to the R6 protein in patients with pharyngitis caused by serotype M6 GAS strains. We next tested the hypothesis that the R6 protein is made during human pharyngitis episodes. Serum samples obtained from 7 patients with pharyngitis caused by serotype M6 strains containing the gene encoding R6 protein were studied. As assessed by Western immunoblot analysis, convalescent serum samples from all 7 patients had anti-R6 protein antibodies, but not serum samples obtained during the acute stage of infection (figure 7 and data not shown). These results indicate that the R6 protein is expressed in patients with pharyngitis. We also tested the hypothesis that immunoreactive R6 protein was present in the culture supernatant of strain MGAS10394 grown in vitro. No immunoreactive material was identified by Western immunoblot analysis using supernatants obtained from bacteria harvested at mid- and late-exponential and early stationary phases of growth (data not shown).
Distribution of the gene encoding the R6 protein in diverse GAS strains. To test the hypothesis that the gene encoding the R6 protein is widely distributed in GAS, PCR analysis was conducted on 112 serotype M6 strains and 88 strains representing 11 other M protein serotypes (M1, M2, M3, M4, M12, M18, M22, M28, M75, M77, and M89), from 12 states in the United States. Virtually all (n=104) serotype M6 strains had the R6 gene, whereas none of the non-M6 strains had this gene, as assessed by PCR. Of note, as assessed by PCR, the R6 gene in most of the serotype M6 strains was not located where prophage element Φ10394.4 was found in the genome of strain MGAS10394 (data not shown).
A preliminary description of the ORF encoding the R6 protein was recently published [9], but the study generating that description did not include functional analyses. Several attributes of the R6 gene and protein have been revealed by the present study. First, the gene encoding this protein is distributed widely among serotype M6 organisms. Although an exhaustive survey was not performed, we did not identify the R6 gene in 88 strains representing 11 other M serotypes commonly causing human infections. These data are consistent with the hypothesis that the R6 gene was introduced into an M6 precursor strain, giving rise to M6 organisms that are now widely disseminated. The widespread distribution of the R6-positive serotype M6 strains suggests that the R6 protein confers enhanced fitness properties, an idea consistent with our hypothesis that the protein functions as a host-cell adhesin.
Two R6 gene transcripts (∼2.9 kb and ∼2.4 kb) were identified by Northern blot analysis of strain MGAS10394 grown in vitro, and these 2 transcripts were present in bacteria grown to mid- and late-logarithmic and early stationary phases (OD600 = 0.2, 0.4, and 0.8, respectively), indicating that the R6 gene is transcribed throughout growth. Moreover, R6 protein was detected on the bacterial cell surface by use of fluorescence-activated cell sorter analysis. We believe that the 2 transcripts result in the production of 2 distinct R6 protein variants that differ from each other by the number of amino acid repeat regions present in the carboxyterminal region of the molecule. This phenomenon has been reported for other GAS and group B Streptococcus surface-exposed, LPXTG-anchored adhesins that contain multiple carboxyterminus repeat regions [28]. Attempts to identify multiple forms of the R6 protein after release from the bacterial cell surface, by use of various strategies, were not successful (D.J.B., unpublished data).
The SpeA4 variant of SpeA was identified in an analysis of the molecular population genetics of speA in natural populations of GAS [18]. Subsequent studies and the results presented here have shown that the speA4 allele is largely confined to serotype M6 strains, although a few additional M serotypes also have been reported to have this gene [19–21]. Our finding that speA4 is encoded by a prophage remnant, rather than by an intact full-length prophage, provides a potential explanation for the relatively restricted distribution of this variant in natural populations. The conservation of gene content and the chromosomal integration site of Φ10394.2, among all serotype M6 strains analyzed, suggests that the speA4 allele was introduced only once into an ancestral M6 strain. As discussed elsewhere [18], speA4 is 11% divergent at the nucleotide sequence level from speA1, speA2, and speA3 and, hence, has not shared a recent common ancestor with 1 of these 3 alleles. The occurrence of the speA4 allele in group G Streptococcus (GGS) [25] and its association with prophage genes suggests the possibility of introduction into GAS from a GGS source. In this regard, phage transduction is known to occur between GGS and GAS [29, 30] and provides a plausible mechanism for gene recombination.
Our analysis found that 19 distinct combinations of prophage-element genes encoding extracellular proven or putative virulence proteins were present in the 54 strains studied. These results are consistent with data from serotype M3 and M18 strains indicating that prophages are responsible for the majority of variation in gene content among strains of the same M protein serotype [4, 5]. Transduction of phage-encoded proven and putative virulence genes in GAS provides a facile mechanism to test new genotypes, a process that may result in rapid generation of strains with enhanced fitness and potentially altered virulence attributes. Our data demonstrate that substantial genotypic variation exists within isolates with the same M protein serotype or emm type. Data derived from several studies indicate that (1) variation in gene content due to differences in prophage content and (2) allelic variation of genes shared among strains are dominant processes creating intra-M-type genotypic variation [31–33].
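Nucleotide divergence between alleles, such as the 11% figure cited for speA4, is commonly expressed as the proportion of differing sites between two aligned sequences (the p-distance). The sketch below shows that calculation under stated assumptions: the two sequences are hypothetical toy strings, not speA alleles, and the cited study [18] may have used a different distance measure.

```python
def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences,
    ignoring columns in which either sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    different = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped alignment columns
        compared += 1
        if a != b:
            different += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return different / compared


# Hypothetical aligned toy sequences (NOT speA alleles).
allele_x = "ATGCATGCAT"
allele_y = "ATGAATGCTT"

print(f"p-distance: {p_distance(allele_x, allele_y):.2f}")
```

For real alleles, the inputs would come from a pairwise or multiple alignment of the full coding sequences.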
The genome sequence of a serotype M6 GAS strain, together with prophage profiling data and analysis of mefA-containing organisms, permits us to formulate a hypothesis for certain aspects of the evolution of M6 strains. On the basis of the widespread occurrence of the R6 gene among serotype M6 strains, we believe it is likely that the prophage encoding this gene was introduced into an M6 precursor in the evolutionarily distant past. Given the distribution of the apparently defective prophage encoding speA4, it is likely that the precursor organism had the speA4 prophage. The result was the generation of a speA4-positive, R6-positive serotype M6 cell line. Subsequently, the mefA transposable element was introduced (probably once thus far) into a serotype M6 cell line with the above characteristics, resulting in a macrolide-resistant serotype M6 subclone that is, at present, apparently restricted in the extent of its geographic distribution.
The genome sequences of 6 GAS strains are now publicly available, including those of serotype M1, M3 (2 strains), M5, M6, and M18 [3–6] organisms. As a consequence, substantial information has accrued that bears on the molecular population genomics of this important human pathogen. Each genome has revealed novel features about GAS, including previously undetected prophages and prophage-encoded genes that may contribute to unique clinical features. Given the diverse clinical syndromes caused by GAS, it is apparent that a full understanding of pathogenesis will require extensive knowledge of the metagenome of this microbe. In contrast to certain pathogens that are characterized by relatively restricted allelic variation and diversity in gene content (such as Mycobacterium tuberculosis, Yersinia pestis, and Chlamydia pneumoniae), GAS is genetically very diverse [31–35]. Moreover, relatively little genetic analysis has been conducted on strains causing infections in many developing countries, and evidence exists that novel genotypes are circulating in these countries [36, 37]. Thus, it is likely that GAS is considerably more variable than we understand at present. Nevertheless, on the basis of discoveries made thus far in GAS and many other bacterial pathogens [3–6, 38] and improvements in DNA sequencing, continued effort toward understanding the GAS metagenome is warranted. The resulting data will be crucial to efforts directed at understanding GAS-human interactions, the molecular evolution of this pathogen, and the development of novel therapeutics.
a Present affiliation: Department of Microbiology, Immunology, and Molecular Genetics, University of California at Los Angeles, Los Angeles.