The use of S and AAF metrics improves the PHATE structure of SARS-CoV-2 de novo iSNVs by mitigating sequencing center batch effects and artifacts. (A) PHATE visualization of the unfiltered de novo matrix containing 8 000 668 iSNVs, labelled by the libraries’ sequencing centers. (B) PHATE visualization as in (A), but with labels showing, for each library, the percentage of its k = 100 nearest neighbours that share its sequencing center annotation (PNNSC); the total PNNSC value is displayed at the top left. (C and D) Boxplots displaying PNNSC values for each PHATE visualization, derived from a sub-sampling controlled experiment across ten replicates (see Method section 2.5): (C) shows PNNSC values across various S metric thresholds, and (D) shows PNNSC values across different AAF metric thresholds. (E) PHATE visualization of the de novo matrix, filtered based on S and AAF thresholds, labelled by sequencing centers. The arrow points to a set of libraries that seem to diverge from the main cluster. (F) PHATE visualization as in (E), but with labels showing PNNSC values; the total PNNSC value is displayed at the top left. In this representation, sequencing centers with at least 1000 libraries in our dataset are explicitly labelled as follows: Wellcome Sanger Institute (WSI), National Institute of Health Dr. Ricardo Jorge (NIHRJ), Doherty Institute (DI), CDC-OAMD (CDC), Comenius University in Bratislava (CUB), Ravi Kant (RK), University of Tartu in Estonia (UTE), Chan Zuckerberg Biohub (CZB), KwaZulu-Natal Sequencing Platform (KSP), INAB Institute in Certh (IIC), BROAD GCID (BROAD), Wales Specialist Virology Center (WSVR). The remaining libraries are grouped under the ‘Other’ label.
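The PNNSC values in panels (B) and (F) can be illustrated with a minimal sketch. It assumes that neighbours are found by Euclidean distance in the embedding coordinates and that the total value is the mean over all libraries; neither detail is stated in the caption. The function name `pnnsc` and its arguments are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def pnnsc(embedding, centers, k=100):
    """Per-library percentage of the k nearest neighbours that share
    the library's sequencing center label (illustrative sketch only)."""
    embedding = np.asarray(embedding, dtype=float)
    centers = np.asarray(centers)
    n = embedding.shape[0]
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of libraries")

    # Brute-force pairwise squared Euclidean distances (assumed metric).
    sq = np.sum(embedding ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * embedding @ embedding.T
    np.fill_diagonal(dist, np.inf)  # a library is not its own neighbour

    # Indices of the k closest libraries in each row (order within k is arbitrary).
    nn = np.argpartition(dist, k - 1, axis=1)[:, :k]

    # Fraction of neighbours with the same center label, as a percentage.
    same = centers[nn] == centers[:, None]
    return 100.0 * same.mean(axis=1)

# Toy usage: two well-separated groups of three libraries each.
emb = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels = ["A", "A", "A", "B", "B", "B"]
per_library = pnnsc(emb, labels, k=2)
total = per_library.mean()  # assumed definition of the "total" PNNSC
```

Under this reading, a high total indicates that libraries sit mostly among libraries from their own sequencing center, which is consistent with a batch effect; a lower total after filtering would indicate that the effect has been reduced. The brute-force distance matrix is only practical for small inputs; a tree-based or approximate neighbour search would be needed at the scale of the dataset described here.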