Longest maximal exact match (LMEM) selection. (a) In a compacted graph, the variation across the full sequence set determines how sequences are fragmented and how large the nodes are. a.1 shows a toy graph of three sequences, Q1, R1, and R2, composed of four nodes i–iv (see Supplementary Figure S2 for further details). For simplicity, the (k − 1)-nucleotide overlap between adjacent nodes in ccDBGs is not shown. Arrows indicate the sequence strands; the forward and reverse sequences are shown in the nodes. Although Q1 and R2 share an 8-nt MEM (a.2) spanning nodes i–ii, this MEM does not appear as a single node because another sequence, R1, contains part of it (node ii, a.1). Similarly, Q1 and R1 share a 9-nt MEM (a.3) spanning nodes ii–iii. a.4 shows how the two LMEMs are resolved in this example. Each node is assigned to the longest MEM it is part of. Node ii is part of both MEMs, so it is merged with node iii into LMEM 2 (9 nt), which is 1 nt longer than MEM 1 (8 nt). The remaining node i is covered only by MEM 1, so it is assigned to LMEM 1, and the query sequence is painted accordingly (a.4). (b) To validate Graphite on real data, we aligned C. jejuni CP071576 against CP071584 and CP085965 using E-MEM and Graphite. Most Graphite LMEMs originated from CP071584 (gray). (c) Close-up example of LMEM selection. MEM I from CP071584 is selected first because it is longer than the multiple overlapping CP085965 MEMs. Part of MEM V is selected as the next LMEM because V is longer than II. Likewise, part of MEM III is selected over MEM VI before Graphite's LMEMs continue into VII.
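The selection rule in panels (a.4) and (c), where each part of the query goes to the longest MEM covering it, can be illustrated with a short sketch. This is a minimal Python sketch, not Graphite's implementation: it assumes MEMs are given as hypothetical (query start, length, reference) records, works at nucleotide rather than node resolution, ignores the overlap between ccDBG nodes, and breaks ties by keeping the first MEM seen. The names `Mem` and `select_lmems` and the node lengths in the example (i = 3 nt, ii = 5 nt, iii = 4 nt, chosen so that MEM 1 is 8 nt and MEM 2 is 9 nt) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Mem:
    """A MEM between the query and one reference (hypothetical representation)."""
    q_start: int  # 0-based start position on the query
    length: int   # match length in nucleotides
    ref: str      # reference identifier


def select_lmems(query_len: int, mems: List[Mem]) -> List[Tuple[int, int, str]]:
    """Paint each query position with the longest MEM covering it, then merge
    consecutive positions painted by the same MEM into LMEM segments.

    Returns (start, length, ref) tuples; uncovered positions are skipped.
    """
    best: List[Optional[Mem]] = [None] * query_len
    for mem in mems:
        for pos in range(mem.q_start, min(mem.q_start + mem.length, query_len)):
            current = best[pos]
            # Keep the longer MEM; on equal length the first-seen MEM wins (assumed tie rule).
            if current is None or mem.length > current.length:
                best[pos] = mem

    lmems: List[Tuple[int, int, str]] = []
    pos = 0
    while pos < query_len:
        mem = best[pos]
        if mem is None:
            pos += 1  # position not covered by any MEM
            continue
        start = pos
        # Extend the segment while the same MEM object paints the position.
        while pos < query_len and best[pos] is mem:
            pos += 1
        lmems.append((start, pos - start, mem.ref))
    return lmems


# Toy version of panel (a): MEM 1 (Q1 vs R2) spans nodes i-ii, MEM 2 (Q1 vs R1) spans ii-iii.
mems = [Mem(q_start=0, length=8, ref="R2"), Mem(q_start=3, length=9, ref="R1")]
print(select_lmems(12, mems))  # [(0, 3, 'R2'), (3, 9, 'R1')]
```

As in a.4, node ii (positions 3 to 7) goes to the 9-nt MEM from R1, leaving only node i (positions 0 to 2) for the 8-nt MEM from R2. Painting per position keeps the sketch simple at the cost of time proportional to the total MEM length.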