Fig. 3
From: AlfaPang: alignment free algorithm for pangenome graph construction

AlfaPang algorithm: construction concept and actual data structures. A Input sequences. B Generic variation graph representation. C Bipartite graph for \(k=3\). Colors of edges represent values assigned to them (blue: 1, red: 2, green: 3). Nodes filled with a color other than white belong to the same equivalence class. Orange nodes are both connected to the 3-mer GAT by blue edges. The first and the second pink nodes are connected to GAT by red edges, and the first and third pink nodes are connected to ATG by blue edges. The first and the second yellow nodes are connected to GAT by green edges, and the first and the third are connected to ATG by red edges. Grey nodes are connected to ATG by green edges. D Variation graph resulting from the quotient algorithm. Different edge colors mark different genomic paths (not multi-edges) and are consistent with edge colors in B. Node colors are consistent with equivalence classes shown in C. E Data structures used in the algorithm: s – concatenation of the input sequences, K – vector storing ids of k-mers starting at given positions in s, R – inverted index, enabling locating of k-mer occurrences in s, F – output vector, assigning positions in s to the vertices of the output graph. To find a pink equivalence class, start from symbol “A” at position 3 (note that we use 1-based indexing) – we assign F[3] a new value (2 in this example). Since \(K[3] = 2\), we look into the vector R[2]. The first entry in this vector is 3, and we used the first color. Therefore we need to visit all other positions pointed to by R[2], so we assign \(F[15] = 2\) and push it into the queue. Next, we backtrack one position to K[2]. Since \(K[2] = 1\), we look into R[1]. 2 is canonical vertex position for this (k-mer, color) pair, so we look at the other elements of R[1]. R[1] points to position 9, so we move one position forward and assign \(F[10] = 2\). After backtracking two steps from the starting position, we find \(K[1] = 0\), indicating that this “A” is not the third symbol in any k-mer. We repeat the procedure from found positions 10 and 15, identifying no additional positions