Fig. 3 | Algorithms for Molecular Biology

Fig. 3

From: Estimating similarity and distance using FracMinHash

Computational resources consumed by Mash and frac-kmc when estimating metrics for a large number of samples. A, B, and C show the CPU time, wall-clock time, and peak memory usage for 1000 to 3682 samples in the Ecoli dataset. D, E, and F show the same plots for the HMP dataset, with the number of samples varied from 100 to 300. Both tools were run using 128 threads. Each bar in A, B, D, and E is split into two parts: the bottom part shows the time to compute the sketches only, and the top part shows the time to compute the metric from the sketches. On the Ecoli dataset, where the samples are smaller genomes, frac-kmc is up to 18% faster than Mash and uses a similar amount of memory. On the HMP dataset, where the samples are large metagenomes, frac-kmc runs about 2-2.1x slower and uses roughly 100x more memory than Mash. This heavier resource usage is what allows frac-kmc to produce the highly accurate results shown in Figure 2.
