Table 6 Fraction of cases in which the estimated cosine falls within \(\pm 5\%\) of the true cosine of A and B, for different sizes of A and B. The similarities were estimated using a scale factor of 1/1000, which is the default in sourmash. In a large fraction of cases, the estimated cosine is not within \(\pm 5\%\) of the true cosine.

From: Estimating similarity and distance using FracMinHash

                      Num. elements in A
Num. elements in B    100K    200K    300K    400K    500K
100K                  0.09    0.20    0.25    0.32    0.38
200K                  0.17    0.29    0.55    0.47    0.51
300K                  0.28    0.40    0.54    0.58    0.57
400K                  0.32    0.45    0.61    0.71    0.74
500K                  0.45    0.42    0.73    0.66    0.83
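
The kind of check summarized in the table can be illustrated with a minimal Python sketch. It is not the authors' code, and it rests on several assumptions: A and B are treated as plain sets (binary presence vectors), so the cosine is \(|A \cap B| / \sqrt{|A|\,|B|}\); BLAKE2b stands in for the hash function (sourmash itself uses MurmurHash3); the \(\pm 5\%\) band is read as a relative tolerance; and the helper names `hash64`, `frac_minhash`, `cosine_sets`, and `within_tolerance` are hypothetical. The demo uses small sets and a larger scale factor than 1/1000 so that it runs quickly.

```python
import hashlib
import math

MAX_HASH = 2**64  # size of the 64-bit hash space (assumption for this sketch)


def hash64(item: str) -> int:
    """Map a string to a 64-bit integer. BLAKE2b is a stand-in hash here."""
    digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def frac_minhash(items, scale: float) -> set:
    """Keep the hashes that fall below scale * MAX_HASH (a FracMinHash sketch)."""
    threshold = scale * MAX_HASH
    return {h for h in map(hash64, items) if h < threshold}


def cosine_sets(a: set, b: set) -> float:
    """Cosine similarity of two sets viewed as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def within_tolerance(estimate: float, truth: float, rel_tol: float = 0.05) -> bool:
    """True if the estimate lies within +/- rel_tol (relative) of the truth."""
    return abs(estimate - truth) <= rel_tol * truth


if __name__ == "__main__":
    # Hypothetical toy data: two sets of 20,000 items sharing 10,000 of them.
    a_items = {f"kmer-{i}" for i in range(0, 20_000)}
    b_items = {f"kmer-{i}" for i in range(10_000, 30_000)}

    truth = cosine_sets(a_items, b_items)
    scale = 0.01  # larger than sourmash's default 1/1000, to suit the toy sizes
    estimate = cosine_sets(frac_minhash(a_items, scale), frac_minhash(b_items, scale))

    print("true cosine:     ", truth)
    print("estimated cosine:", estimate)
    print("within 5%:       ", within_tolerance(estimate, truth))
```

Repeating the estimate over many independently generated pairs of sets and averaging the `within_tolerance` outcome would give a fraction comparable in spirit to one cell of the table, under the assumptions stated above.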