Table 5 Suggested scale factors for various levels of desired confidence and various tolerable rates of error, when \(\min (m,n) = 10000000\). With 10 million elements, a small fraction of the elements is enough to reach the desired accuracy when estimating the cosine similarity.

From: Estimating similarity and distance using FracMinHash

 

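Rows give the tolerable error, \(\delta\); columns give the desired level of confidence, \(\alpha\).

| Tolerable error, \(\delta\) | \(\alpha = 0.91\) | \(\alpha = 0.93\) | \(\alpha = 0.95\) | \(\alpha = 0.97\) | \(\alpha = 0.99\) |
|---|---|---|---|---|---|
| 0.01 | 0.0509 | 0.0539 | 0.0580 | 0.0642 | 0.0775 |
| 0.03 | 0.0058 | 0.0061 | 0.0066 | 0.0073 | 0.0088 |
| 0.05 | 0.0021 | 0.0022 | 0.0024 | 0.0027 | 0.0032 |
| 0.07 | 0.0011 | 0.0012 | 0.0013 | 0.0014 | 0.0017 |
| 0.09 | 0.0007 | 0.0007 | 0.0008 | 0.0009 | 0.0010 |
| 0.1 | 0.0006 | 0.0006 | 0.0006 | 0.0007 | 0.0008 |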
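
To make the "small fraction" in the caption concrete, the following minimal Python sketch looks up a suggested scale factor from Table 5 and converts it into an approximate sketch size. The names `suggested_scale` and `expected_sketch_size` are hypothetical helpers introduced here for illustration and do not come from the article or from any library. The size calculation assumes that a FracMinHash sketch retains each distinct element with probability equal to the scale factor, so the expected sketch size is the scale factor times the set size. The values apply only to the case \(\min (m,n) = 10000000\) covered by this table.

```python
# Scale factors transcribed from Table 5, valid for min(m, n) = 10,000,000.
# Keys are the tolerable error delta; each tuple is ordered by the
# confidence levels listed in CONFIDENCE_LEVELS.
CONFIDENCE_LEVELS = (0.91, 0.93, 0.95, 0.97, 0.99)
SCALE_FACTORS = {
    0.01: (0.0509, 0.0539, 0.0580, 0.0642, 0.0775),
    0.03: (0.0058, 0.0061, 0.0066, 0.0073, 0.0088),
    0.05: (0.0021, 0.0022, 0.0024, 0.0027, 0.0032),
    0.07: (0.0011, 0.0012, 0.0013, 0.0014, 0.0017),
    0.09: (0.0007, 0.0007, 0.0008, 0.0009, 0.0010),
    0.1: (0.0006, 0.0006, 0.0006, 0.0007, 0.0008),
}
SET_SIZE = 10_000_000  # the min(m, n) this table was computed for


def suggested_scale(delta, alpha):
    """Return the tabulated scale factor (hypothetical helper).

    Only the (delta, alpha) pairs printed in Table 5 are supported;
    no interpolation is attempted.
    """
    if delta not in SCALE_FACTORS or alpha not in CONFIDENCE_LEVELS:
        raise KeyError(f"no table entry for delta={delta}, alpha={alpha}")
    return SCALE_FACTORS[delta][CONFIDENCE_LEVELS.index(alpha)]


def expected_sketch_size(delta, alpha):
    """Approximate number of elements kept for the smaller set.

    Assumption: each element is retained with probability equal to the
    scale factor, so the expected size is scale * SET_SIZE.
    """
    return round(suggested_scale(delta, alpha) * SET_SIZE)


if __name__ == "__main__":
    # Tolerable error 0.01 at 95% confidence.
    print(suggested_scale(0.01, 0.95))
    print(expected_sketch_size(0.01, 0.95))
```

Under these assumptions, a tolerable error of 0.01 at confidence 0.95 corresponds to keeping roughly 580,000 of the 10 million elements, while loosening the error to 0.1 at the same confidence reduces that to roughly 6,000.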