Ohhh nice catch. I didn’t realize changing the distance would matter as long as the order was preserved. I see now that we are putting the distances in the vector index and using them as a threshold.

I think the angular distance could be computed from dot products. Since

dot(u, v) = |u| * |v| * cos(theta)

then I think what we want is theta = arccos( dot(u, v) / (|u| * |v|) )

is that right?

And just to sanity check some values:

  • angular distance between [0, 1] and [1, 0] is pi/2
  • angular distance between [1, 0] and [-1, 0] is pi
  • angular distance between [1, 0] and [1, 1] is pi/4
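To make the formula concrete, here is a minimal sketch in plain Python. The function name `angular_distance` is only illustrative and not taken from any library or from the benchmark code. It clamps the cosine to [-1, 1] before calling `acos`, since floating-point rounding can push the ratio slightly out of range.

```python
import math

def angular_distance(u, v):
    """Angle in radians between vectors u and v (hypothetical helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_theta = dot / (norm_u * norm_v)
    # Guard against rounding error pushing the value outside acos's domain.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)

# Orthogonal, opposite, and 45-degree pairs.
print(angular_distance([0, 1], [1, 0]))   # about pi/2
print(angular_distance([1, 0], [-1, 0]))  # about pi
print(angular_distance([1, 0], [1, 1]))   # about pi/4
```

Note that orthogonal vectors give pi/2 regardless of argument order; only vectors pointing in opposite directions reach pi.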

@erikbern, the angular distance in the current ann-benchmarks datasets is actually the cosine distance, which is calculated as 1 minus the cosine similarity of the vectors:

d = 1 - cos(u, v) = 1 - dot(u, v) / (|u| * |v|)

We have submitted a new PR, #372, to fix it.
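For comparison, a rough sketch of the cosine distance as defined above, plus the conversion back to an angle. This is not the ann-benchmarks implementation, and both function names are made up for illustration. The two measures rank neighbors identically but take different numeric values, which matters as soon as a stored distance is compared against a threshold.

```python
import math

def cosine_distance(u, v):
    """1 minus cosine similarity (hypothetical helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def angle_from_cosine_distance(d):
    """Recover the angle in radians from d = 1 - cos(theta)."""
    # Clamp to acos's domain to absorb floating-point rounding.
    return math.acos(max(-1.0, min(1.0, 1.0 - d)))

d = cosine_distance([1, 0], [1, 1])
print(d)                              # roughly 0.293, not pi/4
print(angle_from_cosine_distance(d))  # roughly pi/4
```

The cosine distance ranges over [0, 2], while the angle ranges over [0, pi], so a threshold tuned for one cannot be reused for the other without converting.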

Besides, would you please consider relaxing the timeout setting to 24 hours? We found that for some datasets, some algorithms (such as NGTqg and qsgNGT) cannot complete the construction within 2 hours, but when the timeout is set to 24 hours, they get good qps-recall performance. Of course, these algorithms’ disadvantage in construction time will be reflected in the Recall-build time performance. 24 hours of construction time is indeed a bit long, but for some offline construction applications, it is acceptable to trade construction time for qps-recall performance.
