Tuesday, August 11, 2009

the locality sensitive hashing method for nearest neighbor search

the locality sensitive hashing method for NNS:
1. The approximation method, not an exact one.
2. Could be used when approximate answer is acceptable.
3. A very simplified description of the LSH alg.
A hash function is called locality sensitive if for any two points p, q, Pr(h[p] = h[q]) is strictly decreasing as the distance between p and q increases.
An example, please look at the CACM paper about LSH.

1. Generate a set of vectors. The dimension of the vectors should be the same as the one of the data points. Each value in these vectors follows a certain distribution(normal distribution) and these values are independent to each other.
2. Compute a fingerprint for each data point using the above vectors. We need to make sure that the closest the two points, the more likely the corresponding fingerprints would be the same.

For the sake of the NNSs,
collect all the data that has the same hash values, and calculate the real distance to check whether they are NNs or not.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.