LSH (Locality Sensitive Hashing) for Cosine similarity.
How to use it for K-NN to get the k nearest neighbors in a d-dimensional vector space.
Faster than space partitioning with a kd-tree data structure, especially in high dimensions.
If 'w' is a unit vector perpendicular to hyperplane 'pi_1', then for a point 'x' in the d-dimensional space, the sign of (w_transpose * x) tells us on which side of hyperplane 'pi_1' the point lies. This is simply the dot product between vectors 'w' and 'x'.
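A minimal sketch of the side test with NumPy (the dimension and the vectors are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

# Unit normal vector w perpendicular to the hyperplane pi_1.
w = rng.standard_normal(d)
w /= np.linalg.norm(w)

# A point x in the d-dimensional space.
x = rng.standard_normal(d)

# sign(w^T x): +1 -> one side of the hyperplane, -1 -> the other side.
side = np.sign(w @ x)
print(side)
```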
Generate 'm' random hyperplanes and, for each point, find on which side of each hyperplane the point lies. Based on that, create an m-tuple for each point and use it as the key for hashing (see the sketch below).
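A minimal sketch of building one such hash table, assuming NumPy; the sizes n, d, m are illustrative:

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(42)
n, d, m = 100, 5, 8
X = rng.standard_normal((n, d))          # n points in d-dimensional space

# One random hyperplane normal per row of W.
W = rng.standard_normal((m, d))

# Side of each hyperplane per point: sign of the dot product.
signs = (X @ W.T >= 0).astype(int)       # shape (n, m)

# The m-tuple of sides is the hash key; bucket point indices by key.
table = defaultdict(list)
for i, key in enumerate(map(tuple, signs)):
    table[key].append(i)
```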
It is a randomized algorithm, not a deterministic one, so a neighboring point that lies on the other side of a hyperplane might be missed. The chance of that happening can be reduced by building 'L' different hash tables, each with its own set of 'm' randomly generated hyperplanes, and combining the results as the union of the result sets from each hash table (sketched below).
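A sketch of the L-table variant with a final cosine-similarity re-ranking of the candidate union; the helper names build_tables and candidates are made up for illustration, not from any library:

```python
import numpy as np
from collections import defaultdict

def build_tables(X, m, L, rng):
    """L independent hash tables, each with its own m random hyperplanes."""
    tables = []
    for _ in range(L):
        W = rng.standard_normal((m, X.shape[1]))
        table = defaultdict(list)
        for i, x in enumerate(X):
            key = tuple((W @ x >= 0).astype(int))
            table[key].append(i)
        tables.append((W, table))
    return tables

def candidates(q, tables):
    """Union of bucket contents for query q across all L tables."""
    result = set()
    for W, table in tables:
        key = tuple((W @ q >= 0).astype(int))
        result.update(table.get(key, []))
    return result

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 16))
tables = build_tables(X, m=10, L=5, rng=rng)

q = rng.standard_normal(16)
cands = candidates(q, tables)

# Rank the candidates by cosine similarity and keep the top k.
k = 5
sims = {i: X[i] @ q / (np.linalg.norm(X[i]) * np.linalg.norm(q)) for i in cands}
knn = sorted(sims, key=sims.get, reverse=True)[:k]
print(knn)
```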
How the K-NN classifier behaves for various types of data:
u-shaped data, concentric circles (1 and 2), overlapped data, XOR data, two-spirals data, linearly separable data, outlier data, random data.
plot_decision_regions API from mlxtend.
conda install -c conda-forge mlxtend
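A minimal sketch of plotting K-NN decision regions with mlxtend; the concentric-circles dataset and k=5 are illustrative choices, not from the original notes:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles
from sklearn.neighbors import KNeighborsClassifier
from mlxtend.plotting import plot_decision_regions

# Concentric-circles data, one of the dataset shapes listed above.
X, y = make_circles(n_samples=500, noise=0.1, factor=0.4, random_state=0)

clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X, y)

plot_decision_regions(X, y, clf=clf, legend=2)
plt.title("K-NN (k=5) decision regions on concentric circles")
plt.show()
```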
Sample code (Jupyter notebook).
Machine Learning - Part 5