docs/src/lshfunction_api.md (3 additions, 3 deletions)
@@ -94,7 +94,7 @@ LSHFunctions.jl provides a few common utility functions that you can use across
```
-- [`similarity`](@ref): returns the similarity function for which the input [`LSHFunction`](@ref) is locality-sensitive:
+- [`similarity`](@ref): returns the similarity statistic on which your hash function is locality-sensitive:
```jldoctest; setup = :(using LSHFunctions)
julia> hashfn = LSHFunction(cossim);
@@ -103,7 +103,7 @@ LSHFunctions.jl provides a few common utility functions that you can use across
cossim (generic function with 2 methods)
```
-- [`hashtype`](@ref): returns the type of hash computed by the input hash function. Note that in practice `hashfn(x)` (or [`index_hash(hashfn,x)`](@ref) and [`query_hash(hashfn,x)`](@ref) for an [`AsymmetricLSHFunction`](@ref)) will return an array of hashes, one for each hash function you generated. [`hashtype`](@ref) is the data type of each element of `hashfn(x)`.
+- [`hashtype`](@ref): returns the type of hash computed by the input hash function. Note that in practice `hashfn(x)` (or [`index_hash(hashfn,x)`](@ref) and [`query_hash(hashfn,x)`](@ref) for an [`AsymmetricLSHFunction`](@ref)) will return an array of hashes, one for each hash function you sampled when you called [`LSHFunction`](@ref). [`hashtype`](@ref) is the data type of each element of `hashfn(x)`.
```jldoctest; setup = :(using LSHFunctions)
julia> hashfn = LSHFunction(cossim, 5);
@@ -135,7 +135,7 @@ LSHFunctions.jl provides a few common utility functions that you can use across
true
```
-We often want to compute the probability that not just one hash collides, but that multiple hashes collide simultaneously. You can calculate this using the `n_hashes` keyword argument. If left unspecified, then [`collision_probability`](@ref) will use [`n_hashes(hashfn)`](@ref n_hashes) hash functions to compute the probability.
+We often want to compute the probability that not just one hash collides, but that multiple hashes collide simultaneously. You can calculate this using the `n_hashes` keyword argument. If left unspecified, this parameter will default to [`n_hashes(hashfn)`](@ref n_hashes).
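The joint probability described in this hunk can be illustrated without the library. The following is a minimal sketch, assuming the sampled hash functions are independent, so that the probability of `n` simultaneous collisions is the single-hash probability raised to the `n`th power. The helpers `single_collision` and `joint_collision` are hypothetical names and are not part of LSHFunctions.jl; the single-hash formula is the standard random-hyperplane (SimHash) collision probability for cosine similarity, used here only as an example.

```julia
# Hypothetical helper (not LSHFunctions.jl API): probability that a single
# random-hyperplane hash collides for two inputs with cosine similarity `sim`.
single_collision(sim) = 1 - acos(sim) / π

# Probability that `n` independently sampled hashes all collide at once.
# Assumes independence, so the joint probability is a simple product.
joint_collision(sim, n) = single_collision(sim)^n

p1 = joint_collision(0.5, 1)   # one hash
p5 = joint_collision(0.5, 5)   # five hashes, all colliding simultaneously
```

Because each factor is at most 1, requiring more simultaneous collisions can only lower the probability, which is why the number of hashes matters when interpreting the result.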
This section is currently being developed. If you're interested in helping write this section, feel free to [open a pull request](https://github.com/kernelmethod/LSHFunctions.jl/pulls); otherwise, please check back later.
docs/src/similarities/lp_distance.md (6 additions, 16 deletions)
@@ -4,24 +4,14 @@
This section is currently being developed. If you're interested in helping write this section, feel free to [open a pull request](https://github.com/kernelmethod/LSHFunctions.jl/pulls); otherwise, please check back later.
## Definition
-``\ell^p`` distance is a generalization of our usual notion of distance between a pair of points. If you're not familiar with it, you can think of it as a generalization of the Pythagorean theorem: if we have two points ``(a_1,b_1)`` and ``(a_2,b_2)``, then the distance between them is
-This is known as the *``\ell^2`` distance* (or Euclidean distance) between ``(a_1,b_1)`` and ``(a_2,b_2)``. In higher dimensions, the ``\ell^2`` distance between the points ``x = (x_1,\ldots,x_n)`` and ``y = (y_1,\ldots,y_n)`` is denoted as ``\|x - y\|_{\ell^2}`` (since ``\ell^2`` distance, and, for that matter, all ``\ell^p`` distances of order ``\ge 1``, are [norms](https://en.wikipedia.org/wiki/Norm_(mathematics))) and defined as[^1]
+``\ell^p`` distance is a valid [norm](https://en.wikipedia.org/wiki/Norm_(mathematics)) for all ``p \ge 1``, and is bounded between ``0`` and ``+\infty``. In the context of locality-sensitive hashing, we say that two points are similar if ``\|x - y\|_{\ell^p}`` is small, and dissimilar if ``\|x - y\|_{\ell^p}`` is large[^1].
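For reference, the standard definition of the ``\ell^p`` distance that this paragraph relies on can be written as follows. This is a sketch using the symbols from the surrounding text, with ``x = (x_1,\ldots,x_n)`` and ``y = (y_1,\ldots,y_n)``; the exact form used in the revised docs is not visible in this diff.

```math
\|x - y\|_{\ell^p} = \left( \sum_{i=1}^n \left| x_i - y_i \right|^p \right)^{1/p}
```

Setting ``p = 2`` recovers the Euclidean distance, and ``p = 1`` gives the sum of the absolute coordinate differences.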
+
In the LSHFunctions module, you can calculate the ``\ell^p`` distance between two points using the function [`ℓp`](@ref). The functions [`ℓ1`](@ref ℓp) and [`ℓ2`](@ref ℓp) are also defined for ``\ell^1`` and ``\ell^2`` distance, respectively, since they're so commonly used:
```jldoctest
@@ -135,8 +125,8 @@ for scale in (0.25, 1.0, 4.0)
y1 = [collision_probability(l1_hashfn, xii) for xii in x]
y2 = [collision_probability(l2_hashfn, xii) for xii in x]
@@ -159,7 +149,7 @@ For further information about the collision probability, see Section 3.2 of the
### Footnotes
-[^1]: In general, ``x`` and ``y`` are allowed to be complex vectors. We sum over ``\left|x_i - y_i\right|`` (the magnitude of ``x_i - y_i``) instead of ``(x_i - y_i)^2`` to guarantee that ``\|x - y\|_{\ell^2}`` is a real number even when ``x`` and ``y`` are complex.
+[^1]: "small" and "large" are relative terms, of course. `LpHash` has a parameter `scale` that influences the relationship between ``\ell^p`` distance and collision probability, which helps us differentiate between what distances are small and which are large.
[^Datar04]: Datar, Mayur and Indyk, Piotr and Immorlica, Nicole and Mirrokni, Vahab. (2004). *Locality-sensitive hashing scheme based on p-stable distributions*. Proceedings of the Annual Symposium on Computational Geometry. 10.1145/997817.997857.