
Commit b0dbbdb

Update docs.

1 parent e3b2e56, commit b0dbbdb

File tree

5 files changed: +47, -21 lines changed


docs/make.jl

Lines changed: 2 additions & 1 deletion

````diff
@@ -20,8 +20,9 @@ makedocs(
             "Inner product similarity" => joinpath("similarities", "inner_prod.md")],
         "Function-space hashing" => "function_hashing.md",
         "Performance tips" => "performance.md",
+        "FAQ" => "faq.md",
+        "Notation and glossary" => "notation_and_glossary.md",
         "API reference" => "full_api.md",
-        "FAQ" => "faq.md"
     ]
 )
````

docs/src/function_hashing.md

Lines changed: 30 additions & 1 deletion

````diff
@@ -19,6 +19,33 @@ julia> hashfn(x -> 5x^3 - 2x^2 - 9x + 1)
 1
 ```
 
+## Similarity statistics in ``L_{\mu}^p`` function spaces
+
+### ``L_{\mu}^p`` distance
+
+```math
+\|f - g\|_{L_{\mu}^p} = \left(\int_{\Omega} |f(x) - g(x)|^p \hspace{0.15cm} d\mu(x)\right)^{1/p}
+```
+
+### Inner product similarity
+
+```math
+\left\langle f, g\right\rangle_{L_{\mu}^2} = \int_{\Omega} f(x)g(x) \hspace{0.15cm} d\mu(x)
+```
+
+When `f(x)` and `g(x)` are allowed to take on complex values, the inner product is defined as
+
+```math
+\left\langle f, g\right\rangle_{L_{\mu}^2} = \int_{\Omega} f(x)\overline{g(x)} \hspace{0.15cm} d\mu(x)
+```
+
+where ``\overline{g(x)}`` is the complex conjugate of ``g(x)``.
+
+### Cosine similarity
+```math
+\text{cossim}(f,g) = \frac{\left\langle f,g\right\rangle_{L_{\mu}^2}}{\|f\|_{L_{\mu}^2} \cdot \|g\|_{L_{\mu}^2}}
+```
+
 ## Function approximation-based hashing
 
 !!! warning "API subject to change"
@@ -99,11 +126,13 @@ julia> hashfn(x -> x/(2π))
 Create a hash function with a different number of sample points.
 
 ```jldoctest; setup = :(using LSHFunctions)
-julia> μ() = rand(); # Samples a random point from [0,1]
+julia> μ() = rand(); # μ samples a random point from [0,1]
 
 julia> hashfn = MonteCarloHash(cossim, μ; volume=1.0, n_samples=512);
 
 julia> length(hashfn.sample_points)
 512
 ```
 
+## Footnotes
+
````

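The ``L_{\mu}^2`` quantities added above translate directly into numerics. As a rough illustration of the math (a hypothetical Python sketch, not the package's Julia implementation), the inner product and cosine similarity can be estimated by Monte Carlo sampling from ``\mu``, which is essentially what `MonteCarloHash`'s sample points are for:

```python
import math
import random

def mc_inner_product(f, g, mu, volume, n_samples=10_000):
    # Monte Carlo estimate of <f,g>_{L^2_mu}: average f(x)*g(x) over
    # samples x ~ mu, scaled by the volume of the domain.
    pts = [mu() for _ in range(n_samples)]
    return volume * sum(f(x) * g(x) for x in pts) / n_samples

def mc_cossim(f, g, mu, n_samples=10_000):
    # cossim(f,g) = <f,g> / (||f|| * ||g||); the volume factor cancels,
    # so unscaled sums over a shared sample suffice.
    pts = [mu() for _ in range(n_samples)]
    fg = sum(f(x) * g(x) for x in pts)
    ff = sum(f(x) * f(x) for x in pts)
    gg = sum(g(x) * g(x) for x in pts)
    return fg / math.sqrt(ff * gg)

random.seed(42)
mu = random.random                       # samples a random point from [0,1]
f = lambda x: math.sin(2 * math.pi * x)

print(mc_inner_product(f, f, mu, volume=1.0))  # ~0.5, since the integral of sin^2(2*pi*x) over [0,1] is 1/2
print(mc_cossim(f, lambda x: -f(x), mu))       # -1.0: f and -f are antiparallel
```

For complex-valued functions, the estimator would use `g(x).conjugate()` in place of `g(x)`, mirroring the conjugate in the definition above.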
docs/src/lshfunction_api.md

Lines changed: 3 additions & 3 deletions

````diff
@@ -94,7 +94,7 @@ LSHFunctions.jl provides a few common utility functions that you can use across
 10
 ```
 
-- [`similarity`](@ref): returns the similarity function for which the input [`LSHFunction`](@ref) is locality-sensitive:
+- [`similarity`](@ref): returns the similarity statistic on which your hash function is locality-sensitive:
 
 ```jldoctest; setup = :(using LSHFunctions)
 julia> hashfn = LSHFunction(cossim);
@@ -103,7 +103,7 @@ LSHFunctions.jl provides a few common utility functions that you can use across
 cossim (generic function with 2 methods)
 ```
 
-- [`hashtype`](@ref): returns the type of hash computed by the input hash function. Note that in practice `hashfn(x)` (or [`index_hash(hashfn,x)`](@ref) and [`query_hash(hashfn,x)`](@ref) for an [`AsymmetricLSHFunction`](@ref)) will return an array of hashes, one for each hash function you generated. [`hashtype`](@ref) is the data type of each element of `hashfn(x)`.
+- [`hashtype`](@ref): returns the type of hash computed by the input hash function. Note that in practice `hashfn(x)` (or [`index_hash(hashfn,x)`](@ref) and [`query_hash(hashfn,x)`](@ref) for an [`AsymmetricLSHFunction`](@ref)) will return an array of hashes, one for each hash function you sampled when you called [`LSHFunction`](@ref). [`hashtype`](@ref) is the data type of each element of `hashfn(x)`.
 
 ```jldoctest; setup = :(using LSHFunctions)
 julia> hashfn = LSHFunction(cossim, 5);
@@ -135,7 +135,7 @@ LSHFunctions.jl provides a few common utility functions that you can use across
 true
 ```
 
-We often want to compute the probability that not just one hash collides, but that multiple hashes collide simultaneously. You can calculate this using the `n_hashes` keyword argument. If left unspecified, then [`collision_probability`](@ref) will use [`n_hashes(hashfn)`](@ref n_hashes) hash functions to compute the probability.
+We often want to compute the probability that not just one hash collides, but that multiple hashes collide simultaneously. You can calculate this using the `n_hashes` keyword argument. If left unspecified, this parameter will default to [`n_hashes(hashfn)`](@ref n_hashes).
 
 ```jldoctest; setup = :(using LSHFunctions)
 julia> hashfn = MinHash(5);
````
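The reworded `n_hashes` passage rests on a simple fact: the hash functions are drawn independently, so the probability that several hashes collide simultaneously is the single-hash collision probability raised to the number of hashes. A hedged Python sketch of that arithmetic, using the textbook collision probability of a signed-random-projection (SimHash-style) hash for cosine similarity (illustrative math only, not the package's code):

```python
import math

def single_collision_probability(sim):
    # For one signed-random-projection hash, two vectors with cosine
    # similarity `sim` collide with probability 1 - arccos(sim)/pi.
    return 1.0 - math.acos(sim) / math.pi

def collision_probability(sim, n_hashes=1):
    # The hash functions are sampled independently, so all n_hashes
    # of them collide simultaneously with probability p ** n_hashes.
    return single_collision_probability(sim) ** n_hashes

p1 = collision_probability(0.5)               # arccos(0.5) = pi/3, so p1 = 2/3
p5 = collision_probability(0.5, n_hashes=5)   # (2/3)**5: much more selective
print(p1, p5)
```

Raising the probability to the n-th power is what makes multi-hash collisions selective: similar pairs keep a reasonable chance of colliding on every hash, while dissimilar pairs become exponentially unlikely to.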

docs/src/notation_and_glossary.md

Lines changed: 6 additions & 0 deletions

````diff
@@ -0,0 +1,6 @@
+# LSHFunctions notation and glossary
+
+!!! warning "Under construction"
+    This section is currently being developed. If you're interested in helping write this section, feel free to [open a pull request](https://github.com/kernelmethod/LSHFunctions.jl/pulls); otherwise, please check back later.
+
+
````

docs/src/similarities/lp_distance.md

Lines changed: 6 additions & 16 deletions

````diff
@@ -4,24 +4,14 @@
     This section is currently being developed. If you're interested in helping write this section, feel free to [open a pull request](https://github.com/kernelmethod/LSHFunctions.jl/pulls); otherwise, please check back later.
 
 ## Definition
-``\ell^p`` distance is a generalization of our usual notion of distance between a pair of points. If you're not familiar with it, you can think of it as a generalization of the Pythagorean theorem: if we have two points ``(a_1,b_1)`` and ``(a_2,b_2)``, then the distance between them is
-
-```math
-\text{distance} = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2}
-```
-
-This is known as the *``\ell^2`` distance* (or Euclidean distance) between ``(a_1,b_1)`` and ``(a_2,b_2)``. In higher dimensions, the ``\ell^2`` distance between the points ``x = (x_1,\ldots,x_n)`` and ``y = (y_1,\ldots,y_n)`` is denoted as ``\|x - y\|_{\ell^2}`` (since ``\ell^2`` distance, and, for that matter, all ``\ell^p`` distances of order ``\ge 1``, are [norms](https://en.wikipedia.org/wiki/Norm_(mathematics))) and defined as[^1]
-
-```math
-\|x - y\|_{\ell^2} = \sqrt{\sum_{i=1}^n \left|x_i - y_i\right|^2}
-```
-
-More generally, the ``\ell^p`` distance between the two length-``n`` vectors ``x`` and ``y`` is given by
+The ``\ell^p`` distance between two length-``n`` vectors ``x`` and ``y`` is defined as
 
 ```math
 \|x - y\|_{\ell^p} = \left(\sum_{i=1}^n \left|x_i - y_i\right|^p\right)^{1/p}
 ```
 
+``\ell^p`` distance is a valid [norm](https://en.wikipedia.org/wiki/Norm_(mathematics)) for all ``p \ge 1``, and is bounded between ``0`` and ``+\infty``. In the context of locality-sensitive hashing, we say that two points are similar if ``\|x - y\|_{\ell^p}`` is small, and dissimilar if ``\|x - y\|_{\ell^p}`` is large[^1].
+
 In the LSHFunctions module, you can calculate the ``\ell^p`` distance between two points using the function [`ℓp`](@ref). The functions [`ℓ1`](@ref ℓp) and [`ℓ2`](@ref ℓp) are also defined for ``\ell^1`` and ``\ell^2`` distance, respectively, since they're so commonly used:
 
 ```jldoctest
@@ -135,8 +125,8 @@ for scale in (0.25, 1.0, 4.0)
     y1 = [collision_probability(l1_hashfn, xii) for xii in x]
     y2 = [collision_probability(l2_hashfn, xii) for xii in x]
 
-    axes[1].plot(x, y1, label="\$r = $scale\$")
-    axes[2].plot(x, y2, label="\$r = $scale\$")
+    axes[1].plot(x, y1, label="\$scale = $scale\$")
+    axes[2].plot(x, y2, label="\$scale = $scale\$")
 end
 
 axes[1].set_xlabel(raw"$\|x - y\|_{\ell^1}$", fontsize=20)
@@ -159,7 +149,7 @@ For further information about the collision probability, see Section 3.2 of the
 
 ### Footnotes
 
-[^1]: In general, ``x`` and ``y`` are allowed to be complex vectors. We sum over ``\left|x_i - y_i\right|`` (the magnitude of ``x_i - y_i``) instead of ``(x_i - y_i)^2`` to guarantee that ``\|x - y\|_{\ell^2}`` is a real number even when ``x`` and ``y`` are complex.
+[^1]: "small" and "large" are relative terms, of course. `LpHash` has a parameter `scale` that influences the relationship between ``\ell^p`` distance and collision probability, which helps us differentiate between what distances are small and which are large.
 
 [^Datar04]: Datar, Mayur and Indyk, Piotr and Immorlica, Nicole and Mirrokni, Vahab. (2004). *Locality-sensitive hashing scheme based on p-stable distributions*. Proceedings of the Annual Symposium on Computational Geometry. 10.1145/997817.997857.
 
````
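The tightened definition above is easy to check numerically. A small, hypothetical Python sketch of the formula (the package itself provides `ℓp`, `ℓ1`, and `ℓ2` in Julia):

```python
def lp_distance(x, y, p):
    # l^p distance between equal-length vectors:
    # (sum_i |x_i - y_i|^p)^(1/p)
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

x = [1.0, 2.0, 3.0]
y = [4.0, 6.0, 3.0]
print(lp_distance(x, y, 1))  # 7.0 (Manhattan distance: 3 + 4 + 0)
print(lp_distance(x, y, 2))  # 5.0 (Euclidean distance: sqrt(9 + 16))
```

Whether a distance like 7.0 counts as "small" depends, as the rewritten footnote says, on the hash function's `scale` parameter: roughly speaking, distances well below the scale collide with high probability, and distances well above it rarely do.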
