Commit 9893959

Add a page for function-space hashing.
1 parent 07f8681 commit 9893959

4 files changed: 119 additions and 2 deletions.

docs/make.jl

Lines changed: 2 additions & 1 deletion
@@ -15,9 +15,10 @@ makedocs(
 "The LSHFunction API" => "lshfunction_api.md",
 "Similarity functions" => [
 "Cosine similarity" => joinpath("similarities", "cosine.md"),
-raw"``\ell^p`` distance" => joinpath("similarities", "lp_distance.md"),
+"``\\ell^p`` distance" => joinpath("similarities", "lp_distance.md"),
 "Jaccard similarity" => joinpath("similarities", "jaccard.md"),
 "Inner product similarity" => joinpath("similarities", "inner_prod.md")],
+"Function-space hashing" => "function_hashing.md",
 "Performance tips" => "performance.md",
 "API reference" => "full_api.md",
 "FAQ" => "faq.md"

docs/src/full_api.md

Lines changed: 7 additions & 0 deletions
@@ -34,6 +34,13 @@ Private = false
 Pages = ["similarities.jl"]
 ```
 
+## Hashing in function spaces
+
+```@docs
+MonteCarloHash
+ChebHash
+```
+
 ## Miscellaneous
 
 ```@docs

docs/src/function_hashing.md

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
+# Hashing in ``L^p`` function spaces
+
+!!! warning "Under construction"
+    This section is currently being developed. If you're interested in helping write this section, feel free to [open a pull request](https://github.com/kernelmethod/LSHFunctions.jl/pulls); otherwise, please check back later.
+
+LSHFunctions supports locality-sensitive hashing over ``L^p`` function spaces. In other words, you can hash functions like `sin`, `exp`, and `f(x) = 5x^3 - 2x^2 - 9x + 1` on a few different similarities. Here's an example using [`MonteCarloHash`](@ref) over cosine similarity:
+
+```jldoctest; setup = :(using Random; Random.seed!(0))
+julia> using LSHFunctions;
+
+julia> μ() = 2π*rand(); # μ samples a random point from [0,2π]
+
+julia> hashfn = MonteCarloHash(cossim, μ, 3);
+
+julia> hashfn(x -> 5x^3 - 2x^2 - 9x + 1)
+3-element BitArray{1}:
+ 0
+ 1
+ 1
+```
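
As a rough illustration of the idea behind the example above (not a description of the library's internals), a Monte Carlo hash over cosine similarity can be sketched in two steps: evaluate the function at random sample points to get a finite-dimensional vector, then apply an ordinary sign-of-random-projection (SimHash) hash to that vector. The names `mc_embed`, `sketch_simhash`, `n_samples`, and `n_bits` below are hypothetical and are not part of LSHFunctions.

```julia
using Random

# Hypothetical helper (not part of LSHFunctions): embed f into R^n by
# evaluating it at the sample points. With the sqrt(volume / n) scaling, the
# squared Euclidean norm of the result is a Monte Carlo estimate of ∫ |f|² dx.
mc_embed(f, points, volume) = sqrt(volume / length(points)) .* f.(points)

rng = MersenneTwister(0)
n_samples, n_bits = 1024, 3
points = 2π .* rand(rng, n_samples)       # uniform samples from [0, 2π]
planes = randn(rng, n_bits, n_samples)    # one random hyperplane per hash bit

# SimHash: record which side of each hyperplane the embedded function lies on.
sketch_simhash(f) = (planes * mc_embed(f, points, 2π)) .≥ 0

h = sketch_simhash(x -> 5x^3 - 2x^2 - 9x + 1)
```

Because only the sign of each projection is kept, rescaling a function by a positive constant leaves its hash unchanged, which is the behaviour one would expect from a hash for cosine similarity.
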
+
+## Function approximation-based hashing
+
+!!! warning "API subject to change"
+    The API for both [`ChebHash`](@ref) and [`MonteCarloHash`](@ref), but especially the former, is being modified very quickly. As a result, the docs below may change radically for future versions of the LSHFunctions package.
+
+Create a hash function for cosine similarity for functions in ``L^2([-1,1])``:
+
+```jldoctest; setup = :(using LSHFunctions)
+julia> hashfn = ChebHash(cossim, 50; interval=@interval(-1 ≤ x ≤ 1));
+
+julia> n_hashes(hashfn)
+50
+
+julia> similarity(hashfn) == cossim
+true
+
+julia> hashtype(hashfn)
+Bool
+```
+
+Create a hash function for ``L^2`` distance defined over ``L^2([0,2\pi])``. Hash the functions `f(x) = cos(x)` and `f(x) = x/(2π)` using the returned [`ChebHash`](@ref):
+
+```jldoctest; setup = :(using LSHFunctions, Random; Random.seed!(0))
+julia> hashfn = ChebHash(L2, 3; interval=@interval(0 ≤ x ≤ 2π));
+
+julia> hashfn(cos)
+3-element Array{Int32,1}:
+  3
+ -1
+ -2
+
+julia> hashfn(x -> x/(2π))
+3-element Array{Int32,1}:
+ 0
+ 1
+ 0
+```
+
+## Monte Carlo-based hashing
+
+Create a hash function for cosine similarity for functions in ``L^2([-1,1])``:
+
+```jldoctest; setup = :(using LSHFunctions)
+julia> μ() = 2*rand()-1; # μ samples a random point from [-1,1]
+
+julia> hashfn = MonteCarloHash(cossim, μ, 50; volume=2.0);
+
+julia> n_hashes(hashfn)
+50
+
+julia> similarity(hashfn) == cossim
+true
+
+julia> hashtype(hashfn)
+Bool
+```
+
+Create a hash function for ``L^2`` distance in the function space ``L^2([0,2\pi])``. Hash the functions `f(x) = cos(x)` and `f(x) = x/(2π)` using the returned [`MonteCarloHash`](@ref).
+
+```jldoctest; setup = :(using LSHFunctions, Random; Random.seed!(0))
+julia> μ() = 2π * rand(); # μ samples a random point from [0,2π]
+
+julia> hashfn = MonteCarloHash(L2, μ, 3; volume=2π);
+
+julia> hashfn(cos)
+3-element Array{Int32,1}:
+ -1
+  3
+  0
+
+julia> hashfn(x -> x/(2π))
+3-element Array{Int32,1}:
+ -1
+ -2
+ -1
+```
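
In the examples above, the `volume` keyword matches the measure of the sampling domain (`2.0` for ``[-1,1]``, `2π` for ``[0,2\pi]``). One plausible reading, sketched below under that assumption, is that it rescales the sampled values so that Euclidean distances between embedded vectors approximate ``L^2`` distances between functions, after which a p-stable hash in the style of Datar et al. (2004) can be applied. The names `mc_embed`, `sketch_l2hash`, `n_out`, and the bucket width `r` are hypothetical choices for this sketch, not the library's API.

```julia
using Random

# Hypothetical helper (not part of LSHFunctions): sample f and rescale so that
# the Euclidean norm of the result approximates the L² norm of f.
mc_embed(f, points, volume) = sqrt(volume / length(points)) .* f.(points)

rng = MersenneTwister(0)
n_samples, n_out, r = 1024, 3, 1.0      # r is an assumed bucket width
points = 2π .* rand(rng, n_samples)     # uniform samples from [0, 2π]
A = randn(rng, n_out, n_samples)        # Gaussian entries are 2-stable
b = r .* rand(rng, n_out)               # random offsets in [0, r)

# p-stable hash for p = 2: project, shift, and bucket into integer bins.
sketch_l2hash(f) = floor.(Int32, (A * mc_embed(f, points, 2π) .+ b) ./ r)

h_cos = sketch_l2hash(cos)
h_lin = sketch_l2hash(x -> x / (2π))
```

Functions that are close in ``L^2`` distance land in the same bin for a given row of `A` more often than functions that are far apart, which is the property a locality-sensitive hash needs.
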
+
+Create a hash function with a different number of sample points.
+
+```jldoctest; setup = :(using LSHFunctions)
+julia> μ() = rand(); # Samples a random point from [0,1]
+
+julia> hashfn = MonteCarloHash(cossim, μ; volume=1.0, n_samples=512);
+
+julia> length(hashfn.sample_points)
+512
+```
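
To give a feel for why the number of sample points matters, the sketch below estimates the ``L^2([0,2\pi])`` norm of `cos` (whose exact value is ``\sqrt{\pi}``) by Monte Carlo sampling with several sample counts. It is a standalone illustration under the assumption that sampling error shrinks roughly like ``1/\sqrt{n}``; `mc_l2norm` is a hypothetical helper and is not part of LSHFunctions.

```julia
using Random

# Hypothetical helper: Monte Carlo estimate of the L² norm of f over a domain
# of the given volume, using n points drawn from `sampler`.
function mc_l2norm(f, sampler, volume, n)
    total = sum(abs2(f(sampler())) for _ in 1:n)
    return sqrt(volume * total / n)
end

rng = MersenneTwister(0)
μ() = 2π * rand(rng)    # samples a random point from [0, 2π]

# More sample points generally give an estimate closer to the exact norm.
for n in (16, 512, 100_000)
    est = mc_l2norm(cos, μ, 2π, n)
    println("n_samples = $n: estimate = $est (exact: $(sqrt(π)))")
end
```
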
+

docs/src/similarities/lp_distance.md

Lines changed: 1 addition & 1 deletion
@@ -161,5 +161,5 @@ For further information about the collision probability, see Section 3.2 of the
 
 [^1]: In general, ``x`` and ``y`` are allowed to be complex vectors. We sum over ``\left|x_i - y_i\right|`` (the magnitude of ``x_i - y_i``) instead of ``(x_i - y_i)^2`` to guarantee that ``\|x - y\|_{\ell^2}`` is a real number even when ``x`` and ``y`` are complex.
 
-[^Datar04]: Datar, Mayur & Indyk, Piotr & Immorlica, Nicole & Mirrokni, Vahab. (2004). *Locality-sensitive hashing scheme based on p-stable distributions*. Proceedings of the Annual Symposium on Computational Geometry. 10.1145/997817.997857.
+[^Datar04]: Datar, Mayur and Indyk, Piotr and Immorlica, Nicole and Mirrokni, Vahab. (2004). *Locality-sensitive hashing scheme based on p-stable distributions*. Proceedings of the Annual Symposium on Computational Geometry. 10.1145/997817.997857.
 