Commit 07f8681

Add docstrings for ChebHash and MonteCarloHash.
1 parent 697de8a commit 07f8681

2 files changed: +174 -11 lines changed

src/function_hashing/chebhash.jl

Lines changed: 71 additions & 4 deletions
@@ -6,6 +6,12 @@ ChebHash for hashing the L^2([-1,1]) function space.
 
 using FFTW
 
+#========================
+Global constants
+========================#
+
+const _DEFAULT_CHEBHASH_INTERVAL = @interval(-1 ≤ x ≤ 1)
+
 #========================
 Typedefs
 ========================#
@@ -30,23 +36,84 @@ struct ChebHash{B, F<:SimilarityFunction, H<:LSHFunction, I<:RealInterval}
 end
 
 ### External ChebHash constructors
-ChebHash(similarity, args...; kws...) =
-    ChebHash(SimilarityFunction(similarity), args...; kws...)
-
 const _valid_ChebHash_similarities = (
     # Function space similarities
     (L2, cossim),
     # Discrete-space similarities corresponding to function space similarities
     (ℓ2, cossim),
 )
 
+@doc """
+    ChebHash(sim, args...; interval=$(_DEFAULT_CHEBHASH_INTERVAL), kws...)
+
+Samples a hash function from an LSH family for the similarity `sim` defined over the function space ``L^p_{\\mu}(\\Omega)``. `sim` may be one of the following:
+$(
+join(
+    ["- `" * sim * "`" for sim in (_valid_ChebHash_similarities[1] .|>
+                                   string |>
+                                   collect |>
+                                   sort!)
+    ],
+    "\n"
+)
+)
+
+`ChebHash` works by approximating a function by Chebyshev polynomials. You can choose the degree of the approximation to trade between speed and generating desirable hash collision probabilities.
+
+!!! info "ChebHash limitations"
+    `ChebHash` can only hash function spaces of the form ``L^2([a,b])``, where ``[a,b]`` is an interval on the real line. For a more versatile option, check out [`MonteCarloHash`](@ref).
+
+# Arguments
+- `sim`: the similarity function you want to hash on.
+- `args...`: arguments to pass on when building the `LSHFunction` instance underlying the returned `ChebHash` struct.
+- `kws...`: keyword arguments to pass on when building the `LSHFunction` instance underlying the returned `ChebHash` struct.
+
+# Examples
+Create a hash function for cosine similarity for functions in ``L^2([-1,1])``:
+
+```jldoctest; setup = :(using LSHFunctions)
+julia> hashfn = ChebHash(cossim, 50; interval=@interval(-1 ≤ x ≤ 1));
+
+julia> n_hashes(hashfn)
+50
+
+julia> similarity(hashfn) == cossim
+true
+
+julia> hashtype(hashfn)
+$(cossim |> LSHFunction |> hashtype)
+```
+
+Create a hash function for ``L^2`` distance defined over ``L^2([0,2\\pi])``. Hash the functions `f(x) = cos(x)` and `f(x) = x/(2π)` using the returned `ChebHash`:
+
+```jldoctest; setup = :(using LSHFunctions, Random; Random.seed!(0))
+julia> hashfn = ChebHash(L2, 3; interval=@interval(0 ≤ x ≤ 2π));
+
+julia> hashfn(cos)
+3-element Array{Int32,1}:
+  3
+ -1
+ -2
+
+julia> hashfn(x -> x/(2π))
+3-element Array{Int32,1}:
+ 0
+ 1
+ 0
+```
+
+See also: [`MonteCarloHash`](@ref)
+"""
+ChebHash(similarity, args...; kws...) =
+    ChebHash(SimilarityFunction(similarity), args...; kws...)
+
 for (fn_sim, discrete_sim) in zip(_valid_ChebHash_similarities...)
     quote
         # Add an implementation of ChebHash that dispatches on the similarity
         # function fn_sim
         function ChebHash(sim::SimilarityFunction{$fn_sim},
                           args...;
-                          interval::RealInterval = @interval(-1 ≤ x ≤ 1),
+                          interval::RealInterval = _DEFAULT_CHEBHASH_INTERVAL,
                           kws...) where S
 
             discrete_hashfn = LSHFunction($discrete_sim, args...; kws...)
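
The `ChebHash` docstring in this diff says the hash approximates a function by Chebyshev polynomials and that the degree trades speed against accuracy. The sketch below illustrates only that trade-off in plain Julia. It is not the package's implementation (the diff shows `using FFTW`, which this sketch does not use), and the helper names `cheb_coeffs` and `cheb_eval` and the test function `f` are hypothetical.

```julia
# Hypothetical helper: coefficients c_0..c_{n-1} of the Chebyshev interpolant
# of f on [-1, 1], computed at the n Chebyshev nodes x_k = cos(θ_k).
function cheb_coeffs(f, n)
    θ = [π * (k + 0.5) / n for k in 0:n-1]
    fx = f.(cos.(θ))
    c = [2 / n * sum(fx .* cos.(j .* θ)) for j in 0:n-1]
    c[1] /= 2   # the T_0 coefficient carries an extra factor of 1/2
    return c
end

# Evaluate the interpolant at x ∈ [-1, 1] using T_j(x) = cos(j * acos(x)).
cheb_eval(c, x) = sum(c[j+1] * cos(j * acos(x)) for j in 0:length(c)-1)

f(x) = exp(x) * sin(3x)          # arbitrary smooth test function
xs = LinRange(-1, 1, 201)

# More coefficients cost more work but shrink the approximation error.
for n in (4, 8, 16)
    c = cheb_coeffs(f, n)
    err = maximum(abs(cheb_eval(c, x) - f(x)) for x in xs)
    println("n = $n coefficients: max error ≈ $err")
end
```

A hash built on a truncated coefficient vector can only distinguish functions as well as that vector represents them, which is presumably why the docstring ties the approximation degree to collision probabilities.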

src/function_hashing/monte_carlo.jl

Lines changed: 103 additions & 7 deletions
@@ -4,6 +4,12 @@ MonteCarloHash for hashing function spaces.
 
 ================================================================#
 
+#========================
+Global constants
+========================#
+
+const _MONTECARLOHASH_DEFAULT_N_SAMPLES = 1024
+
 #========================
 Typedefs
 ========================#
@@ -46,11 +52,6 @@ struct MonteCarloHash{F, H <: LSHFunction, D, T, S} <: LSHFunction
 end
 
 ### External MonteCarloHash constructors
-
-# TODO: restrict similarities. E.g. Jaccard should not be an available similarity
-MonteCarloHash(similarity, args...; kws...) =
-    MonteCarloHash(SimilarityFunction(similarity), args...; kws...)
-
 const _valid_MonteCarloHash_similarities = (
     # Function space similarities
     (L1, L2, cossim),
@@ -60,12 +61,107 @@ const _valid_MonteCarloHash_similarities = (
     (1, 2, 2),
 )
 
+# TODO: restrict similarities. E.g. Jaccard should not be an available similarity
+@doc """
+    MonteCarloHash(sim, ω, args...; volume=1.0, n_samples=$(_MONTECARLOHASH_DEFAULT_N_SAMPLES), kws...)
+
+Samples a hash function from an LSH family for the similarity `sim` defined over the function space ``L^p_{\\mu}(\\Omega)``. `sim` may be one of the following:
+$(
+join(
+    ["- `" * sim * "`" for sim in (_valid_MonteCarloHash_similarities[1] .|>
+                                   string |>
+                                   collect |>
+                                   sort!)
+    ],
+    "\n"
+)
+)
+
+Given an input function ``f\\in L^p_{\\mu}(\\Omega)``, `MonteCarloHash` works by sampling ``f`` at some randomly-selected points in ``\\Omega``, and then hashing those samples.
+
+# Arguments
+- `sim`: the similarity function you want to hash on.
+- `ω`: a function that takes no inputs and samples a single point from ``\\Omega``. Alternatively, it can be viewed as a random variable with probability measure
+
+```math
+\\frac{\\mu}{\\text{vol}_{\\mu}(\\Omega)} = \\mu\\left(\\int_{\\Omega} d\\mu\\right)^{-1}
+```
+
+- `args...`: arguments to pass on when building the `LSHFunction` instance underlying the returned `MonteCarloHash` struct.
+- `volume::Real` (default: `1.0`): the volume of the space ``\\Omega``, defined as
+
+```math
+\\text{vol}_{\\mu}(\\Omega) = \\int_{\\Omega} d\\mu
+```
+
+- `n_samples::Integer` (default: `$(_MONTECARLOHASH_DEFAULT_N_SAMPLES)`): the number of points to sample from each function that is hashed by the `MonteCarloHash`. Larger values of `n_samples` tend to capture the input function better and will thus be more likely to achieve desirable collision probabilities.
+- `kws...`: keyword arguments to pass on when building the `LSHFunction` instance underlying the returned `MonteCarloHash` struct.
+
+# Examples
+Create a hash function for cosine similarity for functions in ``L^2([-1,1])``:
+
+```jldoctest; setup = :(using LSHFunctions)
+julia> μ() = 2*rand()-1; # μ samples a random point from [-1,1]
+
+julia> hashfn = MonteCarloHash(cossim, μ, 50; volume=2.0);
+
+julia> n_hashes(hashfn)
+50
+
+julia> similarity(hashfn) == cossim
+true
+
+julia> hashtype(hashfn)
+$(cossim |> LSHFunction |> hashtype)
+```
+
+Create a hash function for ``L^2`` distance in the function space ``L^2([0,2\\pi])``. Hash the functions `f(x) = cos(x)` and `f(x) = x/(2π)` using the returned `MonteCarloHash`.
+
+```jldoctest; setup = :(using LSHFunctions, Random; Random.seed!(0))
+julia> μ() = 2π * rand(); # μ samples a random point from [0,2π]
+
+julia> hashfn = MonteCarloHash(L2, μ, 3; volume=2π);
+
+julia> hashfn(cos)
+3-element Array{Int32,1}:
+ -1
+  3
+  0
+
+julia> hashfn(x -> x/(2π))
+3-element Array{Int32,1}:
+ -1
+ -2
+ -1
+```
+
+Create a hash function with a different number of sample points.
+
+```jldoctest; setup = :(using LSHFunctions; μ() = rand())
+julia> μ() = rand(); # Samples a random point from [0,1]
+
+julia> hashfn = MonteCarloHash(cossim, μ; volume=1.0, n_samples=512);
+
+julia> length(hashfn.sample_points)
+512
+```
+
+See also: [`ChebHash`](@ref)
+"""
+MonteCarloHash(similarity, args...; kws...) =
+    MonteCarloHash(SimilarityFunction(similarity), args...; kws...)
+
 for (fn_space_simfn, simfn, p) in zip(_valid_MonteCarloHash_similarities...)
     quote
         # Add dispatch for case in which we specify the similarity function
         # to be $fn_space_simfn
-        function MonteCarloHash(sim::SimilarityFunction{$fn_space_simfn}, μ, args...;
-                                n_samples::Int64=1024, volume=1.0, kws...)
+        function MonteCarloHash(
+                sim::SimilarityFunction{$fn_space_simfn},
+                μ,
+                args...;
+                n_samples::Integer=_MONTECARLOHASH_DEFAULT_N_SAMPLES,
+                volume=1.0,
+                kws...)
 
             discrete_hashfn = LSHFunction($simfn, args...; kws...)
             MonteCarloHash($fn_space_simfn, discrete_hashfn, μ, volume,
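
The `MonteCarloHash` docstring above describes sampling ``f`` at random points of ``\Omega`` and hashing the samples, with `volume` and `n_samples` as tuning parameters. The sketch below shows one way those two parameters could enter, assuming the samples are rescaled by `(volume / n_samples)^(1/p)` so that the discrete ℓ^p norm estimates the L^p norm. That scaling is an assumption (the diff truncates before the constructor body), and `sample_embedding` is a hypothetical helper, not part of the package.

```julia
using Random

# Hypothetical helper: evaluate f at the sampled points and rescale so that
# the ℓ^p norm of the result is a Monte Carlo estimate of the L^p norm of f.
function sample_embedding(f, points, volume, p)
    scale = (volume / length(points))^(1 / p)
    return scale .* f.(points)
end

Random.seed!(0)
μ() = 2π * rand()                      # uniform sample from Ω = [0, 2π]
points = [μ() for _ in 1:100_000]      # plays the role of n_samples points

v = sample_embedding(cos, points, 2π, 2)
estimate = sqrt(sum(abs2, v))          # ℓ² norm of the sample vector
exact = sqrt(π)                        # ‖cos‖ in L²([0, 2π]), since ∫ cos² = π

println("estimate = $estimate, exact = $exact")
```

Under this assumption, a discrete ℓ^p hash applied to `v` behaves approximately like a hash on the original function space, and a larger `n_samples` tightens the estimate, which matches the docstring's remark about collision probabilities.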
