
Commit 83de6bb

Complete the renaming of the package from 'LSH.jl' to 'LSHFunctions.jl'.
The Julia General registry requires that module names (a) are at least five letters long; and (b) end in a lowercase letter, except with admin approval. While it's possible that we'll eventually be able to alias 'LSH' to 'LSHFunctions' or some such, for now it seems that the best course of action is simply to rename the module to LSHFunctions. This will have the additional benefit of creating a naming scheme for future packages, e.g. LSHTables.

Squashed commit of the following:

commit 2f6284f
Author: kernelmethod <17100608+kernelmethod@users.noreply.github.com>
Date: Mon Jan 20 14:58:29 2020 -0700

    Change from the LSH module to LSHFunctions in the documentation.

commit 89dbc40
Author: kernelmethod <17100608+kernelmethod@users.noreply.github.com>
Date: Mon Jan 20 14:30:07 2020 -0700

    Remove remaining references to the LSH package/module, and replace them with references to the LSHFunctions package/module.

commit 7531cb1
Author: kernelmethod <17100608+kernelmethod@users.noreply.github.com>
Date: Mon Jan 20 14:27:25 2020 -0700

    Remove usages of the LSH module / package in the tests, and replace them with LSHFunctions.

commit b754fc7
Author: kernelmethod <17100608+kernelmethod@users.noreply.github.com>
Date: Mon Jan 20 14:19:04 2020 -0700

    Change the register_similarity! macro to generate LSHFunctions.LSHFunction and LSHFunctions.lsh_family, rather than LSH.LSHFunction and LSH.lsh_family.

commit f5b80bc
Author: kernelmethod <17100608+kernelmethod@users.noreply.github.com>
Date: Mon Jan 20 14:15:32 2020 -0700

    Rename LSH.jl to LSHFunctions.jl.
1 parent 4a84db4 commit 83de6bb
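
The two naming constraints cited in the commit message can be written down as a quick check. This is only a sketch of the rules as the message states them, not the registry's actual validation logic, and `meets_naming_rules` is a hypothetical helper introduced for illustration.

```julia
# Hypothetical helper: encodes only the two rules quoted in the commit
# message, not the General registry's real checks.
function meets_naming_rules(name::AbstractString)
    long_enough = length(name) >= 5                          # (a) at least five letters long
    ends_lower = !isempty(name) && islowercase(last(name))   # (b) ends in a lowercase letter
    return long_enough && ends_lower
end

meets_naming_rules("LSH")           # false: too short and ends in an uppercase letter
meets_naming_rules("LSHFunctions")  # true
meets_naming_rules("LSHTables")     # true
```

Under these two rules the old name fails both conditions, while the new name and the proposed `LSHTables` pass.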

28 files changed, with 175 additions and 177 deletions

docs/make.jl

Lines changed: 2 additions & 2 deletions
@@ -5,12 +5,12 @@ Pkg.activate(); Pkg.instantiate()

 pushfirst!(LOAD_PATH, joinpath(@__DIR__, ".."))

-using Documenter, LSH
+using Documenter, LSHFunctions

 makedocs(
 sitename = "LSHFunctions.jl",
 format = Documenter.HTML(),
-modules = [LSH],
+modules = [LSHFunctions],
 pages = ["Home" => "index.md",
 "The LSHFunction API" => "lshfunction_api.md",
 "Similarity functions" => [

docs/src/faq.md

Lines changed: 2 additions & 2 deletions
@@ -7,7 +7,7 @@ The reason for computing multiple hashes is that every LSH function provides (at

 In fact, the situation can be much more dire than that. If your data are highly structured, it is likely that each of your hashes will place data points into a tiny handful of buckets -- even just one bucket. For instance, in the snippet below we have a dataset of 100 points that all have very high cosine similarity with one another. If we only create a single hash function when we call [`SimHash`](@ref), then it's very likely that all of the data points will have the same hash.

-```jldoctest; setup = :(using LSH, Random; Random.seed!(0))
+```jldoctest; setup = :(using LSHFunctions, Random; Random.seed!(0))
 julia> hashfn = SimHash();

 julia> data = ones(10, 100); # Each column is a data point
@@ -24,7 +24,7 @@ julia> unique(hashes)
 The solution to this is to generate multiple hash functions, and combine each of the hashes we compute for an input into a single key. In the snippet below, we create 20 hash functions with [`SimHash`](@ref). Each hash computed in `map(x -> hashfn(x), eachcol(data))` is a length-20 `BitArray`.


-```jldoctest; setup = :(using LSH, Random; Random.seed!(0))
+```jldoctest; setup = :(using LSHFunctions, Random; Random.seed!(0))
 julia> hashfn = SimHash(20);

 julia> data = ones(10,100); # Each column is a data point

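The FAQ text in these hunks argues that a single hash bit cannot tell highly similar points apart, while a key built from many bits can. The sketch below illustrates that point with plain sign random projections, the usual construction behind SimHash-style hashes. It does not use the package's internals: `sign_hash` is a hypothetical helper, and the size of the random perturbation is an assumption, since the diff elides how the doctest perturbs the data.

```julia
using Random

Random.seed!(0)

# 100 points in 10 dimensions, all close to the all-ones vector, so every
# pair has very high cosine similarity. The perturbation size is an assumption.
data = ones(10, 100) .+ 0.01 .* randn(10, 100)

# Hypothetical stand-in for a SimHash-style function: each column of
# `planes` is a random direction, and each hash bit records which side
# of the corresponding hyperplane the input falls on.
sign_hash(planes, x) = BitVector(planes' * x .>= 0)

one_plane = randn(10, 1)      # a single hash function
many_planes = randn(10, 20)   # 20 hash functions combined into one key

hashes_1 = [sign_hash(one_plane, x) for x in eachcol(data)]
hashes_20 = [sign_hash(many_planes, x) for x in eachcol(data)]

# A one-bit key can take at most 2 distinct values; a 20-bit key has room
# for up to 2^20 distinct values.
println(length(unique(hashes_1)), " distinct 1-bit keys")
println(length(unique(hashes_20)), " distinct 20-bit keys")
```

How many distinct keys actually appear depends on the random draw; the only guarantee is the upper bound set by the key length.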
docs/src/full_api.md

Lines changed: 2 additions & 2 deletions
@@ -29,14 +29,14 @@ MIPSHash
 ## Similarity functions

 ```@autodocs
-Modules = [LSH]
+Modules = [LSHFunctions]
 Private = false
 Pages = ["similarities.jl"]
 ```

 ## Private interface

 ```@autodocs
-Modules = [LSH]
+Modules = [LSHFunctions]
 Public = false
 ```

docs/src/lshfunction_api.md

Lines changed: 8 additions & 8 deletions
@@ -17,7 +17,7 @@ LSHFunction(similarity, n_hashes::Integer=1; kws...)
 For instance, in the snippet below we create a single hash function corresponding to cosine similarity:

 ```jldoctest
-julia> using LSH
+julia> using LSHFunctions

 julia> hashfn = LSHFunction(cossim);

@@ -37,7 +37,7 @@ As another example, following code snippet creates 10 hash functions for inner p
 - `maxnorm`: an upper bound on the norm of the data points we're hashing, and a required parameter for [`SignALSH`](@ref).

 ```jldoctest
-julia> using LSH
+julia> using LSHFunctions

 julia> hashfn = LSHFunction(inner_prod, 10; dtype=Float64, maxnorm=5.0);

@@ -64,7 +64,7 @@ julia> hashfn.maxnorm

 If you want to know what hash function will be created for a given similarity, you can use [`lsh_family`](@ref):

-```jldoctest; setup = :(using LSH)
+```jldoctest; setup = :(using LSHFunctions)
 julia> lsh_family(jaccard)
 MinHash

@@ -77,7 +77,7 @@ LSHFunctions.jl provides a few common utility functions that you can use across

 - [`n_hashes`](@ref): returns the number of hash functions computed by an [`LSHFunction`](@ref).

-```jldoctest; setup = :(using LSH)
+```jldoctest; setup = :(using LSHFunctions)
 julia> hashfn = LSHFunction(jaccard);

 julia> n_hashes(hashfn)
@@ -96,7 +96,7 @@ julia> length(hashes)

 - [`similarity`](@ref): returns the similarity function for which the input [`LSHFunction`](@ref) is locality-sensitive:

-```jldoctest; setup = :(using LSH)
+```jldoctest; setup = :(using LSHFunctions)
 julia> hashfn = LSHFunction(cossim);

 julia> similarity(hashfn)
@@ -105,7 +105,7 @@ cossim (generic function with 2 methods)

 - [`hashtype`](@ref): returns the type of hash computed by the input hash function. Note that in practice `hashfn(x)` (or [`index_hash(hashfn,x)`](@ref) and [`query_hash(hashfn,x)`](@ref) for an [`AsymmetricLSHFunction`](@ref)) will return an array of hashes, one for each hash function you generated. [`hashtype`](@ref) is the data type of each element of `hashfn(x)`.

-```jldoctest; setup = :(using LSH)
+```jldoctest; setup = :(using LSHFunctions)
 julia> hashfn = LSHFunction(cossim, 5);

 julia> hashtype(hashfn)
@@ -122,7 +122,7 @@ true

 - [`collision_probability`](@ref): returns the probability of collision for two inputs with a given similarity. For instance, the probability that a single MinHash hash function causes a collision between inputs `A` and `B` is equal to [`jaccard(A,B)`](@ref jaccard):

-```jldoctest; setup = :(using LSH)
+```jldoctest; setup = :(using LSHFunctions)
 julia> hashfn = MinHash();

 julia> A = Set(["a", "b", "c"]);
@@ -137,7 +137,7 @@ true

 We often want to compute the probability that not just one hash collides, but that multiple hashes collide simultaneously. You can calculate this using the `n_hashes` keyword argument. If left unspecified, then [`collision_probability`](@ref) will use [`n_hashes(hashfn)`](@ref n_hashes) hash functions to compute the probability.

-```jldoctest; setup = :(using LSH)
+```jldoctest; setup = :(using LSHFunctions)
 julia> hashfn = MinHash(5);

 julia> A = Set(["a", "b", "c"]);

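The last two hunks describe how the single-hash collision probability for MinHash equals the Jaccard similarity, and how the `n_hashes` keyword extends this to several hashes colliding at once. Assuming the hash functions are sampled independently, the probability that all `n` collide is the single-hash probability raised to the `n`th power. The sketch below works this out with plain sets; `jaccard_sim`, `all_collide`, and the set `B` are hypothetical, since the diff elides the original `B`.

```julia
# Hypothetical re-implementation for illustration; the package exports its
# own `jaccard`.
jaccard_sim(A::Set, B::Set) = length(intersect(A, B)) / length(union(A, B))

# Probability that `n` independently sampled hash functions all collide,
# given the single-hash collision probability `p`.
all_collide(p::Real, n::Integer) = p^n

A = Set(["a", "b", "c"])
B = Set(["b", "c", "d"])   # assumed example set; the diff elides the original

p = jaccard_sim(A, B)      # 2 shared elements out of 4 total = 0.5
all_collide(p, 1)          # 0.5
all_collide(p, 5)          # 0.5^5 = 0.03125
```

The rapid decay with `n` is what makes multi-hash keys selective: dissimilar inputs almost never agree on every hash.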
docs/src/similarities/cosine.md

Lines changed: 7 additions & 7 deletions
@@ -13,7 +13,7 @@ Concretely, cosine similarity is computed as
 where ``\left\langle\cdot,\cdot\right\rangle`` is an inner product (e.g., dot product) and ``\|\cdot\|`` is the norm derived from that inner product. ``\text{cossim}(x,y)`` goes from ``-1`` to ``1``, where ``-1`` corresponds to low similarity and ``1`` corresponds to high similarity. To calculate cosine similarity, you can use the [`cossim`](@ref) function exported from the `LSH` module:

 ```jldoctest
-julia> using LSH, LinearAlgebra
+julia> using LSHFunctions, LinearAlgebra

 julia> x = [5, 3, -1, 1]; # norm(x) == 6

@@ -29,7 +29,7 @@ true
 ## SimHash
 *SimHash*[^1][^2] is a family of LSH functions for hashing with respect to cosine similarity. You can generate a new hash function from this family by calling [`SimHash`](@ref):

-```jldoctest; setup = :(using LSH)
+```jldoctest; setup = :(using LSHFunctions)
 julia> hashfn = SimHash();

 julia> n_hashes(hashfn)
@@ -43,7 +43,7 @@ julia> n_hashes(hashfn)

 Once constructed, you can start hashing vectors by calling `hashfn(x)`:

-```jldoctest; setup = :(using LSH, Random; Random.seed!(0)), output = false
+```jldoctest; setup = :(using LSHFunctions, Random; Random.seed!(0)), output = false
 hashfn = SimHash(100)

 # x and y have high cosine similarity since they point in the same direction
@@ -65,7 +65,7 @@ true

 Note that [`SimHash`](@ref) is a one-bit hash function. As a result, `hashfn(x)` returns a `BitArray`:

-```jldoctest; setup = :(using LSH)
+```jldoctest; setup = :(using LSHFunctions)
 julia> hashfn = SimHash();

 julia> n_hashes(hashfn)
@@ -82,7 +82,7 @@ julia> length(hashes)

 Since a single-bit hash doesn't do much to reduce the cost of similarity search, you usually want to generate multiple hash functions at once. For instance, in the snippet below we sample 10 hash functions, so that `hashfn(x)` is a length-10 `BitArray`:

-```jldoctest; setup = :(using LSH)
+```jldoctest; setup = :(using LSHFunctions)
 julia> hashfn = SimHash(10);

 julia> n_hashes(hashfn)
@@ -101,10 +101,10 @@ The probability of a hash collision (for a single hash) is
 where ``\theta = \text{arccos}(\text{cossim}(x,y))`` is the angle between ``x`` and ``y``. This collision probability is shown in the plot below.

 ```@eval
-using PyPlot, LSH
+using PyPlot, LSHFunctions
 hashfn = SimHash()
 x = range(-1, 1; length=1024)
-y = [LSH.single_hash_collision_probability(hashfn, xii) for xii in x]
+y = [LSHFunctions.single_hash_collision_probability(hashfn, xii) for xii in x]

 plot(x, y)
 title("Probability of hash collision for SimHash")

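The final hunk refers to a single-hash collision probability expressed through the angle between `x` and `y`, but the hunk header cuts off the formula itself. For sign-random-projection hashes such as SimHash, the standard result is a collision probability of 1 - θ/π. The sketch below evaluates that expression under the assumption that it matches the package's formula; `cosine_similarity`, `simhash_collision_probability`, and the vector `y` are hypothetical names introduced here.

```julia
using LinearAlgebra

# Cosine similarity as defined in the text: <x, y> / (‖x‖ ‖y‖).
cosine_similarity(x, y) = dot(x, y) / (norm(x) * norm(y))

# Standard single-hash collision probability for sign random projections:
# 1 - θ/π, with θ the angle between the inputs. Assumed here, since the
# diff does not show the package's own formula.
function simhash_collision_probability(sim::Real)
    θ = acos(clamp(sim, -1, 1))   # clamp guards against rounding just outside [-1, 1]
    return 1 - θ / π
end

x = [5, 3, -1, 1]    # norm(x) == 6, as in the doctest above
y = [-1, 5, 3, 1]    # hypothetical second vector, also with norm 6

s = cosine_similarity(x, y)           # (-5 + 15 - 3 + 1) / 36 = 8/36
simhash_collision_probability(s)      # between 0.5 and 1, since s > 0
simhash_collision_probability(1.0)    # identical directions always collide
simhash_collision_probability(-1.0)   # opposite directions never collide
simhash_collision_probability(0.0)    # orthogonal vectors collide half the time
```

Evaluating the function over `range(-1, 1; length=1024)` gives the same kind of curve that the `@eval` block plots.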
src/LSHBase.jl

Lines changed: 7 additions & 7 deletions
@@ -79,7 +79,7 @@ Compute the probability of hash collision between two inputs with similarity `si
 # Examples
 The probability that a single MinHash hash function causes a hash collision between inputs `A` and `B` is equal to `jaccard(A,B)`:

-```jldoctest; setup = :(using LSH)
+```jldoctest; setup = :(using LSHFunctions)
 julia> hashfn = MinHash();

 julia> A = Set(["a", "b", "c"]);
@@ -95,7 +95,7 @@ julia> collision_probability(hashfn, jaccard(A,B); n_hashes=1)

 If our [`MinHash`](@ref) struct keeps track of `N` hash functions simultaneously, then the probability of collision is `jaccard(A,B)^N`:

-```jldoctest; setup = :(using LSH)
+```jldoctest; setup = :(using LSHFunctions)
 julia> hashfn = MinHash(10);

 julia> A = Set(["a", "b", "c"]);
@@ -153,7 +153,7 @@ Computes the probability of a hash collision between two inputs `x` and `y` for
 # Examples
 The following snippet computes the probability of collision between two sets `A` and `B` for a single MinHash. For MinHash, this probability is just equal to the Jaccard similarity between `A` and `B`.

-```jldoctest; setup = :(using LSH)
+```jldoctest; setup = :(using LSHFunctions)
 julia> hashfn = MinHash();

 julia> A = Set(["a", "b", "c"]);
@@ -171,7 +171,7 @@ true

 We can use the `n_hashes` argument to specify the probability that `n_hashes` MinHash hash functions simultaneously collide. If left unspecified, then we'll simply use `n_hashes(hashfn)` as the number of hash functions:

-```jldoctest; setup = :(using LSH)
+```jldoctest; setup = :(using LSHFunctions)
 julia> hashfn = MinHash(10);

 julia> A = Set(["a", "b", "c"]);
@@ -200,7 +200,7 @@ Returns the similarity function that `hashfn` hashes on.
 - `hashfn::AbstractLSHFunction`: the hash function whose similarity we would like to retrieve.

 # Examples
-```jldoctest; setup = :(using LSH)
+```jldoctest; setup = :(using LSHFunctions)
 julia> hashfn = LSHFunction(cossim);

 julia> similarity(hashfn) == cossim
@@ -220,7 +220,7 @@ function similarity end
 Returns the type of hash generated by a hash function.

 # Examples
-```jldoctest; setup = :(using LSH)
+```jldoctest; setup = :(using LSHFunctions)
 julia> hashfn = LSHFunction(cossim);

 julia> hashtype(hashfn)
@@ -240,7 +240,7 @@ function hashtype end
 Return the number of hashes computed by `hashfn`.

 # Examples
-```jldoctest; setup = :(using LSH)
+```jldoctest; setup = :(using LSHFunctions)
 julia> hashfn = SimHash();

 julia> n_hashes(hashfn)

src/LSH.jl renamed to src/LSHFunctions.jl

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-module LSH
+module LSHFunctions

 using Distributions, LinearAlgebra, SparseArrays

src/function_hashing/chebhash.jl

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ ChebHash(similarity, args...; kws...) =

 function ChebHash(::SimilarityFunction{S},
 args...;
-interval::RealInterval = LSH.@interval(-1 ≤ x ≤ 1),
+interval::RealInterval = LSHFunctions.@interval(-1 ≤ x ≤ 1),
 kws...) where S

 discrete_hashfn = LSHFunction(S, args...; kws...)

src/hashes/lphash.jl

Lines changed: 2 additions & 2 deletions
@@ -114,7 +114,7 @@ Constructs a locality-sensitive hash for ``\\ell^$($power)`` distance (``\\|x -
 # Examples
 Construct an `$($hashfn)` with the number of hash functions you want to generate:

-```jldoctest; setup = :(using LSH)
+```jldoctest; setup = :(using LSHFunctions)
 julia> hashfn = $($hashfn)(128);

 julia> hashfn.power == $($power) &&
@@ -125,7 +125,7 @@ true

 After creating a hash function, you can compute hashes with `hashfn(x)`:

-```jldoctest; setup = :(using LSH)
+```jldoctest; setup = :(using LSHFunctions)
 julia> hashfn = $($hashfn)(20);

 julia> x = rand(4);

src/hashes/lshfunction.jl

Lines changed: 7 additions & 7 deletions
@@ -21,15 +21,15 @@ Register `hashfn` to the `LSH` module as the default locality-sensitive hash fun
 # Examples
 Create a custom implementation of cosine similarity called `my_cossim`, and associate it with `SimHash`:

-```jldoctest; setup = :(using LSH)
+```jldoctest; setup = :(using LSHFunctions)
 julia> using LinearAlgebra: dot, norm

 julia> my_cossim(x,y) = dot(x,y) / (norm(x) * norm(y));

 julia> hashfn = LSHFunction(my_cossim);
 ERROR: MethodError: no method matching LSHFunction(::typeof(my_cossim))

-julia> LSH.@register_similarity!(my_cossim, SimHash);
+julia> LSHFunctions.@register_similarity!(my_cossim, SimHash);

 julia> hashfn = LSHFunction(my_cossim);

@@ -38,8 +38,8 @@ true
 ```
 """
 macro register_similarity!(similarity, hashfn)
-lshfn = :(LSH.LSHFunction)
-lshfam = :(LSH.lsh_family)
+lshfn = :(LSHFunctions.LSHFunction)
+lshfam = :(LSHFunctions.lsh_family)

 quote
 local similarity = $(esc(similarity))
@@ -113,7 +113,7 @@ Returns a subtype of `LSH.LSHFunction` that hashes the similarity function `simi
 # Examples
 In the snippet below, we construct `$(lsh_family(cossim))` (the default hash function corresponding to cosine similarity) using `LSHFunction()`:

-```jldoctest; setup = :(using LSH)
+```jldoctest; setup = :(using LSHFunctions)
 julia> hashfn = LSHFunction(cossim);

 julia> typeof(hashfn) <: $(lsh_family(cossim)) <: LSHFunction
@@ -122,7 +122,7 @@ true

 We can provide arguments and keyword parameters corresponding to the hash function that we construct:

-```jldoctest; setup = :(using LSH)
+```jldoctest; setup = :(using LSHFunctions)
 julia> hashfn = LSHFunction(inner_prod, 100; dtype=Float64, maxnorm=10);

 julia> n_hashes(hashfn) == 100 &&
@@ -154,7 +154,7 @@ help?> SignALSH

 # Examples

-```jldoctest; setup = :(using LSH)
+```jldoctest; setup = :(using LSHFunctions)
 julia> lsh_family(cossim)
 SimHash


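The second hunk in this file shows why the rename has to reach into the macro body: `@register_similarity!` splices module-qualified names into the code it generates, so the qualifier must track the module's name. The sketch below reproduces that pattern in miniature. Everything in it is hypothetical (the `Registry` module, `register!`, `family_of`, `@register`, and `my_sim`); it is not the package's implementation.

```julia
module Registry   # hypothetical stand-in for the package module

# Maps a similarity function to the name of its default hash family.
const FAMILIES = IdDict{Any,Symbol}()

register!(sim, family::Symbol) = (FAMILIES[sim] = family; nothing)
family_of(sim) = FAMILIES[sim]

macro register(sim, family)
    # Module-qualified callee, stored as an expression and spliced in below.
    # If the module were renamed, this qualifier would have to change too.
    regfn = :(Registry.register!)
    quote
        $regfn($(esc(sim)), $(QuoteNode(family)))
    end
end

end # module

my_sim(x, y) = sum(x .* y)   # hypothetical user-defined similarity

Registry.@register(my_sim, SimHash)
Registry.family_of(my_sim)   # :SimHash
```

Escaping only the user's argument keeps `my_sim` resolved in the caller's scope, while the qualified callee is resolved relative to the module that defines the macro.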