Commit e9cc88e

Update docs.
1 parent 2426638 commit e9cc88e

5 files changed: 44 additions and 18 deletions.

docs/make.jl

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ makedocs(
2121
"Function-space hashing" => "function_hashing.md",
2222
"Performance tips" => "performance.md",
2323
"FAQ" => "faq.md",
24-
"Notation and glossary" => "notation_and_glossary.md",
24+
"Glossary" => "glossary.md",
2525
"API reference" => "full_api.md",
2626
]
2727
)

docs/src/function_hashing.md

Lines changed: 3 additions & 8 deletions
@@ -19,7 +19,8 @@ julia> hashfn(x -> 5x^3 - 2x^2 - 9x + 1)
1919
1
2020
```
2121

22-
## Similarity statistics in ``L_{\mu}^p`` function spaces
22+
## Similarity statistics in function spaces
23+
The LSHFunctions module currently supports the following similarity statistics for function spaces. Unless otherwise stated, all functions are assumed to be members of an [``L^p_{\mu}(\Omega)`` function space](https://en.wikipedia.org/wiki/Lp_space).
2324

2425
### ``L_{\mu}^p`` distance
2526

@@ -33,13 +34,7 @@ julia> hashfn(x -> 5x^3 - 2x^2 - 9x + 1)
3334
\left\langle f, g\right\rangle_{L_{\mu}^2} = \int_{\Omega} f(x)g(x) \hspace{0.15cm} d\mu(x)
3435
```
3536

36-
When `f(x)` and `g(x)` are allowed to take on complex values, the inner product is defined as
37-
38-
```math
39-
\left\langle f, g\right\rangle_{L_{\mu}^2} = \int_{\Omega} f(x)\overline{g(x)} \hspace{0.15cm} d\mu(x)
40-
```
41-
42-
where ``\overline{g(x)}`` is the complex conjugate of ``g(x)``.
37+
When ``f`` and ``g`` are allowed to take on complex values, ``g(x)`` is replaced by ``\overline{g(x)}`` (the complex conjugate of ``g(x)``) in the formula above.
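The complex-valued inner product described above can be checked numerically. The following is a minimal sketch, assuming Lebesgue measure on ``\Omega = [0, 2\pi]`` and a plain trapezoidal rule; `trapz` and `inner` are hypothetical helper names used only for illustration and are not part of the LSHFunctions API.

```julia
# Trapezoidal rule on [a, b] with n subintervals.
# Works for real- and complex-valued integrands.
function trapz(h, a, b; n = 1000)
    dx = (b - a) / n
    total = (h(a) + h(b)) / 2
    for i in 1:n-1
        total += h(a + i * dx)
    end
    return total * dx
end

# L^2 inner product with Lebesgue measure on [a, b].
# conj is a no-op for real values, so the same formula covers both cases.
inner(f, g, a, b) = trapz(x -> f(x) * conj(g(x)), a, b)

f(x) = cis(x)     # e^{ix}
g(x) = cis(2x)    # e^{2ix}

ip_fg = inner(f, g, 0, 2π)   # f and g are orthogonal, so this is close to 0
ip_ff = inner(f, f, 0, 2π)   # squared L^2 norm of f, close to 2π
```

Dropping the `conj` would make `inner(f, f, 0, 2π)` evaluate to roughly zero instead of the squared norm, which is why the conjugate is needed for complex-valued functions.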
4338

4439
### Cosine similarity
4540
```math

docs/src/glossary.md

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
1+
# LSHFunctions notation and glossary
2+
3+
!!! warning "Under construction"
4+
This section is currently being developed. If you're interested in helping write this section, feel free to [open a pull request](https://github.com/kernelmethod/LSHFunctions.jl/pulls); otherwise, please check back later.
5+
6+
## Terms
7+
8+
---
9+
10+
**LSH**: an acronym for locality-sensitive hashing.
11+
12+
---
13+
14+
``L^p_{\mu}(\Omega)`` **function space** ([wikipedia](https://en.wikipedia.org/wiki/Lp_space)): a set of functions[^1] whose inputs come from some set ``\Omega`` and whose outputs are either real or complex numbers. ``\mu`` is a [measure](https://en.wikipedia.org/wiki/Measure_space) and ``p`` is a positive number. The ``L^p_{\mu}(\Omega)`` [norm](https://en.wikipedia.org/wiki/Norm_(mathematics)), denoted with ``\|\cdot\|_{L^p_{\mu}}`` (where ``\Omega`` is implicit), is defined as
15+
16+
```math
17+
\|f\|_{L^p_{\mu}} = \left(\int_{\Omega} \left|f(x)\right|^p \hspace{0.15cm} d\mu(x)\right)^{1/p}
18+
```
19+
20+
In the case where ``p = 2``, there is also an [inner product](https://en.wikipedia.org/wiki/Inner_product_space) defined for the space:
21+
22+
```math
23+
\left\langle f, g\right\rangle = \int_{\Omega} f(x)\overline{g(x)} \hspace{0.15cm} d\mu(x)
24+
```
25+
26+
where ``\overline{g(x)}`` is the complex conjugate of ``g(x)``. A function in ``L^p_{\mu}(\Omega)`` must have the property that ``\|f\|_{L^p_{\mu}}`` is finite.
27+
28+
*Example*: ``f(x) = x^2 - 3x + 2`` is a function in ``L^2([-1,1])`` (with ``\mu`` chosen to be [Lebesgue measure](https://en.wikipedia.org/wiki/Lebesgue_measure)) because ``\|f\|_{L^2} = \sqrt{\int_{-1}^1 \left|f(x)\right|^2 \hspace{0.15cm} dx}`` is finite. However, it is *not* a function in ``L^2((-\infty,\infty))`` because ``\|f\|_{L^2} = \sqrt{\int_{-\infty}^{\infty} \left|f(x)\right|^2 \hspace{0.15cm} dx}`` is infinite.
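The example can be verified numerically. The sketch below assumes Lebesgue measure and uses a hand-written composite Simpson's rule; `simpson` and `l2norm` are hypothetical names for illustration, not functions provided by LSHFunctions. Integrating ``|f(x)|^2 = x^4 - 6x^3 + 13x^2 - 12x + 4`` over ``[-1, 1]`` by hand gives ``256/15``, so the norm should be ``16/\sqrt{15}``.

```julia
# Composite Simpson's rule on [a, b] with n (even) subintervals.
function simpson(g, a, b; n = 1000)
    iseven(n) || throw(ArgumentError("n must be even"))
    h = (b - a) / n
    total = g(a) + g(b)
    for i in 1:n-1
        total += (isodd(i) ? 4 : 2) * g(a + i * h)
    end
    return total * h / 3
end

f(x) = x^2 - 3x + 2

# L^2 norm with Lebesgue measure on [a, b].
l2norm(g, a, b) = sqrt(simpson(x -> abs2(g(x)), a, b))

norm_f = l2norm(f, -1, 1)   # finite; analytically 16 / sqrt(15)

# On growing intervals [-R, R] the norm keeps increasing,
# which is consistent with f not being in L^2 of the whole real line.
norms = [l2norm(f, -R, R) for R in (1, 10, 100)]
```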
29+
30+
---
31+
32+
**Similarity statistic**: a number that represents the similarity between two data points. Different similarity statistics have different ways of defining what "similar" means.
33+
34+
A similarity statistic can be interpreted in many different ways; for instance, cosine similarity is defined between -1 and 1, with *higher* values indicating *higher* similarity. Meanwhile, ``\ell^p`` distance is defined between 0 and ``+\infty``, with *higher* distances indicating *lower* similarity.
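To make the contrast concrete, here is a small sketch that computes both statistics for a few vectors using only Julia's `LinearAlgebra` standard library. The helper names `cosine_similarity` and `lp_distance` are assumptions made for this illustration; they are not claimed to match the names exported by LSHFunctions.

```julia
using LinearAlgebra  # standard library: provides dot and norm

# Cosine similarity: ranges over [-1, 1]; higher means more similar.
cosine_similarity(x, y) = dot(x, y) / (norm(x) * norm(y))

# ℓ^p distance: ranges over [0, +∞); higher means less similar.
lp_distance(x, y, p) = norm(x - y, p)

x = [1.0, 2.0, 3.0]
same_direction = 2 .* x   # points the same way as x, but twice as long
opposite = -x             # points the opposite way

sim_same = cosine_similarity(x, same_direction)   # close to 1
sim_opp  = cosine_similarity(x, opposite)         # close to -1
dist_same = lp_distance(x, same_direction, 2)     # sqrt(14), despite maximal cosine similarity
dist_self = lp_distance(x, x, 2)                  # 0 for identical points
```

Note that `x` and `same_direction` are maximally similar under cosine similarity yet have nonzero ``\ell^2`` distance, which illustrates that different statistics encode different notions of "similar".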
35+
36+
## Footnotes
37+
[^1]: technically, equivalence classes of functions.

docs/src/notation_and_glossary.md

Lines changed: 0 additions & 6 deletions
This file was deleted.

docs/src/similarities/cosine.md

Lines changed: 3 additions & 3 deletions
@@ -29,7 +29,7 @@ true
2929
```
3030

3131
## SimHash
32-
*SimHash*[^1][^2] is a family of LSH functions for hashing with respect to cosine similarity. You can generate a new hash function from this family by calling [`SimHash`](@ref):
32+
*SimHash*[^1][^Charikar02] is a family of LSH functions for hashing with respect to cosine similarity. You can generate a new hash function from this family by calling [`SimHash`](@ref):
3333

3434
```jldoctest; setup = :(using LSHFunctions)
3535
julia> hashfn = SimHash();
@@ -123,6 +123,6 @@ savefig("simhash_collision_probability.svg")
123123

124124
### Footnotes
125125

126-
[^1]: Moses S. Charikar. *Similarity estimation techniques from rounding algorithms*. In Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, STOC '02, pages 380–388, New York, NY, USA, 2002. Association for Computing Machinery. 10.1145/509907.509965.
126+
[^1]: [`SimHash` API reference](@ref SimHash)
127127

128-
[^2]: [`SimHash` API reference](@ref SimHash)
128+
[^Charikar02]: Moses S. Charikar. *Similarity estimation techniques from rounding algorithms*. In Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, STOC '02, pages 380–388, New York, NY, USA, 2002. Association for Computing Machinery. 10.1145/509907.509965.
