
Commit b0dbbdb

Update docs.

1 parent e3b2e56, commit b0dbbdb

File tree

5 files changed: +47, -21 lines changed


docs/make.jl

Lines changed: 2 additions & 1 deletion

````diff
@@ -20,8 +20,9 @@ makedocs(
             "Inner product similarity" => joinpath("similarities", "inner_prod.md")],
         "Function-space hashing" => "function_hashing.md",
         "Performance tips" => "performance.md",
+        "FAQ" => "faq.md",
+        "Notation and glossary" => "notation_and_glossary.md",
         "API reference" => "full_api.md",
-        "FAQ" => "faq.md"
     ]
 )
````

docs/src/function_hashing.md

Lines changed: 30 additions & 1 deletion

````diff
@@ -19,6 +19,33 @@ julia> hashfn(x -> 5x^3 - 2x^2 - 9x + 1)
 1
 ```
 
+## Similarity statistics in ``L_{\mu}^p`` function spaces
+
+### ``L_{\mu}^p`` distance
+
+```math
+\|f - g\|_{L_{\mu}^p} = \left(\int_{\Omega} |f(x) - g(x)|^p \hspace{0.15cm} d\mu(x)\right)^{1/p}
+```
+
+### Inner product similarity
+
+```math
+\left\langle f, g\right\rangle_{L_{\mu}^2} = \int_{\Omega} f(x)g(x) \hspace{0.15cm} d\mu(x)
+```
+
+When `f(x)` and `g(x)` are allowed to take on complex values, the inner product is defined as
+
+```math
+\left\langle f, g\right\rangle_{L_{\mu}^2} = \int_{\Omega} f(x)\overline{g(x)} \hspace{0.15cm} d\mu(x)
+```
+
+where ``\overline{g(x)}`` is the complex conjugate of ``g(x)``.
+
+### Cosine similarity
+```math
+\text{cossim}(f,g) = \frac{\left\langle f,g\right\rangle_{L_{\mu}^2}}{\|f\|_{L_{\mu}^2} \cdot \|g\|_{L_{\mu}^2}}
+```
+
 ## Function approximation-based hashing
 
 !!! warning "API subject to change"
@@ -99,11 +126,13 @@ julia> hashfn(x -> x/(2π))
 Create a hash function with a different number of sample points.
 
 ```jldoctest; setup = :(using LSHFunctions)
-julia> μ() = rand(); # Samples a random point from [0,1]
+julia> μ() = rand(); # μ samples a random point from [0,1]
 
 julia> hashfn = MonteCarloHash(cossim, μ; volume=1.0, n_samples=512);
 
 julia> length(hashfn.sample_points)
 512
 ```
 
+## Footnotes
+
````

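The ``L_{\mu}^2`` quantities added above translate directly into numerics. As a rough illustration of the math (a hypothetical Python sketch, not the package's Julia implementation), the inner product and cosine similarity can be estimated by Monte Carlo sampling from ``\mu``, which is essentially what `MonteCarloHash`'s sample points are for:

```python
import math
import random

def mc_inner_product(f, g, mu, volume, n_samples=10_000):
    # Monte Carlo estimate of <f,g>_{L^2_mu}: average f(x)*g(x) over
    # samples x ~ mu, scaled by the volume of the domain.
    pts = [mu() for _ in range(n_samples)]
    return volume * sum(f(x) * g(x) for x in pts) / n_samples

def mc_cossim(f, g, mu, n_samples=10_000):
    # cossim(f,g) = <f,g> / (||f|| * ||g||); the volume factor cancels,
    # so unscaled sums over a shared sample suffice.
    pts = [mu() for _ in range(n_samples)]
    fg = sum(f(x) * g(x) for x in pts)
    ff = sum(f(x) * f(x) for x in pts)
    gg = sum(g(x) * g(x) for x in pts)
    return fg / math.sqrt(ff * gg)

random.seed(42)
mu = random.random                       # samples a random point from [0,1]
f = lambda x: math.sin(2 * math.pi * x)

print(mc_inner_product(f, f, mu, volume=1.0))  # ~0.5, since the integral of sin^2(2*pi*x) over [0,1] is 1/2
print(mc_cossim(f, lambda x: -f(x), mu))       # -1.0: f and -f are antiparallel
```

For complex-valued functions, the estimator would use `g(x).conjugate()` in place of `g(x)`, mirroring the conjugate in the definition above.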
docs/src/lshfunction_api.md

Lines changed: 3 additions & 3 deletions

````diff
@@ -94,7 +94,7 @@ LSHFunctions.jl provides a few common utility functions that you can use across
 10
 ```
 
-- [`similarity`](@ref): returns the similarity function for which the input [`LSHFunction`](@ref) is locality-sensitive:
+- [`similarity`](@ref): returns the similarity statistic on which your hash function is locality-sensitive:
 
 ```jldoctest; setup = :(using LSHFunctions)
 julia> hashfn = LSHFunction(cossim);
@@ -103,7 +103,7 @@ LSHFunctions.jl provides a few common utility functions that you can use across
 cossim (generic function with 2 methods)
 ```
 
-- [`hashtype`](@ref): returns the type of hash computed by the input hash function. Note that in practice `hashfn(x)` (or [`index_hash(hashfn,x)`](@ref) and [`query_hash(hashfn,x)`](@ref) for an [`AsymmetricLSHFunction`](@ref)) will return an array of hashes, one for each hash function you generated. [`hashtype`](@ref) is the data type of each element of `hashfn(x)`.
+- [`hashtype`](@ref): returns the type of hash computed by the input hash function. Note that in practice `hashfn(x)` (or [`index_hash(hashfn,x)`](@ref) and [`query_hash(hashfn,x)`](@ref) for an [`AsymmetricLSHFunction`](@ref)) will return an array of hashes, one for each hash function you sampled when you called [`LSHFunction`](@ref). [`hashtype`](@ref) is the data type of each element of `hashfn(x)`.
 
 ```jldoctest; setup = :(using LSHFunctions)
 julia> hashfn = LSHFunction(cossim, 5);
@@ -135,7 +135,7 @@ LSHFunctions.jl provides a few common utility functions that you can use across
 true
 ```
 
-We often want to compute the probability that not just one hash collides, but that multiple hashes collide simultaneously. You can calculate this using the `n_hashes` keyword argument. If left unspecified, then [`collision_probability`](@ref) will use [`n_hashes(hashfn)`](@ref n_hashes) hash functions to compute the probability.
+We often want to compute the probability that not just one hash collides, but that multiple hashes collide simultaneously. You can calculate this using the `n_hashes` keyword argument. If left unspecified, this parameter will default to [`n_hashes(hashfn)`](@ref n_hashes).
 
 ```jldoctest; setup = :(using LSHFunctions)
 julia> hashfn = MinHash(5);
````
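The reworded `n_hashes` passage rests on a simple fact: the hash functions are drawn independently, so the probability that several hashes collide simultaneously is the single-hash collision probability raised to the number of hashes. A hedged Python sketch of that arithmetic, using the textbook collision probability of a signed-random-projection (SimHash-style) hash for cosine similarity (illustrative math only, not the package's code):

```python
import math

def single_collision_probability(sim):
    # For one signed-random-projection hash, two vectors with cosine
    # similarity `sim` collide with probability 1 - arccos(sim)/pi.
    return 1.0 - math.acos(sim) / math.pi

def collision_probability(sim, n_hashes=1):
    # The hash functions are sampled independently, so all n_hashes
    # of them collide simultaneously with probability p ** n_hashes.
    return single_collision_probability(sim) ** n_hashes

p1 = collision_probability(0.5)               # arccos(0.5) = pi/3, so p1 = 2/3
p5 = collision_probability(0.5, n_hashes=5)   # (2/3)**5: much more selective
print(p1, p5)
```

Raising the probability to the n-th power is what makes multi-hash collisions selective: similar pairs keep a reasonable chance of colliding on every hash, while dissimilar pairs become exponentially unlikely to.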

docs/src/notation_and_glossary.md

Lines changed: 6 additions & 0 deletions

````diff
@@ -0,0 +1,6 @@
+# LSHFunctions notation and glossary
+
+!!! warning "Under construction"
+    This section is currently being developed. If you're interested in helping write this section, feel free to [open a pull request](https://github.com/kernelmethod/LSHFunctions.jl/pulls); otherwise, please check back later.
+
+
````

docs/src/similarities/lp_distance.md

Lines changed: 6 additions & 16 deletions

````diff
@@ -4,24 +4,14 @@
     This section is currently being developed. If you're interested in helping write this section, feel free to [open a pull request](https://github.com/kernelmethod/LSHFunctions.jl/pulls); otherwise, please check back later.
 
 ## Definition
-``\ell^p`` distance is a generalization of our usual notion of distance between a pair of points. If you're not familiar with it, you can think of it as a generalization of the Pythagorean theorem: if we have two points ``(a_1,b_1)`` and ``(a_2,b_2)``, then the distance between them is
-
-```math
-\text{distance} = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2}
-```
-
-This is known as the *``\ell^2`` distance* (or Euclidean distance) between ``(a_1,b_1)`` and ``(a_2,b_2)``. In higher dimensions, the ``\ell^2`` distance between the points ``x = (x_1,\ldots,x_n)`` and ``y = (y_1,\ldots,y_n)`` is denoted as ``\|x - y\|_{\ell^2}`` (since ``\ell^2`` distance, and, for that matter, all ``\ell^p`` distances of order ``\ge 1``, are [norms](https://en.wikipedia.org/wiki/Norm_(mathematics))) and defined as[^1]
-
-```math
-\|x - y\|_{\ell^2} = \sqrt{\sum_{i=1}^n \left|x_i - y_i\right|^2}
-```
-
-More generally, the ``\ell^p`` distance between the two length-``n`` vectors ``x`` and ``y`` is given by
+The ``\ell^p`` distance between two length-``n`` vectors ``x`` and ``y`` is defined as
 
 ```math
 \|x - y\|_{\ell^p} = \left(\sum_{i=1}^n \left|x_i - y_i\right|^p\right)^{1/p}
 ```
 
+``\ell^p`` distance is a valid [norm](https://en.wikipedia.org/wiki/Norm_(mathematics)) for all ``p \ge 1``, and is bounded between ``0`` and ``+\infty``. In the context of locality-sensitive hashing, we say that two points are similar if ``\|x - y\|_{\ell^p}`` is small, and dissimilar if ``\|x - y\|_{\ell^p}`` is large[^1].
+
 In the LSHFunctions module, you can calculate the ``\ell^p`` distance between two points using the function [`ℓp`](@ref). The functions [`ℓ1`](@ref ℓp) and [`ℓ2`](@ref ℓp) are also defined for ``\ell^1`` and ``\ell^2`` distance, respectively, since they're so commonly used:
 
 ```jldoctest
@@ -135,8 +125,8 @@ for scale in (0.25, 1.0, 4.0)
     y1 = [collision_probability(l1_hashfn, xii) for xii in x]
     y2 = [collision_probability(l2_hashfn, xii) for xii in x]
 
-    axes[1].plot(x, y1, label="\$r = $scale\$")
-    axes[2].plot(x, y2, label="\$r = $scale\$")
+    axes[1].plot(x, y1, label="\$scale = $scale\$")
+    axes[2].plot(x, y2, label="\$scale = $scale\$")
 end
 
 axes[1].set_xlabel(raw"$\|x - y\|_{\ell^1}$", fontsize=20)
@@ -159,7 +149,7 @@ For further information about the collision probability, see Section 3.2 of the
 
 ### Footnotes
 
-[^1]: In general, ``x`` and ``y`` are allowed to be complex vectors. We sum over ``\left|x_i - y_i\right|`` (the magnitude of ``x_i - y_i``) instead of ``(x_i - y_i)^2`` to guarantee that ``\|x - y\|_{\ell^2}`` is a real number even when ``x`` and ``y`` are complex.
+[^1]: "small" and "large" are relative terms, of course. `LpHash` has a parameter `scale` that influences the relationship between ``\ell^p`` distance and collision probability, which helps us differentiate between what distances are small and which are large.
 
 [^Datar04]: Datar, Mayur and Indyk, Piotr and Immorlica, Nicole and Mirrokni, Vahab. (2004). *Locality-sensitive hashing scheme based on p-stable distributions*. Proceedings of the Annual Symposium on Computational Geometry. 10.1145/997817.997857.
 
````
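The tightened definition above is easy to check numerically. A small, hypothetical Python sketch of the formula (the package itself provides `ℓp`, `ℓ1`, and `ℓ2` in Julia):

```python
def lp_distance(x, y, p):
    # l^p distance between equal-length vectors:
    # (sum_i |x_i - y_i|^p)^(1/p)
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

x = [1.0, 2.0, 3.0]
y = [4.0, 6.0, 3.0]
print(lp_distance(x, y, 1))  # 7.0 (Manhattan distance: 3 + 4 + 0)
print(lp_distance(x, y, 2))  # 5.0 (Euclidean distance: sqrt(9 + 16))
```

Whether a distance like 7.0 counts as "small" depends, as the rewritten footnote says, on the hash function's `scale` parameter: roughly speaking, distances well below the scale collide with high probability, and distances well above it rarely do.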
