Commit 1762ec8

committed
edit: docs and readme
1 parent babf155 commit 1762ec8

5 files changed

+151
-40
lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -2,3 +2,4 @@ built/
 node_modules/
 yarn.lock
 docs/
+**/*.wiki

README.md

Lines changed: 128 additions & 0 deletions
@@ -1,6 +1,8 @@
 # AQLqueryBuilder.js
 > a typescript query builder for [arangodb](https://www.arangodb.com)'s [ArangoSearch](https://www.arangodb.com/docs/stable/arangosearch.html)
 
+##### !! warning !! experimental and unstable
+
 ## overview
 ArangoSearch provides a high-level API for interacting with Arango Search Views
 through the Arango Query Language (AQL). This library aims to provide a query
@@ -47,11 +49,83 @@ provided, optional terms are considered required, so as not to retrieve all
 documents.
 
 ## setup
+
+1) running generated AQL queries will require a working arangodb instance. In
+the future, it is hoped that this package can be imported and used in
+`arangosh`, as well as client and server side. Currently there is only limited
+support for server-side use.
+
 ## installation
+
+!! packaging and export behavior is not stable, and is likely to change
+!! significantly in the short term
+1) clone this repository in your ES6-compatible project.
+2) run `yarn install` from the project directory.
+
 ## usage
+for better documentation, run `yarn doc && serve docs/` from the project
+directory root.
+
+AQLqueryBuilder aims to provide collection-agnostic and language-agnostic
+boolean search capabilities to the library's user. Currently, this library
+makes a number of assumptions about the way your data is stored and indexed,
+but these are hopefully compatible with a wide range of setups.
+
+The primary assumption this library makes is that the data you are trying to
+query against is indexed by an ArangoSearch View, and that all documents index
+the same exact field. This field can be indexed by any number of analyzers,
+and the search will be run against all supplied collections simultaneously. This
+allows for true multi-language search provided that each collection is
+restricted to just one language and all documents index the same key as all
+other documents in the view. While there are plans to expand on this
+functionality to provide multi-key search, this library is primarily built for
+academic and textual searches, and is ideally suited for documents like books,
+articles, and other media where most of the data resides in a single place.
+
+This works best as a document query tool. Leveraging ArangoSearch's built-in
+language stemming analyzers allows for complex search phrases to be run
+against any number of language-specific collections simultaneously.
+
+For an example of a multi-lingual document ingest/parser, please see
+[ptolemy's curator](https://gitlab.com/HP4k1h5/nineveh/-/tree/master/ptolemy/dimitri/curator.js)
+
+__Example:__
+```javascript
+import {buildAQL} from 'path/to/AQLqueryBuilder'
+const queryObject =
+{
+  "view": "the_arango-search_view-name",
+  "collections": [{
+    "name": "collection_name",
+    "analyzer": "analyzer_name"
+  }],
+  "terms": "+'query string' -for +parseQuery ?to parse"
+}
+const aqlQuery = buildAQL(queryObject)
+// ... const cursor = await db.query(aqlQuery)
+```
+`collections` is an array of `collection` objects. This allows searching and
+filtering across collections impacted by the search.
+
 ### query object
 
 `buildAQL` accepts an object with the following properties:
+
+**view**: *string* (required): the name of the ArangoSearch view the query
+will be run against
+
+**collections** (required): the collections indexed by @view to query
+
+**terms** (required): either an array of @term interfaces or a string to be
+parsed by @parseQuery
+
+**key** (optional | default: "text"): the name of the Arango document key to search
+within.
+
+**filters** (optional): a list of @filter interfaces
+
+___
+
 Example:
 ```json
 {
@@ -63,18 +137,72 @@ Example:
       "analyzer_name"
     }
   ],
+  "key": "text",
   "query": "either a +query ?\"string for parseQuery to parse\"",
   "query": [
     {"type": "phr", "op": "?", "val": "\"or a list of query objects\""},
     {"type": "tok", "op": "-", "val": "tokens"}
   ],
+  "filters": [
+    {
+      "field": "field_name",
+      "op": ">",
+      "val": 0
+    }
+  ],
   "limit":
   {
     "start": 0,
     "end": 20,
   }
 }
 ```
+
+### boolean search logic
+Quoting [MIT's Database Search Tips](https://libguides.mit.edu/c.php?g=175963&p=1158594):
+> Boolean operators form the basis of mathematical sets and database logic.
+  They connect your search words together to either narrow or broaden your
+  set of results. The three basic boolean operators are: AND, OR, and NOT.
+
+#### `+` AND
+* Mandatory terms and phrases. All results MUST INCLUDE these terms and
+  phrases.
+#### `?` OR
+* Optional terms and phrases. If there are ANDS or NOTS, these serve as
+  match score "boosters". If there are no ANDS or NOTS, ORS become required
+  in results.
+#### `-` NOT
+* Search results MUST NOT INCLUDE these terms and phrases. If a result that
+  would otherwise have matched contains one or more of these terms or
+  phrases, it is excluded from the result set.
+
 ### default query syntax
+for more information on boolean search logic see
+[above](#boolean-search-logic)
+
+The default syntax accepted by `AQLqueryBuilder`'s `query` object's `terms`
+key is as follows:
+
+1) Everything inside single or double quotes is considered a `PHRASE`
+2) Everything else is considered a word to be analyzed by `TOKENS`
+3) Every individual search word and quoted phrase may be optionally prefixed
+by one of the following symbols: `+` (plus sign), `?` (question mark), or
+`-` (minus sign). If a word has no operator prefix, it is considered
+optional and is counted as an `OR`.
+
+Example:
+the input `one +two -"buckle my shoe"` is interpreted by `parseQuery` as
+follows:
+
+| | ANDS | ORS | NOTS |
+| - | - | - | - |
+| PHRASE | | | "buckle my shoe" |
+| TOKENS | two | one | |
+
+The generated AQL query, when run, returns only results that contain
+"two", that do not contain variations on the phrase "buckle my shoe", and that
+optionally contain "one". In this case, documents that contain "one" are
+likely to score higher than those that do not.
+
 ## bugs
 ## contributing
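
The default query syntax documented in the README hunk above can be illustrated with a small tokenizer. The following is a minimal TypeScript sketch written for this page, not the library's own `parseQuery`; the names `Term`, `sketchParseQuery`, and `groupByOperator` are hypothetical, and the `Term` shape is assumed from the `{"type", "op", "val"}` objects in the README's JSON example. Phrases keep their surrounding quotes in `val`, which matches the `phrase.val.slice(1, -1)` call in `src/search.ts`.

```typescript
// Hypothetical term shape, assumed from the README's JSON example.
interface Term {
  type: 'phr' | 'tok'
  op: '+' | '?' | '-'
  val: string
}

// Illustrative parser for the three documented rules:
// quoted text is a phrase, everything else is a token,
// and a missing operator prefix defaults to '?' (OR).
function sketchParseQuery(input: string): Term[] {
  // optional operator, then a quoted phrase or a run of non-space characters
  const pattern = /([+?-])?("[^"]*"|'[^']*'|\S+)/g
  const terms: Term[] = []
  let match: RegExpExecArray | null
  while ((match = pattern.exec(input)) !== null) {
    const op = (match[1] || '?') as Term['op']
    const val = match[2]
    const quoted = /^(".*"|'.*')$/.test(val)
    terms.push({ type: quoted ? 'phr' : 'tok', op, val })
  }
  return terms
}

// Groups values into the ANDS / ORS / NOTS columns used in the README table.
function groupByOperator(terms: Term[]): Record<'ANDS' | 'ORS' | 'NOTS', string[]> {
  const groups: Record<'ANDS' | 'ORS' | 'NOTS', string[]> = { ANDS: [], ORS: [], NOTS: [] }
  for (const t of terms) {
    const bucket = t.op === '+' ? 'ANDS' : t.op === '-' ? 'NOTS' : 'ORS'
    groups[bucket].push(t.val)
  }
  return groups
}

const parsed = sketchParseQuery('one +two -"buckle my shoe"')
console.log(groupByOperator(parsed))
// ANDS: ['two'], ORS: ['one'], NOTS: ['"buckle my shoe"']
```

The regular expression is a design shortcut: it does not handle escaped quotes inside a phrase, and an unterminated quote falls through to the token rule.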

package.json

Lines changed: 2 additions & 1 deletion
@@ -6,7 +6,7 @@
   "scripts": {
     "test": "mocha -r ts-node/register",
     "tests": "mocha -r ts-node/register 'tests/*.ts'",
-    "doc": "typedoc src/"
+    "doc": "typedoc --plugin none src/"
   },
   "homepage": "https://github.com/HP4k1h5/AQLqueryBuilder.js",
   "repository": {
@@ -35,6 +35,7 @@
     "mocha": "^8.0.1",
     "ts-node": "^8.10.2",
     "typedoc": "^0.17.7",
+    "typedoc-plugin-markdown": "^2.3.1",
     "typescript": "^3.9.5"
   },
   "author": "HP4k1h5",

src/lib/structs.ts

Lines changed: 5 additions & 27 deletions
@@ -12,6 +12,11 @@ export interface query {
    * */
   terms: term[] | string,
   /**
+   * the name of the document key to search, must be the same across all
+   * documents
+   * */
+  key?: string,
+  /**
    * a list of @filter interfaces
    * */
   filters?: filter[],
@@ -28,33 +33,6 @@ export interface collection {
   analyzer: string,
 }
 
-/**
- * @terms are the basic boolean search logic terms applied to ArangoSearch
- * Views via the Arango Query Language (AQL).
- * Example: {
- * ANDS: {anas:[], phrs: []},
- * }
- **/
-export interface terms {
-  /**
-   * Mandatory terms and phrases. All results MUST INCLUDE these terms and
-   * phrases.
-   **/
-  ANDS?: term[],
-  /**
-   * Optional terms and phrases. If there are ANDS or NOTS, these serve as
-   * match score "boosters". If there are no ANDS or NOTS, ORS become required
-   * in results.
-   **/
-  ORS?: term[],
-  /**
-   * Search results MUST NOT INCLUDE these terms and phrases. If a result that
-   * would otherwise have matched, contains one or more terms or phrases, it
-   * will not be included in the result set.
-   **/
-  NOTS?: term[],
-}
-
 export interface term {
   type: string,
   val: string,
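
Because `key` and `filters` are optional on the `query` interface while `view`, `collections`, and `terms` are required, a caller may want to check an untyped object before handing it to `buildAQL`. The following is a minimal TypeScript sketch of such a check, under assumptions: `validateQueryShape` and `isCollectionLike` are hypothetical helpers that do not exist in the library, and the `{name, analyzer}` collection shape is taken from the README example.

```typescript
// Collection shape assumed from the README example ({ name, analyzer }).
interface CollectionLike {
  name: string
  analyzer: string
}

function isCollectionLike(c: unknown): c is CollectionLike {
  if (typeof c !== 'object' || c === null) return false
  const r = c as Record<string, unknown>
  return typeof r['name'] === 'string' && typeof r['analyzer'] === 'string'
}

// Hypothetical helper: returns a list of problems, empty when the shape looks valid.
function validateQueryShape(q: Record<string, unknown>): string[] {
  const problems: string[] = []

  const view = q['view']
  if (typeof view !== 'string' || view.length === 0) {
    problems.push('view must be a non-empty string')
  }

  const collections = q['collections']
  if (!Array.isArray(collections) || collections.length === 0) {
    problems.push('collections must be a non-empty array')
  } else if (!collections.every(isCollectionLike)) {
    problems.push('every collection needs a string name and analyzer')
  }

  const terms = q['terms']
  if (typeof terms !== 'string' && !Array.isArray(terms)) {
    problems.push('terms must be a string or an array of term objects')
  }

  // optional properties: only checked when present
  const key = q['key']
  if (key !== undefined && typeof key !== 'string') {
    problems.push('key must be a string when provided')
  }
  const filters = q['filters']
  if (filters !== undefined && !Array.isArray(filters)) {
    problems.push('filters must be an array when provided')
  }

  return problems
}

console.log(validateQueryShape({
  view: 'the_arango-search_view-name',
  collections: [{ name: 'collection_name', analyzer: 'analyzer_name' }],
  terms: '+mandatory ?optional',
}))
// logs an empty array
```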

src/search.ts

Lines changed: 15 additions & 12 deletions
@@ -10,9 +10,9 @@ export function buildSearch(query: query): any {
     : query.terms
 
   /* build boolean pieces */
-  let ANDS = buildOPS(query.collections, query.terms, '+')
-  let ORS = buildOPS(query.collections, query.terms, '?')
-  let NOTS = buildOPS(query.collections, query.terms, '-')
+  let ANDS = buildOPS(query.collections, query.terms, '+', query.key)
+  let ORS = buildOPS(query.collections, query.terms, '?', query.key)
+  let NOTS = buildOPS(query.collections, query.terms, '-', query.key)
 
   /* handle combinations */
   if (ANDS && ORS) {
@@ -37,15 +37,16 @@ export function buildSearch(query: query): any {
     SORT TFIDF(doc) DESC`
 }
 
-function buildOPS(collections: collection[], terms: term[], op: string): any {
+function buildOPS(collections: collection[], terms: term[], op: string, key:
+  string = 'text'): any {
   const opWord: string = op == '+' ? ' AND ' : ' OR '
 
   let queryTerms: any = terms.filter((t: term) => t.op == op)
   if (!queryTerms.length) return
 
   /* phrases */
   let phrases = queryTerms.filter((qT: term) => qT.type == 'phr')
-    .map((phrase: any) => buildPhrase(phrase, collections))
+    .map((phrase: any) => buildPhrase(phrase, collections, key))
   if (!phrases.length) {
     phrases = undefined
   } else {
@@ -54,21 +55,21 @@ function buildOPS(collections: collection[], terms: term[], op: string): any {
 
   /* tokens */
   let tokens = queryTerms.filter((qT: { type: string }) => qT.type === 'tok')
-  tokens = tokens && buildTokens(tokens, collections)
+  tokens = tokens && buildTokens(tokens, collections, key)
 
   if (!phrases && !tokens) return
   if (op == '-') return { phrases, tokens }
   if (phrases && tokens) return aql.join([ phrases, tokens ], opWord)
   return (tokens || phrases)
 }
 
-function buildPhrase(phrase: term, collections: collection[]): any {
+function buildPhrase(phrase: term, collections: collection[], key: string): any {
   return collections.map(coll => {
-    return aql`PHRASE(doc.text, ${phrase.val.slice(1, -1)}, ${coll.analyzer})`
+    return aql`PHRASE(doc.${key}, ${phrase.val.slice(1, -1)}, ${coll.analyzer})`
   })
 }
 
-function buildTokens(tokens: term[], collections: collection[]): any {
+function buildTokens(tokens: term[], collections: collection[], key: string): any {
   if (!tokens.length) return
 
   const opWordMap = {
@@ -83,17 +84,19 @@ function buildTokens(tokens: term[], collections: collection[]): any {
     return a
   }, {})
 
-  const makeTokenAnalyzers = (tokens: term[], op: string, analyzer: string, field: string) => {
+  const makeTokenAnalyzers = (tokens: term[], op: string, analyzer: string,
+    key: string) => {
     return aql`
       ANALYZER(
         TOKENS(${tokens}, ${analyzer})
-        ${aql.literal(op)} IN doc.${field}, ${analyzer})`
+        ${aql.literal(op)} IN doc.${key}, ${analyzer})`
   }
 
   let remapped = []
   collections.forEach(coll => {
     remapped.push(
-      ...Object.keys(mapped).map(op => makeTokenAnalyzers(mapped[ op ], op, coll.analyzer, 'text'))
+      ...Object.keys(mapped).map(op => makeTokenAnalyzers(mapped[ op ], op,
+        coll.analyzer, key))
     )
   })
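
The `key` parameter threaded through `buildOPS`, `buildPhrase`, and `buildTokens` is meant to end up as the attribute that follows `doc.` in the generated search expression, falling back to `'text'` when the caller omits it. The following is a minimal TypeScript sketch of that idea using plain strings so it runs without a database driver. `fieldPath` and `sketchPhrase` are hypothetical names, and the real code builds its fragments with the `aql` template tag and bind parameters rather than string concatenation, so this output is for illustration only and is not safe for untrusted input.

```typescript
// Hypothetical helper: the attribute path searched, defaulting to 'text'
// in the same way as the default parameter on buildOPS.
function fieldPath(key: string = 'text'): string {
  return `doc.${key}`
}

// Illustrative stand-in for buildPhrase, for a single analyzer.
// The stored phrase value keeps its quotes, so slice(1, -1) strips them.
function sketchPhrase(val: string, analyzer: string, key?: string): string {
  const phrase = JSON.stringify(val.slice(1, -1))
  return `PHRASE(${fieldPath(key)}, ${phrase}, ${JSON.stringify(analyzer)})`
}

console.log(sketchPhrase('"buckle my shoe"', 'text_en'))
// PHRASE(doc.text, "buckle my shoe", "text_en")
console.log(sketchPhrase('"buckle my shoe"', 'text_en', 'body'))
// PHRASE(doc.body, "buckle my shoe", "text_en")
```

Passing `undefined` for `key` triggers the default, which is why `buildSearch` can forward `query.key` unconditionally.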
