Disclaimer
This is a proof of concept.
DynamoDB is not a good fit for text search: adding data to the index is extremely expensive, and fuzzy searching is not possible.
For production full-text search, I recommend Elasticsearch or PostgreSQL full-text search.
Full-text search without spinning up a pesky Elasticsearch cluster!
Search text: "say hello"
Search results:
- "Oh dear, 'Say hello John' she instructed him"
- "Deep dish lyrics: say hello, say hello"
- "We clap and say hello,. With our friends at storytime"
Create a DTS instance
export const dts = new DynamoTextSearch({
  tableName: "DTS_example",
  region: "ap-southeast-2",
  // can be used if this table is shared with other entities
  keyPrefix: "",
});
Define an index
export const myIndex: DTSIndex = {
  name: "myIndex",
  // optional, the larger this value the more partitions entities will be distributed across
  // raise to increase read and write throughput, at the cost of more RCUs during searches
  // default: 1
  // numShards: 1,
  // optional, specify custom characters to treat as a word boundary
  // delimiters: " .,;",
  // optional, specify characters to ignore from search input
  // ignoreChars: " .,:;!@#$%^&*()-+=_'",
  // optional, specify maximum number of searchable characters stored in each segment
  // maxSearchableLength: 50
};
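To make the `delimiters` and `ignoreChars` options more concrete, here is a minimal sketch of how text could be split into searchable words. It is not the library's implementation: `tokenize`, `TokenizeOptions` and `exampleOptions` are hypothetical names, the option values are the ones shown in the comments above, and lowercasing the input is an assumption based on the case-insensitive results in the example at the top.

```typescript
// Hypothetical helper, not part of the library: one way the `delimiters`
// and `ignoreChars` options could be applied to a piece of text.
interface TokenizeOptions {
  delimiters: string; // characters treated as word boundaries
  ignoreChars: string; // characters dropped from each word
}

const exampleOptions: TokenizeOptions = {
  delimiters: " .,;",
  ignoreChars: " .,:;!@#$%^&*()-+=_'",
};

function tokenize(text: string, options: TokenizeOptions = exampleOptions): string[] {
  const words: string[] = [];
  const lowered = text.toLowerCase(); // assumption: matching is case-insensitive
  let current = "";
  for (let i = 0; i < lowered.length; i++) {
    const char = lowered.charAt(i);
    if (options.delimiters.indexOf(char) !== -1) {
      // A delimiter ends the current word, if there is one.
      if (current.length > 0) words.push(current);
      current = "";
    } else if (options.ignoreChars.indexOf(char) === -1) {
      // Keep the character only when it is not in the ignore list.
      current += char;
    }
  }
  if (current.length > 0) words.push(current);
  return words;
}

console.log(tokenize("Oh dear, 'Say hello John' she instructed him").join(" "));
// prints "oh dear say hello john she instructed him"
```

Under these assumptions the quotes and the comma disappear, so the words "say" and "hello" end up adjacent, which is what lets a search for "say hello" match that sentence.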
Load some entries
const dataItems = await loadMydata();
await Promise.all(
  dataItems.map(dataItem =>
    dts.addEntry({
      index: myIndex,
      entryText: dataItem.text,
      entry: dataItem,
    })
  )
);
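`Promise.all` over the whole array starts every write at once, which may be more than a table with few provisioned WCUs can absorb. A minimal sketch of capping the number of writes in flight follows. `mapWithConcurrency` is a hypothetical helper, not part of the library, and the limit of 25 in the usage comment is an arbitrary illustration.

```typescript
// Hypothetical helper, not part of the library: runs `task` over `items`
// with at most `limit` calls in flight at any time, preserving result order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let nextIndex = 0;

  // Each worker claims the next unprocessed index until none are left.
  async function worker(): Promise<void> {
    while (nextIndex < items.length) {
      const index = nextIndex++;
      results[index] = await task(items[index]);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  const workers: Promise<void>[] = [];
  for (let i = 0; i < workerCount; i++) workers.push(worker());
  await Promise.all(workers);
  return results;
}

// Usage with the objects defined above (the limit is an assumption to tune):
// await mapWithConcurrency(dataItems, 25, dataItem =>
//   dts.addEntry({ index: myIndex, entryText: dataItem.text, entry: dataItem }),
// );
```

A lower limit trades ingest speed for fewer throttled writes. The right value depends on the table's write capacity.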
Perform a search
const searchResults = await dts.search({
  index: myIndex,
  searchText: "hello world",
  // optional, return data pre-sorted by a key within 'entity'
  sortKey: "dateCreated",
});
console.log(`Found ${searchResults.length} items`);
console.log(searchResults);
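If `sortKey` is omitted, the results can still be ordered after the search returns. The sketch below shows a client-side equivalent under two assumptions that the text above does not confirm: the order is ascending, and the key holds strings or numbers (for example ISO 8601 dates). `sortByKey` and `Sortable` are hypothetical names, not part of the library.

```typescript
// Hypothetical helper, not part of the library: returns a sorted copy of
// `items`, ordered ascending by the value stored under `key`.
type Sortable = string | number;

function sortByKey<T extends Record<K, Sortable>, K extends string>(
  items: readonly T[],
  key: K,
): T[] {
  // slice() copies the array so the caller's results are left untouched.
  return items.slice().sort((a, b) => {
    const left: Sortable = a[key];
    const right: Sortable = b[key];
    if (left < right) return -1;
    if (left > right) return 1;
    return 0;
  });
}

// Illustrative data only; real entries have whatever shape was passed to addEntry.
const unsortedExample = [
  { text: "hello world, again", dateCreated: "2021-03-01" },
  { text: "hello world", dateCreated: "2020-01-15" },
];
console.log(sortByKey(unsortedExample, "dateCreated").map(item => item.dateCreated));
// prints [ '2020-01-15', '2021-03-01' ]
```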
See below for examples
Create a table that matches the configuration in infra/dynamo-text-search/main.tf,
or use Terraform to create the table automatically:
cd infra
terraform init
terraform apply
cd ../
I've included two test scripts: one to load the index with some data and another to perform a search. The Bible (I'm not sure which version) is included as test data.
- Install dependencies
yarn install
- Load data
yarn example:load-bible
- Load as much data as you would like, then cancel the ingest with Ctrl-C
- If you let this run, the whole Bible will be ingested. With a small number of WCUs provisioned, this may take a few minutes.
- Ingesting the entire example dataset into an on-demand table costs about $1.50 USD
- Search data
yarn example:search-bible "Abraham and Isaac"
Loading data can be slow. With a provisioned table at 10,000 WCUs, I was able to ingest the Bible in about 30 seconds. Loading data is also quite expensive: loading this example data set costs about $1.50 USD.
Querying data is extremely fast. Once the data has been loaded, RCUs can be increased to allow up to numShards * 6,000 queries per second.
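The throughput ceiling can be turned into a quick planning calculation. The sketch below uses only the 6,000 queries per second per shard figure quoted above. The function names and the target rate in the example are hypothetical.

```typescript
// Figure taken from the text above: up to 6,000 queries per second per shard.
const QUERIES_PER_SECOND_PER_SHARD = 6000;

// Upper bound on search throughput for a given shard count.
function maxQueriesPerSecond(numShards: number): number {
  return numShards * QUERIES_PER_SECOND_PER_SHARD;
}

// Smallest shard count whose upper bound covers a target query rate.
function shardsForTarget(targetQueriesPerSecond: number): number {
  return Math.max(1, Math.ceil(targetQueriesPerSecond / QUERIES_PER_SECOND_PER_SHARD));
}

console.log(maxQueriesPerSecond(4)); // prints 24000
console.log(shardsForTarget(20000)); // prints 4
```

Keep in mind the trade-off noted in the index options: a larger `numShards` raises the ceiling but also costs more RCUs per search.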
There is no limit to the amount of data that can be stored in a single index; performance will remain constant at any index size.
Once data has been loaded, the number of shards cannot be changed; the data must be re-imported with a new index configuration.
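One way to see why a shard count can be fixed at load time is to assume entries are placed by hashing an identifier modulo `numShards`. This README does not say how the library actually assigns shards, so treat the sketch below purely as an illustration of that assumed scheme. `simpleHash`, `shardFor` and the `entry-N` identifiers are hypothetical.

```typescript
// Hypothetical illustration, not the library's code: a small string hash
// kept within unsigned 32 bits.
function simpleHash(key: string): number {
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    hash = (hash * 31 + key.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Assumed placement rule: hash of the entry id modulo the shard count.
function shardFor(entryId: string, numShards: number): number {
  return simpleHash(entryId) % numShards;
}

// Count how many of 1,000 made-up ids land on a different shard
// when the shard count changes from 4 to 5.
let movedCount = 0;
for (let i = 0; i < 1000; i++) {
  const id = "entry-" + i;
  if (shardFor(id, 4) !== shardFor(id, 5)) movedCount++;
}
console.log(movedCount + " of 1000 ids would live on a different shard");
```

Under a scheme like this, data written with the old shard count sits in partitions that the new configuration would no longer look in, which is consistent with the need to re-import.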