Skip to content

Conversation

@sidferreira
Copy link
Contributor

@sidferreira sidferreira commented Apr 5, 2021

Takes over #623

[SQLiteAdapter] Added support for `Full Text Search` for SQLite adapter:
Add `isSearchable` boolean flag to schema column descriptor for creating `Full Text Search`-able columns
Add Q.textMatches(value) that compiles to match 'value' SQL for performing `Full Text Search` using SQLite adpater

Query example:

Q.where('searchable', Q.textMatches('hello world'))

Tasks:

  • Add to docs
  • Add to changelog
  • migration error
  • encodeMatcher
  • _fts_NAME
  • CI

@sidferreira
Copy link
Contributor Author

@radex Took some time, rebased the code, fixed some test issues, it is technically working, but I'm not sure if this is the best implementation, my suggestions are:

SELECT * FROM mail WHERE rowid = 15;                -- Fast. Rowid lookup.
SELECT * FROM mail WHERE body MATCH 'sqlite';       -- Fast. Full-text query.
SELECT * FROM mail WHERE mail MATCH 'search';       -- Fast. Full-text query.
SELECT * FROM mail WHERE rowid BETWEEN 15 AND 20;   -- Fast. Rowid lookup.
SELECT * FROM mail WHERE subject = 'database';      -- Slow. Linear scan.
SELECT * FROM mail WHERE subject MATCH 'database';  -- Fast. Full-text query.

@henrymoulton
Copy link
Contributor

Hey @sidferreira I'm thinking it might be useful to explain what type of apps might benefit from FTS in the docs (your code alludes to an email client) and also any implications of FTS (bigger db etc).

I'm also curious, are lower-end Android devices able to perform well doing FTS on a large inbox of email/notes/text? Or should an in-depth FTS query be pushed to the cloud?

@sidferreira
Copy link
Contributor Author

Hi @henrymoulton!
So, the FTS creates a virtual table that allows really fast text search, allowing the text to be found in any column (no need to specify).
The examples mentioned are from SQLite website, not real cases for me.

@sidferreira
Copy link
Contributor Author

No exact ideas on device performance, I believe we need to build to understand.
Of course that whatever we send to the cloud, makes apps lighter, but won't work offline...

@henrymoulton
Copy link
Contributor

Cool I'm keen to look into the performance aspect!

@sidferreira
Copy link
Contributor Author

So, after @henrymoulton comments, I decided to take a test on FTS:

I did a ridiculous table and queries on my computer.
The table has 2 columns:

  • Title, with 1654 chars,
  • Subtitle, with 1628 chars

Added 1000 records and did these queries:

SELECT * FROM test1 WHERE title LIKE "%500%" OR subtitle LIKE "%500%"; -- 40ms, this is just a common table createdfor comparisson
SELECT * FROM test2; -- 5000ms
SELECT * FROM test2 WHERE title LIKE "%500%"; -- 49ms, returned the record with `1500`
SELECT * FROM test2 WHERE title LIKE "%500%"; -- 49ms, returned the record with `1500`
SELECT * FROM test2 WHERE test2 MATCH "1001"; -- 28ms, returned the record with `1500`
SELECT * FROM test2 WHERE test2 MATCH "%001"; -- returned NOTHING

In short, this approach has limits on full word search, and I wonder how big the amount of data must be to have a bigger advantage.

I tested both FTS3 and FTS5,

cc @radex

@henrymoulton
Copy link
Contributor

Awesome @sidferreira I'm keen to try and replicate on a 2014 Android

@rcnnnghm
Copy link

rcnnnghm commented Apr 9, 2021

@sidferreira I'm no expert :-) but I think using % depends on the tokenizer used, see https://www.sqlite.org/fts5.html#full_text_query_syntax

@sidferreira
Copy link
Contributor Author

@rcnnnghm thanks for bringing this up!

I'm no expert either, just taking over someone's work.

@radex radex self-assigned this Apr 12, 2021
@radex
Copy link
Collaborator

radex commented Apr 12, 2021

@sidferreira Looks like there are some CI failures. You should be able to replicate locally by running yarn ci:check, yarn test:ios, yarn test:android

@radex radex assigned sidferreira and unassigned radex Apr 12, 2021
@sidferreira
Copy link
Contributor Author

@radex when you have time, check my questions at #984 (comment)

@sidferreira sidferreira requested a review from radex April 22, 2021 18:12
@radex
Copy link
Collaborator

radex commented May 7, 2021

isFTS instead of isSearchable

Makes sense 👍 In my own research, FTS is a pretty nasty problem and we can't expect different implementations to work the same way. For example, I don't expect that Loki would actually get a FTS implementation that would work the same way as SQLite's. Instead, it would likely get some different extended search functionality with different names and semantics.

ftsMatch instead of textMatches

👍

create Q.unsafeIn to allow usage of subqueries (assuming is doable) and avoid

can you elaborate? I'm not sure what you're proposing

change the current command so one can use examples from https://sqlite.org/fts3.html:

again, not sure what you're proposing here. AFAICT, only MATCH interests us. Other options you listed can be done using standard Q syntax

@sidferreira
Copy link
Contributor Author

@radex the idea was to add subquery capabilities on Watermelon, but that would add too many moving parts to this pr. So ignore the Q.unsafeIn part.

Copy link
Collaborator

@radex radex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code you submitted essentially looks good, aside from a few minor nitpicks.

But there's some code you didn't submit that only now I realized is missing — I'm sorry I didn't give you this feedback before...

https://github.com/Nozbe/WatermelonDB/blob/master/src/observation/encodeMatcher/canEncode.js#L7 - it's not really possible to do in-JS execution of a FTS query with matching semantics, so we need to return false here. I think this will require you to set a property on QueryDescription that says something like .hasFTS = true (otherwise, we'd need to walk the whole object tree, which is a no no)

https://github.com/Nozbe/WatermelonDB/blob/master/src/observation/encodeMatcher/test.js#L24 - here's a test for that

https://github.com/Nozbe/WatermelonDB/blob/master/src/adapters/lokijs/worker/encodeQuery/test.js#L374 - here, add a test to verify that LokiJS query encoder throws an error

https://github.com/Nozbe/WatermelonDB/blob/master/src/__tests__/databaseTests.js#L776 - this is the most important missing piece. We're checking that encoders spit out the right string but we're not actually testing anywhere that FTS works correctly! Here, look at other tests above for inspiration. Add a query for tasks records, and raw records that should match or not match the query. Be sure to add skipLoki: true, skipMatcher: true. You'll also need to modify the schema used by this test here: https://github.com/Nozbe/WatermelonDB/blob/master/src/adapters/__tests__/helpers.js#L60

https://github.com/Nozbe/WatermelonDB/blob/master/docs-master/Query.md#conditions-with-other-operators We need some quick documentation. A few words will do.

const addIndex = encodeIndex(column, table)

if (column.isFTS) {
logger.warn('[DB][Worker] Support for migrations and isFTS is still to be implemented')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oof, that's pretty bad

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sidferreira sidferreira requested a review from radex May 14, 2021 19:53
Copy link
Collaborator

@radex radex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your changes look good but I'm still missing the database test :/ #984 (review)

Co-authored-by: Radek Pietruszewski <radexpl@gmail.com>
@sidferreira
Copy link
Contributor Author

Your changes look good but I'm still missing the database test :/ #984 (review)

my bad!

@radex
Copy link
Collaborator

radex commented May 21, 2021

@sidferreira Hey, do you think you'll have time in the next week to finish this? I'm working hard on many different changes around queries and adapters this and next week, so it'd be very convenient for me to integrate this asap to avoid later conflicts

@sidferreira
Copy link
Contributor Author

@radex I'm trying to set some time aside to finish this up, but I'm with this issue in my work and it is draining my time (actually related to 🍉 and 🤖 )
Also, I start wondering if we should have someone with real usage to help to keep this up.

@sidferreira sidferreira requested a review from radex May 26, 2021 17:24
@mlecoq mlecoq mentioned this pull request Jun 22, 2021
@stale
Copy link

stale bot commented Apr 16, 2022

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the wontfix This will not be worked on label Apr 16, 2022
@bumpingChris
Copy link

@sidferreira @radex Hi all! Thanks for the PR but is there a reason it's still not merged?
Regards,
Chris

@stale stale bot removed the wontfix This will not be worked on label Dec 5, 2022
@radex
Copy link
Collaborator

radex commented Jan 25, 2023

@sidferreira Hey, any reason for the closed PRs? I think they were definitely useful, just need more work

@sidferreira
Copy link
Contributor Author

@radex I'm moved to another company and won't be able to resume any work anytime soon. I was cleaning up my GitHub. I'll keep open in case someone can take over.

@sidferreira sidferreira reopened this Jan 25, 2023
@radex
Copy link
Collaborator

radex commented Jan 25, 2023

@sidferreira I completely understand, but I'd appreciate keeping this and #1271 open, I definitely hope to have this merged eventually :)

@radex
Copy link
Collaborator

radex commented Feb 4, 2023

superseded by #1163

@radex radex closed this Feb 4, 2023
@mokabiwd
Copy link

mokabiwd commented May 3, 2024

Hello @radex,

Have you any plan to merge this PR ?
We're planning to use watermelondb for offline searching of a catalog of around 10,000 products in a react native project, and FTS would be very useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants