[CV2-6075] documentation for Check-Alegre-Presto media indexing and similarity workflows #489
@@ -0,0 +1,59 @@
# How Check Processes Items For Similarity

## Images

![Image Pipeline](img/image-pipeline.png?raw=true "Image Pipeline")
[Edit Link](https://docs.google.com/drawings/d/1jXgbM_06rlpPeip1vxUKpiRYyhumrkFlr-2EC3qBxHg/edit)

At a high level, Check-API receives new `ProjectMedia` items and, as they are created, performs the following steps:

1. Store the `ProjectMedia`.
2. Send the `ProjectMedia` through `Bot::Alegre.run`.
3. `Bot::Alegre.run` searches for items via image hashing.
4. Suggested and confirmed matches are queried at the same time, asynchronously.
5. Once both queries are completed, we process the item in a callback and store the results in `Bot::Alegre.relate_project_media_callback`.
6. We also store the item's OCR text annotations in the text index so that subsequent items can match against them (the OCR text itself is *not* matched against existing items).
7. OCR data is written to OpenSearch for text lookups, and image hashes are written to Postgres for image lookups on Alegre.
8. Relationships at the Check API level are persisted after step 5.
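
Steps 4 and 5 above can be illustrated with a minimal sketch. This is not the Check-API implementation: `run_similarity_queries` and the two lambdas are hypothetical stand-ins for the suggested and confirmed lookups, and plain threads are used here only to show the "wait for both, then call back" shape.

```ruby
# Hypothetical sketch: none of these names come from Check-API.
# Each lambda in `queries` stands in for one asynchronous similarity query.
def run_similarity_queries(item_id, queries, &callback)
  # Start every query on its own thread so they run concurrently.
  threads = queries.map do |kind, query|
    Thread.new { [kind, query.call(item_id)] }
  end
  # Thread#value joins the thread and returns the block's result.
  results = threads.map(&:value).to_h
  # The callback fires only after both queries have finished.
  callback.call(item_id, results)
end

queries = {
  suggested: ->(id) { [id + 100] }, # pretend lookup for suggested matches
  confirmed: ->(id) { [id + 200] }  # pretend lookup for confirmed matches
}

stored = {}
run_similarity_queries(1, queries) { |id, results| stored[id] = results }
# stored is now { 1 => { suggested: [101], confirmed: [201] } }
```

The point of the sketch is the ordering guarantee: nothing is stored until both result sets are available, which mirrors why relationships are persisted only after step 5.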

## Video

![Video Pipeline](img/video-pipeline.png?raw=true "Video Pipeline")
[Edit Link](https://docs.google.com/drawings/d/1HQTwHmkhzp-J742-QAowfYMNaYoALYTPwOTA-PASHnk/edit)

For video, Check-API receives new `ProjectMedia` items and, as they are created, performs the following steps:

1. Store the `ProjectMedia`.
2. Send the `ProjectMedia` through `Bot::Alegre.run`.
3. `Bot::Alegre.run` searches for items via video fingerprinting.
4. Suggested and confirmed matches are queried at the same time, asynchronously.
5. Once both queries are completed, we process the item in a callback and store the results in `Bot::Alegre.relate_project_media_callback`.
6. We also store the item's transcription text annotations in the text index so that subsequent items can match against them (the transcription itself is *not* matched against existing items).
7. Transcription data is written to OpenSearch for text lookups, and video hashes are written to Postgres for video lookups on Alegre, as well as to disk for `.tmk` file lookups.
8. Relationships at the Check API level are persisted after step 5.
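
The index-only behaviour in step 6 can be shown with a toy example. Everything here is an assumption made for illustration: `ToyTextIndex` is an in-memory stand-in for the text index, and word overlap replaces whatever matching the real index performs.

```ruby
require 'set'

# Hypothetical in-memory stand-in for the text index; not Alegre's API.
class ToyTextIndex
  def initialize
    @docs = {}
  end

  # Store an item's text so that later queries can find it.
  def add(item_id, text)
    @docs[item_id] = tokenize(text)
  end

  # Return ids of stored items sharing at least one word with the query.
  def search(text)
    words = tokenize(text)
    @docs.select { |_id, doc| doc.intersect?(words) }.keys
  end

  private

  def tokenize(text)
    text.downcase.scan(/\w+/).to_set
  end
end

# Index-only handling of a transcription: store it, but run no search for it.
def index_transcription(index, item_id, transcription)
  index.add(item_id, transcription)
  [] # no matches are produced from the transcription itself
end

index = ToyTextIndex.new
index_transcription(index, 1, 'Flood warning issued for the coast')
index.search('coast flood update') # => [1]
```

The first call produces no matches for item 1, yet a later text query finds it, which is the asymmetry step 6 describes.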

## Text

![Text Pipeline](img/text-pipeline.png?raw=true "Text Pipeline")
[Edit Link](https://docs.google.com/drawings/d/12WljT8-qsUi8xG584clD_eV1ABOcB6CqkMX0eAxSPrE/edit)

1. Store the `ProjectMedia`.
2. Send the `ProjectMedia` through `Bot::Alegre.run`.
3. `Bot::Alegre.run` searches for items via text similarity, using the vector models applied.
4. Suggested and confirmed matches are queried at the same time, asynchronously, *for both `original_title` and `original_description`* and *for every vector model applied*.
5. Once all queries are completed, *for both fields* and *for every vector model applied*, we process the item in a callback and store the results in `Bot::Alegre.relate_project_media_callback`. That method tracks the remaining messages and only processes a match once no messages are still in flight.
6. Relationships at the Check API level are persisted after step 5.
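
The "no messages still in flight" bookkeeping in step 5 might look roughly like the sketch below. It is not the code behind `Bot::Alegre.relate_project_media_callback`: `InFlightTracker`, its key layout, and the model name `model_a` are all hypothetical.

```ruby
require 'set'

# Hypothetical tracker; the names and key layout are illustrative only.
class InFlightTracker
  def initialize(fields, models, kinds = %i[suggested confirmed])
    # One pending message per (field, model, kind) combination.
    @pending = fields.product(models, kinds).to_set
    @results = {}
    @lock = Mutex.new
  end

  # Record one response. Returns all results once nothing is in flight,
  # and nil while other messages are still outstanding.
  def complete(field, model, kind, matches)
    @lock.synchronize do
      key = [field, model, kind]
      # Ignore unknown or duplicate responses.
      return nil unless @pending.delete?(key)

      @results[key] = matches
      @pending.empty? ? @results : nil
    end
  end
end

tracker = InFlightTracker.new(%w[original_title original_description], %w[model_a])
tracker.complete('original_title', 'model_a', :suggested, [11]) # => nil, three messages remain
```

With two fields, one model, and two query kinds there are four messages, so only the fourth `complete` call returns the collected results for processing.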

## Audio

![Audio Pipeline](img/audio-pipeline.png?raw=true "Audio Pipeline")
[Edit Link](https://docs.google.com/drawings/d/1YwWJMgPxAlonCdq4M5RWaSOzSSucwHkg7EWTggWOhw8/edit)

1. Store the `ProjectMedia`.
2. Send the `ProjectMedia` through `Bot::Alegre.run`.
3. `Bot::Alegre.run` searches for items via audio fingerprinting.
4. Suggested and confirmed matches are queried at the same time, asynchronously.
5. Once both queries are completed, we process the item in a callback and store the results in `Bot::Alegre.relate_project_media_callback`.
6. We also store the item's transcription text annotations in the text index so that subsequent items can match against them (the transcription itself is *not* matched against existing items).
7. Transcription data is written to OpenSearch for text lookups, and audio hashes are written to Postgres for audio lookups on Alegre.
8. Relationships at the Check API level are persisted after step 5.
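
As a compact recap, the differences between the four pipelines can be written down as data. The hash layout and key names below are hypothetical, and the `audio` and `text` lookup entries assume audio hashing and vector models respectively, inferred from the hash and model references in the steps rather than confirmed against the code.

```ruby
# Summary of the four pipelines as described in this document.
# Key names are hypothetical; nil means the document states nothing for that field.
PIPELINES = {
  image: { lookup: 'image hashing',        text_source: 'OCR',           hash_store: 'Postgres' },
  video: { lookup: 'video fingerprinting', text_source: 'transcription', hash_store: 'Postgres and .tmk files on disk' },
  audio: { lookup: 'audio hashing',        text_source: 'transcription', hash_store: 'Postgres' },
  text:  { lookup: 'vector models',        text_source: nil,             hash_store: nil }
}.freeze

# Media types whose derived text (OCR or transcription) is written to OpenSearch.
def text_indexed_types
  PIPELINES.select { |_type, cfg| cfg[:text_source] }.keys
end

text_indexed_types # => [:image, :video, :audio]
```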

This diagram is a helpful overview! I think the parts in 5.x are different now?

It's super out of date, and I have no idea where the source even is. Is this really worth reviving as a doc?