Commits
131 commits
77df891
Merge branch 'develop' into 'master'
mhamzawey Mar 12, 2019
46f7617
Merge branch 'master' of gitlab.com:meedan/alegre
infojunkie Mar 22, 2019
960d92d
Merge branch 'develop'
infojunkie Mar 22, 2019
802a226
Merge branch 'develop'
infojunkie Jun 20, 2019
7aa2402
merge develop into master locally
mhamzawey Jun 23, 2019
1f47310
Merging develop to master
huslage Dec 21, 2019
29b8fd0
Re-adding build for live to gitlab ci
huslage Dec 21, 2019
e406933
Merging with develop
huslage Dec 21, 2019
12a24ad
Don't build plugin for production
huslage Dec 21, 2019
045fea3
Merge branch 'develop'
infojunkie Mar 10, 2020
e41cb83
Merge develop
infojunkie Mar 31, 2020
6050d81
Merge branch 'develop'
infojunkie Apr 2, 2020
5200407
Security advisories
infojunkie Apr 3, 2020
407528b
Merge branch 'develop'
infojunkie Apr 4, 2020
4fe378e
Merge branch 'develop'
infojunkie Apr 5, 2020
672525b
Merge branch 'develop'
infojunkie Apr 7, 2020
1d039c0
Merge branch 'develop'
infojunkie May 1, 2020
0fce7a9
Retry on SELECT #8176
infojunkie May 4, 2020
ccc7e26
Merge branch 'develop'
infojunkie May 11, 2020
48f4781
Merge branch 'develop'
infojunkie May 20, 2020
988acf7
Merge branch 'develop'
infojunkie Jun 23, 2020
4bf1422
Merge branch 'develop'
caiosba Oct 6, 2020
b98b69d
Merge branch 'develop'
caiosba Nov 3, 2020
4554681
Merge branch 'develop'
caiosba Nov 17, 2020
d30b9bd
Merge branch 'develop'
caiosba Jan 29, 2021
c9b2787
Merge deploy hotfix (#51)
sonoransun Jan 29, 2021
ed5b2a3
Revert "Merge deploy hotfix (#51)"
Jan 29, 2021
c0da1d3
Fix typo in ecs deploy command referencing QA container for Live depl…
sonoransun Jan 29, 2021
f1fcaef
Merge branch 'develop'
caiosba Feb 22, 2021
6fee1aa
Revert "Also deploy the SBERT task in Live environment. (#57)"
caiosba Feb 22, 2021
19575ab
Merge branch 'develop'
caiosba Mar 17, 2021
53aef52
Merge branch 'develop'
caiosba Mar 31, 2021
e1c54dc
Merge branch 'develop'
caiosba May 17, 2021
9a3bd3a
Merge branch 'develop'
caiosba May 26, 2021
0126df0
Merge branch 'develop'
caiosba May 28, 2021
6de35ac
Merge branch 'develop'
caiosba May 28, 2021
4929206
Merge branch 'develop'
caiosba Jun 1, 2021
d9b0695
Merge branch 'develop'
caiosba Jun 2, 2021
e75fa67
Merge branch 'develop'
caiosba Jun 2, 2021
fb97b5f
Merge branch 'develop'
caiosba Jun 3, 2021
d8defb6
Merge branch 'develop'
caiosba Jun 15, 2021
bf9ec70
CHECK-688: Only return structure if ocr content detected
amoedoamorim Jun 16, 2021
310c5bc
CHECK-688: Add test case for images without text
amoedoamorim Jun 16, 2021
935badd
Merge branch 'develop'
caiosba Jun 30, 2021
eb79eed
Merge branch 'develop'
caiosba Jul 21, 2021
414517e
Merge branch 'develop'
caiosba Aug 4, 2021
74023c7
Merge branch 'develop'
caiosba Aug 24, 2021
33c650f
Merge branch 'develop'
caiosba Aug 25, 2021
8011f4e
Merge branch 'develop'
caiosba Sep 2, 2021
12225ac
Merge branch 'develop'
caiosba Sep 23, 2021
7186de7
Merge branch 'develop'
caiosba Sep 27, 2021
fe9e6d7
Merge branch 'develop'
caiosba Sep 28, 2021
8b6700c
Merge branch 'develop'
caiosba Oct 5, 2021
5b752d8
Merge branch 'develop'
caiosba Nov 3, 2021
a663ea7
Merge branch 'develop'
caiosba Nov 4, 2021
4844f73
Merge branch 'develop'
caiosba Nov 4, 2021
5f7962b
Merge branch 'develop'
caiosba Nov 5, 2021
9cb087b
Merge branch 'develop'
caiosba Jan 26, 2022
81d5742
Merge branch 'develop'
caiosba Feb 1, 2022
bfbca0b
Merge branch 'develop'
caiosba Feb 3, 2022
57c99cb
Merge branch 'develop'
Feb 7, 2022
9d2ad1d
CHECK-1618 fix context issues for videos (#217)
DGaffney Mar 18, 2022
5fcf872
CHECK-1618 attempt to resolve context matching failures (#221)
DGaffney Mar 24, 2022
2e859f8
Merge branch 'develop'
caiosba Apr 20, 2022
f79f23c
Merge branch 'develop'
caiosba Apr 21, 2022
7ac1bcb
Merge branch 'develop'
caiosba May 10, 2022
d308f8e
Merge branch 'develop'
caiosba Jun 1, 2022
afefb75
Merge branch 'develop'
caiosba Jun 16, 2022
0adfd67
CHECK-2045 convert to setexes (#240)
DGaffney Jun 27, 2022
37f9db3
Merge branch 'develop'
caiosba Jul 13, 2022
8a97230
Merge branch 'develop' of github.com:meedan/alegre
DGaffney Jul 19, 2022
0f31929
Merge branch 'develop'
caiosba Aug 16, 2022
e032817
Use Git Repo Sync action to update GitLab repository for CI/CD. (#252)
sonoransun Aug 17, 2022
f93a790
Merge branch 'develop'
caiosba Sep 9, 2022
1e93d93
Merge branch 'develop' of github.com:meedan/alegre
DGaffney Sep 12, 2022
786456b
Merge branch 'develop'
caiosba Oct 5, 2022
5e8b201
Merge branch 'develop' of github.com:meedan/alegre
DGaffney Oct 6, 2022
4e041d1
Merge branch 'develop'
caiosba Oct 28, 2022
1910c30
Merge branch 'develop'
caiosba Nov 8, 2022
11d7efe
Merge branch 'develop'
caiosba Nov 24, 2022
adbff42
Merge branch 'develop' of github.com:meedan/alegre
DGaffney Jan 19, 2023
7c2f6d7
Merge branch 'develop' of github.com:meedan/alegre
DGaffney Feb 13, 2023
2210d2f
CV2-2805 (#284)
computermacgyver Feb 23, 2023
872fe24
Merge branch 'develop'
caiosba Mar 23, 2023
c480d35
Merge branch 'develop' of github.com:meedan/alegre
DGaffney Apr 28, 2023
c0f97d8
Hotfix
DGaffney May 4, 2023
1bad5de
Fix vector query
DGaffney May 4, 2023
7f424f0
Merge branch 'develop'
caiosba May 10, 2023
c323f24
CV2-3155 add retry to updates (#309)
DGaffney May 30, 2023
6f78941
CV2-3164: Adding `per_model_threshold` to `body` returned by `get_bod…
ahmednasserswe Jun 2, 2023
aa0b629
CV2-3233 minor tweak to fix intermittent ssl error (#311)
DGaffney Jun 12, 2023
0f566f2
[CV2-3249] track error (#313)
DGaffney Jun 15, 2023
381f5d2
update chromaprint ffmpeg reader (#315)
DGaffney Jun 16, 2023
18d7fc7
CV2-3263 update to bullseye
DGaffney Jun 20, 2023
6d0563d
Merge branch 'develop' of github.com:meedan/alegre
DGaffney Jun 21, 2023
1440acf
Merge branch 'develop' of github.com:meedan/alegre
DGaffney Jun 28, 2023
780565c
Merge branch 'develop' of github.com:meedan/alegre
DGaffney Jul 6, 2023
84dd130
Merge branch 'develop' of github.com:meedan/alegre
DGaffney Jul 10, 2023
3776550
Merge branch 'develop' of github.com:meedan/alegre
DGaffney Jul 12, 2023
0fe4b57
Merge branch 'develop' of github.com:meedan/alegre
DGaffney Jul 12, 2023
f7d8ffb
Merge branch 'develop'
caiosba Jul 19, 2023
a4f09e9
Merge branch 'develop' of github.com:meedan/alegre
DGaffney Aug 14, 2023
2d9b5e5
Merge branch 'develop'
caiosba Oct 11, 2023
3decb03
Merge branch 'develop'
caiosba Oct 11, 2023
8b1bff5
Merge branch 'develop'
caiosba Nov 29, 2023
e7a4bbd
Fixing conflict
caiosba Jan 17, 2024
1fb915c
Deprecate unused model services in ECS deploy. (#371)
sonoransun Jan 17, 2024
b82d634
add better merger for contexts (#372)
DGaffney Jan 22, 2024
c5677a2
CV2-4102 fix a couple more tests that could randomly fail (#373)
DGaffney Jan 23, 2024
b1751ae
CV2-4278 flip searching as json bool for get_context_query in audio (…
DGaffney Feb 9, 2024
52ad7b2
CV2-4437 stub out article work to prevent illegal instruction crashes…
DGaffney Apr 1, 2024
3e5d251
Increase gunicorn server settings for more performance. (#379)
sonoransun Mar 14, 2024
06ebc23
Fixing conflicts
caiosba May 9, 2024
6b6fe5e
CV2-4605 push up an initial idea to see how it plays against test sui…
DGaffney May 14, 2024
6c6dc6b
Merge remote-tracking branch 'origin'
DGaffney May 28, 2024
69a9bcc
Merge remote-tracking branch 'origin'
DGaffney Jun 4, 2024
e26a6a1
Merge branch 'develop'
DGaffney Jun 5, 2024
8e338d7
CV2-4790 fix broken error log due to missing file, surface s3 errors …
DGaffney Jul 8, 2024
d471cf0
repin master off develop
DGaffney Jul 11, 2024
e059366
Merge branch 'develop' of github.com:meedan/alegre
DGaffney Aug 12, 2024
bf8780e
Update postgres Docker image to current bookworm slim. (#451)
sonoransun Sep 26, 2024
806d3f3
CV2-5362 Set expiration explicitly on all SharedModel keys (#450)
DGaffney Sep 27, 2024
11711c2
Pin setuptools version (#452)
sonoransun Sep 27, 2024
8126264
Merge branch 'develop'
caiosba Oct 2, 2024
f537582
Merge remote-tracking branch 'origin'
DGaffney Oct 3, 2024
f871a21
Merge remote-tracking branch 'origin'
DGaffney Oct 3, 2024
b8a8b46
Merge remote-tracking branch 'origin'
DGaffney Dec 6, 2024
7cbc755
Merge remote-tracking branch 'origin'
DGaffney Dec 17, 2024
d505ae7
merge from develop v0.187.0 (#483)
skyemeedan Jan 14, 2025
b93281d
Merge remote-tracking branch 'origin' develop for deployment
Feb 4, 2025
3c11545
CV2-6075 initial stab at documentation
DGaffney Feb 12, 2025
19 changes: 19 additions & 0 deletions .github/workflows/gitlab-mirror.yml
@@ -0,0 +1,19 @@
name: GitLab Mirror

on:
  - push
  - delete

jobs:
  sync:
    runs-on: ubuntu-latest
    name: Git Repo Sync
    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0
      - uses: wangchucheng/git-repo-sync@v0.1.0
        with:
          target-url: https://gitlab.com/meedan/alegre.git/
          target-username: sonoransun
          target-token: ${{ secrets.GITLAB_ACCESS_TOKEN }}
Binary file added doc/img/alegre-audio-flow.png
Binary file added doc/img/alegre-image-flow.png
Binary file added doc/img/alegre-text-flow.png
Binary file added doc/img/alegre-video-flow.png
59 changes: 59 additions & 0 deletions doc/similarity-high-level.md
@@ -0,0 +1,59 @@
# How Check Processes Items For Similarity

## Images

![Typical Flow, Check Image Matching](doc/img/alegre-image-flow.png?raw=true "Typical Flow, Check Image Matching")
[Edit Link](https://docs.google.com/drawings/d/1jXgbM_06rlpPeip1vxUKpiRYyhumrkFlr-2EC3qBxHg/edit)

At a high level, Check-API receives new `ProjectMedia` items and, as each one is created, performs the following steps:

1. Store the `ProjectMedia`,
2. Send the `ProjectMedia` through `Bot::Alegre.run`,
3. `Bot::Alegre.run` searches for similar items via image hashing,
4. Suggested and confirmed matches are queried concurrently and asynchronously,
5. Once both queries have completed, we process the item and store the results in the `Bot::Alegre.relate_project_media_callback` callback,
6. We also store the item's OCR text annotations in the text index, where they can match subsequent items (the OCR text itself is *not* matched against existing items),
7. OCR data is written to OpenSearch for text lookups, and image hashes are written to Postgres for image lookups on Alegre,
8. Relationships at the Check-API level are persisted after step 5.
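
Steps 4 and 5 describe two queries that run concurrently and a callback that fires only once both have returned. The following is a minimal Python sketch of that pattern, not the actual Check-API code (which is Ruby). The names `query_matches`, `relate_callback`, and `process_item`, and the shape of the returned dictionaries, are assumptions made for illustration.

```python
import asyncio


async def query_matches(item_id: int, kind: str) -> dict:
    # Hypothetical stand-in for a similarity query against Alegre;
    # the real request and response formats are not described here.
    await asyncio.sleep(0)  # placeholder for network I/O
    return {"item_id": item_id, "kind": kind, "matches": []}


def relate_callback(suggested: dict, confirmed: dict) -> dict:
    # Runs only after both result sets are available (step 5).
    return {
        "item_id": suggested["item_id"],
        "suggested": suggested["matches"],
        "confirmed": confirmed["matches"],
    }


async def process_item(item_id: int) -> dict:
    # Step 4: issue both queries concurrently instead of sequentially.
    suggested, confirmed = await asyncio.gather(
        query_matches(item_id, "suggested"),
        query_matches(item_id, "confirmed"),
    )
    return relate_callback(suggested, confirmed)


result = asyncio.run(process_item(42))
```

`asyncio.gather` returns results in the order the awaitables were passed, so the callback can rely on which result is which regardless of which query finishes first.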

## Video

![Typical Flow, Check Video Matching](doc/img/alegre-video-flow.png?raw=true "Typical Flow, Check Video Matching")
[Edit Link](https://docs.google.com/drawings/d/1HQTwHmkhzp-J742-QAowfYMNaYoALYTPwOTA-PASHnk/edit)

For video, Check-API receives new `ProjectMedia` items and, as they are created, performs the following procedures:

1. Store the `ProjectMedia`,
2. Send the `ProjectMedia` through `Bot::Alegre.run`,
3. `Bot::Alegre.run` searches for similar items via video fingerprinting,
4. Suggested and confirmed matches are queried concurrently and asynchronously,
5. Once both queries have completed, we process the item and store the results in the `Bot::Alegre.relate_project_media_callback` callback,
6. We also store the item's transcription text annotations in the text index, where they can match subsequent items (the transcription itself is *not* matched against existing items),
7. Transcription data is written to OpenSearch for text lookups, and video hashes are written to Postgres for video lookups on Alegre, as well as to disk as `.tmk` files,
8. Relationships at the Check-API level are persisted after step 5.
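
Step 6 has a subtle consequence: extracted text is stored so that later items can match it, but it is never used as a query against items that already exist. A toy illustration of that "store without searching" behaviour follows. `TextIndex` is a hypothetical in-memory stand-in for the real text index, and its word-overlap matching is an assumption chosen only to keep the example small.

```python
class TextIndex:
    """Toy in-memory stand-in for the text index (not the real OpenSearch setup)."""

    def __init__(self) -> None:
        self._docs: dict[int, set[str]] = {}

    def search(self, text: str) -> list[int]:
        # Return ids of stored documents sharing at least one word with `text`.
        words = set(text.lower().split())
        return [doc_id for doc_id, doc in self._docs.items() if words & doc]

    def store(self, doc_id: int, text: str) -> None:
        self._docs[doc_id] = set(text.lower().split())


index = TextIndex()

# Item 1 is a video: its transcription is stored without being searched first,
# so it is never matched against items that already exist.
index.store(1, "flood warning issued for the coast")

# Item 2 arrives later as text: its search can now find item 1.
later_matches = index.search("coast flood update")
```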

## Text

![Typical Flow, Check Text Matching](doc/img/alegre-text-flow.png?raw=true "Typical Flow, Check Text Matching")
[Edit Link](https://docs.google.com/drawings/d/12WljT8-qsUi8xG584clD_eV1ABOcB6CqkMX0eAxSPrE/edit)

1. Store the `ProjectMedia`,
2. Send the `ProjectMedia` through `Bot::Alegre.run`,
3. `Bot::Alegre.run` searches for similar items via text similarity search,
4. Suggested and confirmed matches are queried concurrently and asynchronously *for both `original_title` and `original_description`*, and *for every vector model applied*,
5. Once both queries have completed *for both fields* and *for every vector model applied*, we process the item and store the results in the `Bot::Alegre.relate_project_media_callback` callback. That method tracks the remaining messages and only processes a match when none are still in flight,
6. Relationships at the Check-API level are persisted after step 5.
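
The bookkeeping in step 5 can be pictured as a set of pending (field, model, match type) keys that shrinks as responses arrive. Below is a minimal sketch under stated assumptions: the class `InFlightTracker`, its `receive` method, and the model names are hypothetical, and only the two field names come from the text above.

```python
from itertools import product

FIELDS = ("original_title", "original_description")  # named in the steps above
MODELS = ("model-a", "model-b")  # hypothetical vector model names
KINDS = ("suggested", "confirmed")


class InFlightTracker:
    def __init__(self) -> None:
        # One pending entry per field, per model, per match type.
        self.pending = set(product(FIELDS, MODELS, KINDS))
        self.results = {}

    def receive(self, field, model, kind, matches):
        """Record one response; return all results only when none remain in flight."""
        key = (field, model, kind)
        self.pending.discard(key)
        self.results[key] = matches
        if self.pending:
            return None  # still waiting on other messages
        return self.results


tracker = InFlightTracker()
outcomes = [
    tracker.receive(field, model, kind, [])
    for field, model, kind in product(FIELDS, MODELS, KINDS)
]
```

With two fields, two models, and two match types there are eight messages; the first seven calls return `None` and only the eighth returns the collected results.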

## Audio

![Typical Flow, Check Audio Matching](doc/img/alegre-audio-flow.png?raw=true "Typical Flow, Check Audio Matching")
[Edit Link](https://docs.google.com/drawings/d/1YwWJMgPxAlonCdq4M5RWaSOzSSucwHkg7EWTggWOhw8/edit)

1. Store the `ProjectMedia`,
2. Send the `ProjectMedia` through `Bot::Alegre.run`,
3. `Bot::Alegre.run` searches for similar items via audio fingerprinting,
4. Suggested and confirmed matches are queried concurrently and asynchronously,
5. Once both queries have completed, we process the item and store the results in the `Bot::Alegre.relate_project_media_callback` callback,
6. We also store the item's transcription text annotations in the text index, where they can match subsequent items (the transcription itself is *not* matched against existing items),
7. Transcription data is written to OpenSearch for text lookups, and audio hashes are written to Postgres for audio lookups on Alegre,
8. Relationships at the Check-API level are persisted after step 5.
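
One way to read the split between suggested and confirmed matches is as two similarity thresholds, with the stricter one producing confirmed matches. The sketch below illustrates that reading only: this document does not state how the two queries differ, and the threshold values and the `classify` helper are hypothetical.

```python
SUGGESTED_THRESHOLD = 0.7  # hypothetical value
CONFIRMED_THRESHOLD = 0.9  # hypothetical value


def classify(scores: dict[int, float]) -> dict[str, list[int]]:
    """Bucket candidate item ids by similarity score."""
    buckets: dict[str, list[int]] = {"confirmed": [], "suggested": []}
    for item_id, score in sorted(scores.items()):
        if score >= CONFIRMED_THRESHOLD:
            buckets["confirmed"].append(item_id)
        elif score >= SUGGESTED_THRESHOLD:
            buckets["suggested"].append(item_id)
        # Scores below both thresholds are not related at all.
    return buckets


relationships = classify({10: 0.95, 11: 0.75, 12: 0.40})
```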
Binary file removed doc/similarity.png
Binary file not shown.