[CV2-6075] documentation for Check-Alegre-Presto media indexing and similarity workflows #489
To me, the diagrams and descriptions are very helpful, but they seem more like the high-level diagrams requested for https://meedan.atlassian.net/browse/CV2-5885. I believe CV2-6075 was asking for swimlane-style call/state diagrams that follow the flow of control through the systems for each media type. I also found it super helpful where you include actual function names, because I could jump to that part of the code to understand what is going on.
This diagram is a helpful overview! I think the parts in 5.x are different now?
It's super out of date, and I have no idea where the source even is. Is this really worth reviving as a doc?
The diagrams and documentation are very helpful! They don't align well with what was requested on the ticket in terms of format and location. I think it would be great to merge in these changes and address that in a follow-up PR.
commit 008f4f2
Author: ahmednasserswe <ahmed.nasser.swe@gmail.com>
Date: Fri Feb 28 06:58:11 2025 +0100

CV2-5077-Add-test-coverage-for-OpenAI-module-in-Alegre (#491)
* create test_openai.py and add `test_openai_get_document_body`
* Update test_openai.py with `test_retrieve_openai_embeddings_handles_api_error` and `test_retrieve_openai_embeddings_calls_openai_api`
* Update test_openai.py to mock redis cache
* Update test_openai.py to include tests that call the api
* mocking cache in `test_openai_get_document_body` and `test_retrieve_openai_embeddings_calls_openai_api`
* asserting openai key is set during CI
* adding a unit test to find if openai_api_key is in env
* added test OPENAI key for CI to encoded .env file https://meedan.atlassian.net/wiki/spaces/ENG/pages/1942945794/How+to+encrypt+and+decrypt+.enc+files+in+Alegre
---------
Co-authored-by: Skye Bender-deMoll <skye@meedan.org>

commit d63cc85
Author: Skye Bender-deMoll <skye@meedan.org>
Date: Thu Feb 27 15:48:30 2025 -0800

update to show checkweb direct dependency on checkApi

commit 60be6ac
Author: Skye Bender-deMoll <skye@meedan.org>
Date: Wed Feb 26 14:39:37 2025 -0800

added links to new docs in README, moved png files from toplevel into /doc

commit 3636375
Author: Skye Bender-deMoll <skye@meedan.org>
Date: Wed Feb 26 14:28:44 2025 -0800

added explanation and API dependencies

commit cb7e5b1
Author: Skye Bender-deMoll <skye@meedan.org>
Date: Wed Feb 26 14:22:51 2025 -0800

renamed and corrected image links

commit 9412b7a
Author: Skye Bender-deMoll <skye@meedan.org>
Date: Wed Feb 26 12:46:38 2025 -0800

[CV2-5885] first sketch of infra based on notes from Devin

commit 7b135a2
Author: Skye Bender-deMoll <skye@meedan.org>
Date: Thu Feb 27 16:16:11 2025 -0800

adding --threads back in

commit 4ea58a1
Author: Skye Bender-deMoll <skye@meedan.org>
Date: Thu Feb 27 15:09:00 2025 -0800

[CV2-6066] doubling worker processes from 8 to 16, and removing --threads since docs show it is only for gevent

commit 94dd470
Author: Skye Bender-deMoll <skye@meedan.org>
Date: Thu Feb 27 11:12:59 2025 -0800

doubling threads again from 8 to 16

commit 227fa79
Author: Skye Bender-deMoll <skye@meedan.org>
Date: Thu Feb 27 11:11:42 2025 -0800

disabling some tests to simplify testing protocol

commit 1afea6a
Author: Skye Bender-deMoll <skye@meedan.org>
Date: Thu Feb 27 11:03:58 2025 -0800

[CV2-6066] added a load test for async endpoint

commit cf3e2c2
Author: Martin Peck <51542678+sonoransun@users.noreply.github.com>
Date: Wed Feb 26 08:25:46 2025 -0700

Fix ecs deploy dependencies with updated version pins. (#493)
* Try to resolve dependency errors in ecs deploy by updating botocore, boto3, and awscli version pins.

commit c0930a9
Author: Devin Gaffney <itsme@devingaffney.com>
Date: Tue Feb 25 20:10:14 2025 -0500

CV2-6022 update tests to use non-deprecated URLs (#487)
* CV2-6022 update tests to use non-deprecated (presto backed) URLs
* update queries to no longer use old endpoint
* test for min_es_score fix
* add custom error
* add sleeps, it ended up just being eventual consistency
* switch to greaters, deprecated test file having weird interaction effects that we probably don't need to solve for right now
* Update app/test/test_similarity.py

commit 6645936
Author: Skye Bender-deMoll <122867176+skyemeedan@users.noreply.github.com>
Date: Tue Feb 25 11:00:33 2025 -0800

[CV2-6066] load tests 3 (#492)
* [CV2-6066] change warning message to not include actual key value so sentry errors will group
* double number of alegre web server worker process-THREADS to 8 in QA and Live to handle more connections
---------
Co-authored-by: Skye Bender-deMoll <skye@meedan.org>

commit c4828f5
Author: Devin Gaffney <itsme@devingaffney.com>
Date: Mon Feb 24 14:32:45 2025 -0800

[CV2-6075] documentation for Check-Alegre-Presto media indexing and similarity workflows (#489)
* CV2-6075 initial stab at documentation for Check-Alegre-Presto media indexing and similarity workflows

commit 3494d9e
Author: Skye Bender-deMoll <122867176+skyemeedan@users.noreply.github.com>
Date: Fri Feb 14 09:06:34 2025 -0800

[CV2-6066] load tests for text similarity endpoint (#490)
* adding faker pip requirement for random text generation for benchmarks (testing only)
* [CV2-6066] draft load testing script for text similarity endpoints
* updates to benchmark (disabling parallel inserts) and some code comments
* re-enabled parallel tests, reducing so all tests only run 100 items
* added check to not run in live
---------
Co-authored-by: Skye Bender-deMoll <skye@meedan.org>

commit 41c9f8e
Author: Skye Bender-deMoll <122867176+skyemeedan@users.noreply.github.com>
Date: Fri Feb 7 08:45:44 2025 -0800

[CV2-6020] test updates and handling for missing min_es_score (#485)
* [CV2-6020] test updates and handling for missing min_es_score
* added more explicit error messaging and confirmed error status code, corrected failing test
---------
Co-authored-by: Skye Bender-deMoll <skye@meedan.org>
Description
Adds documentation of the high-level steps by which each modality is processed across services.
Reference: CV2-6075
How has this been tested?
NA
Have you considered secure coding practices when writing this code?
NA