DGaffney
Contributor

Description

Adds documentation of the high-level steps by which each modality is processed across services.

Reference: CV2-6075

How has this been tested?

NA

Have you considered secure coding practices when writing this code?

NA

@skyemeedan skyemeedan changed the title CV2-6075 initial stab at documentation [CV2-6075] documentation for Check-Alegre-Presto media indexing and similarity workflows Feb 12, 2025
Contributor

@skyemeedan skyemeedan left a comment

To me, the diagrams and descriptions are very helpful, but they seem more like the high-level diagrams requested for https://meedan.atlassian.net/browse/CV2-5885. I believe CV2-6075 was asking for swimlane-style call/state diagrams that follow the flow of control through the systems for each media type. I also found it super helpful where you include actual function names, because I could jump to that part of the code to understand what is going on.

Contributor

This diagram is a helpful overview! I think the parts in 5.x are different now?

Contributor Author

It's super out of date, and I have no idea where the source even is. Is this really worth reviving as a doc?

@DGaffney DGaffney requested a review from skyemeedan February 19, 2025 19:23
@DGaffney DGaffney marked this pull request as ready for review February 19, 2025 20:36
Contributor

@skyemeedan skyemeedan left a comment

The diagrams and documentation are very helpful! They don't align well with what was requested on the ticket in terms of format and location. I think it would be great to merge these changes and address that in a follow-up PR.

@skyemeedan skyemeedan merged commit c4828f5 into develop Feb 24, 2025
4 checks passed
@skyemeedan skyemeedan deleted the cv2-6075-docs branch February 24, 2025 22:32
skyemeedan pushed a commit that referenced this pull request Mar 3, 2025
commit 008f4f2
Author: ahmednasserswe <ahmed.nasser.swe@gmail.com>
Date:   Fri Feb 28 06:58:11 2025 +0100

    CV2-5077-Add-test-coverage-for-OpenAI-module-in-Alegre (#491)

    * create test_openai.py and add `test_openai_get_document_body`
    * Update test_openai.py with `test_retrieve_openai_embeddings_handles_api_error` and `test_retrieve_openai_embeddings_calls_openai_api`
    * Update test_openai.py to mock redis cache
    * Update test_openai.py to include tests that call the api
    * mocking cache in `test_openai_get_document_body` and `test_retrieve_openai_embeddings_calls_openai_api`
    * asserting openai key is set during CI
    * adding a unit test to find if openai_api_key is in env
    * added test OPENAI key for CI to encoded .env file https://meedan.atlassian.net/wiki/spaces/ENG/pages/1942945794/How+to+encrypt+and+decrypt+.enc+files+in+Alegre

    ---------

    Co-authored-by: Skye Bender-deMoll <skye@meedan.org>

commit d63cc85
Author: Skye Bender-deMoll <skye@meedan.org>
Date:   Thu Feb 27 15:48:30 2025 -0800

    update to show checkweb direct dependency on checkApi

commit 60be6ac
Author: Skye Bender-deMoll <skye@meedan.org>
Date:   Wed Feb 26 14:39:37 2025 -0800

    added links to new docs in README, moved png files from top level into /doc

commit 3636375
Author: Skye Bender-deMoll <skye@meedan.org>
Date:   Wed Feb 26 14:28:44 2025 -0800

    added explanation and API dependencies

commit cb7e5b1
Author: Skye Bender-deMoll <skye@meedan.org>
Date:   Wed Feb 26 14:22:51 2025 -0800

    renamed and corrected image links

commit 9412b7a
Author: Skye Bender-deMoll <skye@meedan.org>
Date:   Wed Feb 26 12:46:38 2025 -0800

    [CV2-5885] first sketch of infra based on notes from Devin

commit 7b135a2
Author: Skye Bender-deMoll <skye@meedan.org>
Date:   Thu Feb 27 16:16:11 2025 -0800

    adding --threads back in

commit 4ea58a1
Author: Skye Bender-deMoll <skye@meedan.org>
Date:   Thu Feb 27 15:09:00 2025 -0800

    [CV2-6066] doubling worker processes from 8 to 16, and removing --threads since docs show it is only for gevent

commit 94dd470
Author: Skye Bender-deMoll <skye@meedan.org>
Date:   Thu Feb 27 11:12:59 2025 -0800

    doubling threads again from 8 to 16

commit 227fa79
Author: Skye Bender-deMoll <skye@meedan.org>
Date:   Thu Feb 27 11:11:42 2025 -0800

    disabling some tests to simplify testing protocol

commit 1afea6a
Author: Skye Bender-deMoll <skye@meedan.org>
Date:   Thu Feb 27 11:03:58 2025 -0800

    [CV2-6066] added a load test for async endpoint

commit cf3e2c2
Author: Martin Peck <51542678+sonoransun@users.noreply.github.com>
Date:   Wed Feb 26 08:25:46 2025 -0700

    Fix ecs deploy dependencies with updated version pins. (#493)

    * Try to resolve dependency errors in ecs deploy by updating botocore, boto3, and awscli version pins.

commit c0930a9
Author: Devin Gaffney <itsme@devingaffney.com>
Date:   Tue Feb 25 20:10:14 2025 -0500

    CV2-6022 update tests to use non-deprecated URLs (#487)

    * CV2-6022 update tests to use non-deprecated (presto backed) URLs
    * update queries to no longer use old endpoint
    * test for min_es_score fix
    * add custom error
    * add sleeps, it ended up just being eventual consistency
    * switch to greaters, deprecated test file having weird interaction effects that we probably don't need to solve for right now
    * Update app/test/test_similarity.py

commit 6645936
Author: Skye Bender-deMoll <122867176+skyemeedan@users.noreply.github.com>
Date:   Tue Feb 25 11:00:33 2025 -0800

    [CV2-6066] load tests 3 (#492)

    * [CV2-6066] change warning message to not include actual key value so sentry errors will group
    * double number of alegre web server worker process-THREADS to 8 in QA and Live to handle more connections
    ---------
    Co-authored-by: Skye Bender-deMoll <skye@meedan.org>

commit c4828f5
Author: Devin Gaffney <itsme@devingaffney.com>
Date:   Mon Feb 24 14:32:45 2025 -0800

    [CV2-6075] documentation for Check-Alegre-Presto media indexing and similarity workflows (#489)

    * CV2-6075 initial stab at documentation for Check-Alegre-Presto media indexing and similarity workflows

commit 3494d9e
Author: Skye Bender-deMoll <122867176+skyemeedan@users.noreply.github.com>
Date:   Fri Feb 14 09:06:34 2025 -0800

    [CV2-6066] load tests for text similarity endpoint (#490)

    * adding faker pip requirement for random text generation for benchmarks (testing only)
    * [CV2-6066] draft load testing script for text similarity endpoints
    * updates to benchmark (disabling parallel inserts) and some code comments
    * re-enabled parallel tests, reducing so all tests only run 100 items
    * added check to not run in live
    ---------
    Co-authored-by: Skye Bender-deMoll <skye@meedan.org>

commit 41c9f8e
Author: Skye Bender-deMoll <122867176+skyemeedan@users.noreply.github.com>
Date:   Fri Feb 7 08:45:44 2025 -0800

    [CV2-6020] test updates and handling for missing min_es_score (#485)

    * [CV2-6020] test updates and handling for missing min_es_score
    * added more explicit error messaging and confirmed error status code, corrected failing test
    ---------
    Co-authored-by: Skye Bender-deMoll <skye@meedan.org>
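
Commit c0930a9 in the log above notes that the test failures "ended up just being eventual consistency" and were addressed by adding sleeps. As an illustration only, here is a minimal sketch of a related technique: polling with a timeout instead of a fixed sleep. This is a swapped-in alternative, not what the commit does. The names `wait_until` and `FakeIndex` are hypothetical and do not come from the Alegre codebase; `FakeIndex` merely simulates a store whose writes become visible after a delay.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool],
               timeout: float = 5.0,
               interval: float = 0.1) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Returns True if the predicate succeeded, False if the deadline passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeIndex:
    """Hypothetical stand-in for an eventually consistent search index."""

    def __init__(self, visible_after: float) -> None:
        # Writes become "searchable" only after this many seconds.
        self._ready_at = time.monotonic() + visible_after

    def contains(self, doc_id: str) -> bool:
        return time.monotonic() >= self._ready_at


index = FakeIndex(visible_after=0.2)

# Succeeds once the simulated index catches up, well before the timeout.
found = wait_until(lambda: index.contains("doc-1"), timeout=2.0, interval=0.05)

# A condition that never becomes true gives up after the timeout.
never = wait_until(lambda: False, timeout=0.1, interval=0.02)
```

Compared with a fixed `time.sleep`, a helper like this returns as soon as the condition holds and fails explicitly when it never does, which can keep tests both faster and less flaky. Whether that trade-off suits this test suite is a judgment call for the maintainers.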