Skip to content

Conversation

DGaffney
Copy link
Contributor

@DGaffney DGaffney commented Jan 2, 2025

Description

Removes any final assumptions about video with shared model linkages - one question is wtf was VideoMatcher? Its in the code but was not actually defined? I'm removing that as well...

Reference: CV2-5892

How has this been tested?

Confirmed that when running the tests locally, none of the video tests appeared to fail. That, paired with the tests passing here, paired with specifically checking the tmkpy fingerprint from inside the repl, lead me to conclude with high confidence that this will work fine.

Have you considered secure coding practices when writing this code?

None

@DGaffney DGaffney marked this pull request as ready for review January 3, 2025 18:55
Copy link
Contributor

@skyemeedan skyemeedan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome that the video code removal is relatively easy!

I'm not very familiar with testing for Alegre yet. Since we are relying on the integration tests to validate that code still works, can you link me to the video integration tests so I can double check what we are testing?

With since we are no longer running video models, are there more dependencies we can tear down?

  • Can we also now remove the ffmpeg and avcodec, etc install steps? from the Dockerfiles, etc?
    RUN apt-get update && apt-get install -y ffmpeg cmake swig libavcodec-dev libavformat-dev
  • Can we remove the entire "threat exchange" codebase and associated build stuff? threatexchange/tmk/cpp/bin/tmk-hash-video.cpp

return AudioModel(app.config['AUDIO_MODEL'])

def video_model():
return VideoModel(app.config['VIDEO_MODEL'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are not reading VIDEO_MODEL config anymore, can we remove it from config?

VIDEO_MODEL = os.getenv('VIDEO_MODEL', 'video-model')
. Or are there other things that read this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we can safely remove I think.

@computermacgyver
Copy link
Contributor

Great work. It looks like the integration tests triggered from Check Web are passing as well?
meedan/check-web#2242

@skyemeedan , we cannot remove swig or cmake, which is needed to compile the tmkpy python package, which we still use to compare videos. I think ffmpeg libavcodec-dev and libavformat-dev may be able to be dropped, but I'm not sure. tmkpy may depend on them even though we don't use those parts of tmkpy anymore.

The threatexchange codebase is used for PDQ image hashing; so, I suggest we leave that until / unless we have already removed image hashing @DGaffney ? It shouldn't be needed for video as the tmkpy package has its own copy of the threatexchange codebase.

Needless to say, we'll want to make sure we test this extensively locally and on QA.

@DGaffney
Copy link
Contributor Author

DGaffney commented Jan 6, 2025

Great work. It looks like the integration tests triggered from Check Web are passing as well? meedan/check-web#2242

@skyemeedan , we cannot remove swig or cmake, which is needed to compile the tmkpy python package, which we still use to compare videos. I think ffmpeg libavcodec-dev and libavformat-dev may be able to be dropped, but I'm not sure. tmkpy may depend on them even though we don't use those parts of tmkpy anymore.

The threatexchange codebase is used for PDQ image hashing; so, I suggest we leave that until / unless we have already removed image hashing @DGaffney ? It shouldn't be needed for video as the tmkpy package has its own copy of the threatexchange codebase.

Needless to say, we'll want to make sure we test this extensively locally and on QA.

Correct that we cannot remove swig or cmake, and I suppose tmkpy may not rely directly on FFMPEG et al, but I'm also not sure. I'll remove it and run a test to see what happens? Same for threatexchange, but that can happen in a separate ticket.

@DGaffney
Copy link
Contributor Author

DGaffney commented Jan 6, 2025

Gutted the dependencies locally and confirmed that tmkpy still runs as expected - pushing these changes up.

>>> import tmkpy
>>> tmkpy.query("00001a9e-29e9-45a0-a691-0687f21b0d58.tmk",["00000a72-e6b5-4fc8-b4ee-63e38ba502f1.tmk"],1)
(0.13391751050949097,)

@DGaffney DGaffney requested a review from skyemeedan January 6, 2025 19:28
Copy link
Contributor

@skyemeedan skyemeedan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome that we are able to remove the libraries!

  • looks like the threatexchange removal will be handled on the reopened image ticket? #478
  • were you able to rebuild and run tests locally?

@DGaffney
Copy link
Contributor Author

DGaffney commented Jan 7, 2025

Awesome that we are able to remove the libraries!

Confirmed that when running the tests locally, none of the video tests appeared to fail. That, paired with the tests passing here, paired with specifically checking the tmkpy fingerprint from inside the repl, lead me to conclude with high confidence that this will work fine.

@DGaffney DGaffney requested a review from skyemeedan January 7, 2025 14:42
@DGaffney DGaffney merged commit a0c7fa5 into develop Jan 7, 2025
3 checks passed
@DGaffney DGaffney deleted the cv2-5892-gut-video branch January 7, 2025 18:20
skyemeedan added a commit that referenced this pull request Jan 14, 2025
* CV2-5890 remove any ties to shared model for audio model (#476)

* remove audio model config

* CV2-5892 gut video (#477)

* remove more dependencies

* remove another mention of config var

* CV2-5891 remove threatexchange (#478)

* remove ruby

* CV2-5778-change-language-detection-agreement (#474)

* Update langid.py to reflect `Both CLD3 and FastText must agree and BOTH of them is more than "Threshold" confident`

lowering threshold to 0.7

* update langid.py to check if `fasttext_result['result']['language']` and `cld_result['result']['language']`  are not None before calculating the minimum confidence

* update langid.py write the if condition in one line

* update langid.py to check if fasttext_result['result']['confidence'] and cld_result['result']['confidence'] not None


---------

Co-authored-by: Devin Gaffney <itsme@devingaffney.com>
Co-authored-by: ahmednasserswe <ahmed.nasser.swe@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants