@GitAd7 (Contributor) commented Sep 14, 2025

Summary:
This PR fixes a critical bug in Harmony’s matching pipeline where empty or whitespace-only inputs (e.g., "" vs "") were incorrectly producing a similarity score of 1.0.

Changes:

  1. Updated process_questions to explicitly set TextVector.vector = None when the input text is empty or whitespace-only.
  2. Added unit-level tests to verify that empty/whitespace strings result in None vectors.
  3. Extended higher-level tests to ensure downstream similarity results correctly return None instead of misleading scores.
  4. Introduced mocking of add_text_to_vec across tests to avoid loading the pretrained model, making the test suite CI-friendly and faster to run.
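The guard described in changes 1 and 3 can be sketched roughly as follows. This is a minimal illustration, not Harmony's actual code: the `TextVector` here is a simplified stand-in with assumed fields, and `toy_embed`, `vectorise`, and `similarity` are hypothetical helpers invented for the example.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextVector:
    # Simplified stand-in for the class named in the PR; fields are assumed.
    text: str
    vector: Optional[List[float]] = None


def toy_embed(text: str) -> List[float]:
    # Hypothetical embedding (vowel counts), used only so the sketch runs
    # without a pretrained model.
    return [float(text.lower().count(ch)) for ch in "aeiou"]


def vectorise(texts: List[str]) -> List[TextVector]:
    # Empty or whitespace-only text gets vector=None instead of an embedding.
    result = []
    for text in texts:
        if not text or not text.strip():
            result.append(TextVector(text=text, vector=None))
        else:
            result.append(TextVector(text=text, vector=toy_embed(text)))
    return result


def similarity(a: TextVector, b: TextVector) -> Optional[float]:
    # Propagate None downstream rather than reporting a misleading score.
    if a.vector is None or b.vector is None:
        return None
    dot = sum(x * y for x, y in zip(a.vector, b.vector))
    norm_a = math.sqrt(sum(x * x for x in a.vector))
    norm_b = math.sqrt(sum(y * y for y in b.vector))
    if norm_a == 0.0 or norm_b == 0.0:
        return None
    return dot / (norm_a * norm_b)
```

With this shape, comparing two empty inputs yields `None` rather than a numeric score, so callers have to handle the missing value explicitly.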

Impact:

  1. Prevents false-positive matches in similarity scoring.
  2. Improves correctness when handling edge cases (empty/whitespace input).
  3. Significantly reduces test execution time by removing model downloads.
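One way to keep tests from loading a model is to patch the embedding call with `unittest.mock`. The sketch below assumes a hypothetical `Matcher` class whose `add_text_to_vec` method stands in for the expensive call; only the method name is taken from the PR, and the real function's location and signature may differ.

```python
from typing import List, Optional
from unittest.mock import patch


class Matcher:
    """Hypothetical wrapper for illustration; not Harmony's actual class."""

    def add_text_to_vec(self, text: str) -> List[float]:
        # Stands in for an expensive call that would load a pretrained model.
        raise RuntimeError("model is not available in this environment")

    def process(self, texts: List[str]) -> List[Optional[List[float]]]:
        # Skip the embedding call entirely for empty or whitespace-only text.
        return [self.add_text_to_vec(t) if t.strip() else None for t in texts]


def run_mocked_check() -> List[Optional[List[float]]]:
    # patch.object swaps the expensive method for a stub inside the block.
    with patch.object(Matcher, "add_text_to_vec", return_value=[0.1, 0.2]) as fake:
        result = Matcher().process(["", "  ", "How are you?"])
        # Only the non-empty text should have reached the embedding call.
        fake.assert_called_once_with("How are you?")
    return result
```

Because the stub returns a fixed vector, a test like this checks only the empty-input handling and never touches model weights.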

@jaydugad (Collaborator) commented

All tests are passing locally on Python 3.11, including the ones that were failing before. Going to merge this now.
Thanks @GitAd7.

@jaydugad merged commit 529ab55 into harmonydata:main on Sep 18, 2025
0 of 2 checks passed

@GitAd7 (Contributor, Author) commented Sep 26, 2025

Hi @jaydugad, I'm very pleased to hear that all the test cases pass locally. This is my first open-source contribution, so it means a lot to me. I'd like to keep contributing and take on more issues, and I'd love to connect with you on LinkedIn. Thanks!

@jaydugad (Collaborator) commented

That’s awesome to hear! On my GitHub account, I have added my LinkedIn profile - let's connect.
