Add linear hybrid search ranker #284

Merged on Feb 26, 2025 (8 commits)

@willtai willtai commented Feb 24, 2025

Description

This PR follows up on a previous PR and reworks the hybrid search ranking step. It lets users of the HybridRetriever and HybridCypherRetriever specify a ranker and an alpha. The linear ranker orders the results retrieved from the vector index and the fulltext index by the score:

final_score = alpha * (vector normalized score) + (1 - alpha) * (fulltext normalized score)

In practice, the normalized vector index score is multiplied by alpha and the normalized fulltext index score is multiplied by 1 - alpha; the weighted scores are then compared and ranked.
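As a rough illustration of the formula above, here is a minimal sketch in plain Python. It is not the code merged in this PR: the function names `normalize` and `linear_rank`, the dict-based inputs, the max-based normalization, the zero contribution for a result missing from one index, and the check that alpha lies in [0, 1] are all assumptions made for the example. It sums the two weighted terms exactly as the formula is written.

```python
from typing import Dict, List, Tuple


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores into [0, 1] by dividing by the largest score (assumed scheme)."""
    if not scores:
        return {}
    top = max(scores.values())
    if top <= 0:
        return {doc: 0.0 for doc in scores}
    return {doc: score / top for doc, score in scores.items()}


def linear_rank(
    vector_scores: Dict[str, float],
    fulltext_scores: Dict[str, float],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank results by alpha * vector + (1 - alpha) * fulltext, highest first."""
    if not 0.0 <= alpha <= 1.0:  # assumed constraint, for illustration only
        raise ValueError("alpha must be between 0 and 1")
    vec = normalize(vector_scores)
    txt = normalize(fulltext_scores)
    combined = {
        # A result absent from one index contributes 0 from that index (assumption).
        doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * txt.get(doc, 0.0)
        for doc in set(vec) | set(txt)
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw scores: vector similarity in [0, 1], fulltext scores unbounded.
vector_scores = {"a": 0.9, "b": 0.6}
fulltext_scores = {"b": 8.0, "c": 4.0}
ranked = linear_rank(vector_scores, fulltext_scores, alpha=0.5)
```

With alpha=0.5, "b" ranks first because it appears in both indexes; setting alpha=1.0 ranks by the vector scores alone.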

The default ranker is still naive, which compares the normalized scores directly without weighting.

The pros and cons of using this ranker:

Pros

  • Common scale when comparing scores. This reduces the risk of either the vector or the fulltext index score dominating the final score because of its scale.
  • Lets the user choose alpha.

Cons

  • Assumption of linearity. This ranker assumes both index scores can be added linearly, when in reality the underlying score space may be non-linear.
  • The dynamic normalization might introduce inconsistency in ranking across different queries.
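The second drawback can be illustrated with a small sketch. It assumes scores are normalized by dividing by the highest score in each result set, which is an assumption for the example and is not confirmed by the PR text; `normalize_by_max` and the score lists are hypothetical.

```python
from typing import List


def normalize_by_max(scores: List[float]) -> List[float]:
    """Divide every score by the largest score in the same result set."""
    top = max(scores)
    if top <= 0:
        return [0.0 for _ in scores]
    return [score / top for score in scores]


# Hypothetical raw fulltext scores returned for two different queries.
query_1_scores = [8.0, 4.0]
query_2_scores = [4.0, 2.0]

norm_1 = normalize_by_max(query_1_scores)
norm_2 = normalize_by_max(query_2_scores)

# The same raw score of 4.0 maps to different normalized values,
# because each query is scaled by its own maximum.
raw_4_in_query_1 = norm_1[1]
raw_4_in_query_2 = norm_2[0]
```

Under this assumption a normalized score is only meaningful relative to the other results of the same query, so a fixed alpha can weigh the two indexes differently from one query to the next.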

Despite these tradeoffs, I think this is valuable for users as it is an upgrade over the naive ranker.

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Documentation update
  • Project configuration change

Complexity

Note

Please provide an estimated complexity of this PR of either Low, Medium or High

Complexity: Medium

How Has This Been Tested?

  • Unit tests
  • E2E tests
  • Manual tests

Checklist

The following requirements should have been met (depending on the changes in the branch):

  • Documentation has been updated
  • Unit tests have been updated
  • E2E tests have been updated
  • Examples have been updated
  • New files have copyright header
  • CLA (https://neo4j.com/developer/cla/) has been signed
  • CHANGELOG.md updated if appropriate

@willtai willtai requested a review from a team as a code owner February 24, 2025 15:31

willtai commented Feb 24, 2025

@CodiumAI-Agent /update_changelog

@CodiumAI-Agent

Changelog updates: 🔄

2025-02-24

Added

  • Introduced a linear hybrid search ranker for HybridRetriever and HybridCypherRetriever, allowing customizable ranking with an alpha parameter.
  • Added validation and error handling for invalid ranker types in hybrid search.
  • Enhanced unit and E2E tests to cover the new linear ranker functionality.

to commit the new content to the CHANGELOG.md file, please type:
'/update_changelog --pr_update_changelog.push_changelog_changes=true'

@willtai willtai force-pushed the hybrid-search-ranker branch 2 times, most recently from b0914d6 to 533afc6 Compare February 24, 2025 15:54
@willtai willtai force-pushed the hybrid-search-ranker branch 3 times, most recently from 7bf434a to d8205cc Compare February 24, 2025 16:36
@willtai willtai force-pushed the hybrid-search-ranker branch from 98e8faf to 4fe39b5 Compare February 25, 2025 13:45
@willtai willtai requested a review from stellasia February 25, 2025 14:32
@NathalieCharbel

Thanks for having this feature implemented! Nicely done :D

@CodiumAI-Agent

Changelog updates: 🔄

2025-02-25

Added

  • Introduced a linear hybrid search ranker for HybridRetriever and HybridCypherRetriever, allowing customizable ranking with an alpha parameter.

to commit the new content to the CHANGELOG.md file, please type:
'/update_changelog --pr_update_changelog.push_changelog_changes=true'

@willtai willtai requested a review from stellasia February 25, 2025 15:51
@willtai willtai force-pushed the hybrid-search-ranker branch from b47b6b3 to 6a923d5 Compare February 26, 2025 09:21

@stellasia stellasia left a comment

Nice work, thanks for addressing this issue!

@willtai willtai merged commit 09440e0 into neo4j:main Feb 26, 2025
7 checks passed
@CodiumAI-Agent

Changelog updates: 🔄

2025-02-26

Added

  • Introduced a linear hybrid search ranker for HybridRetriever and HybridCypherRetriever, allowing customizable ranking with an alpha parameter.

to commit the new content to the CHANGELOG.md file, please type:
'/update_changelog --pr_update_changelog.push_changelog_changes=true'

4 participants