MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
Updated Nov 8, 2025 - Python
Go metrics for calculating string similarity and other string utility functions
Compare HTML similarity using structural and style metrics
A package to compute medical segmentation metrics.
Tika-Similarity uses the Tika-Python package (Python port of Apache Tika) to compute file similarity based on Metadata features.
A fuzzy matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-Gram similarity, Cosine similarity, Jaccard similarity, Longest common subsequence, Hamming distance, and more.
A Clojure library for querying large datasets by similarity
Spark functions to run popular phonetic and string matching algorithms
SetSketch: Filling the Gap between MinHash and HyperLogLog
ProbMinHash – A Class of Locality-Sensitive Hash Algorithms for the (Probability) Jaccard Similarity
Calculate various string metrics efficiently in Haskell
A job recommender system that takes skills from LinkedIn and job listings from Indeed and suggests the best available jobs for your skills.
BagMinHash - Minwise Hashing Algorithm for Weighted Sets
Minhash and maxhash library in Python, combining flexibility, expressivity, and performance.
Optimizing bit-level Jaccard Index and Population Counts for large-scale quantized Vector Search via Harley-Seal CSA and Lookup Tables
Easy-to-use Java library for similarity checking of strings or numeric-series
This is an implementation of the paper written by Yuhua Li, David McLean, Zuhair A. Bandar, James D. O’Shea, and Keeley Crockett
Text similarity computation using MinHash and Jaccard distance on the Reuters dataset
TreeMinHash: Fast Sketching for Weighted Jaccard Similarity Estimation
Insight Data Engineering fellow project