moonbit-community/NyaSearch

๐Ÿฑ NyaSearch: A High-Performance Text Search Library

English | ็ฎ€ไฝ“ไธญๆ–‡

NyaSearch is a fast and efficient text search library designed to perform substring matching in large-scale text data. It supports multiple search algorithms, making it suitable for text editors, log analysis, and data processing.

🚀 Key Features

  • 🔍 Multiple Algorithms – Supports KMP, Rabin-Karp, and Boyer-Moore.
  • ⚡ High Performance – Optimized for fast substring searching.
  • 🛠 Easy to Use – Simple API for quick integration.
  • ✅ Well-Tested – Comes with comprehensive unit tests.
  • 🔄 Open-Source – Actively maintained by the MoonBit Community.

📥 Installation

moon add xunyoyo/NyaSearch

🚀 Usage Guide for NyaSearch

NyaSearch provides a powerful and flexible string search function that supports multiple algorithms. You can either let it automatically choose the best algorithm or manually specify one. You can also define a search range within the text for more precise matching.


๐Ÿ” Basic Usage

The simplest way to use NyaSearch is to call search, which will automatically select the most efficient algorithm based on the pattern and text.

@NyaSearch.search?("hello world", "world") // Returns: Ok(6), using the best algorithm

If you want to manually choose an algorithm, simply provide the option parameter:

@NyaSearch.search?("hello world", "world", option="kmp") // Returns: Ok(6), using the KMP method
@NyaSearch.search?("hello world", "world", option="boyer_moore") // Returns: Ok(6), using Boyer-Moore
@NyaSearch.search?("hello world", "world", option="rabin_karp") // Returns: Ok(6), using Rabin-Karp

🎯 Searching Within a Specific Range

You can search within a specific part of the text by providing start and end indices.

@NyaSearch.search?("hello world", "o", start=0, end=5) // Returns: Ok(4)
@NyaSearch.search?("hello world", "o", start=5, end=11) // Returns: Ok(7)
  • The start index includes the character at that position.
  • The end index excludes the character at that position.
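
To make the inclusive-start, exclusive-end convention concrete, here is a minimal MoonBit sketch of a naive search over the half-open range [start, end). The helper name find_in_range and the -1 "not found" result are assumptions made for illustration only; they are not part of NyaSearch's API, and NyaSearch may report a miss differently.

```moonbit
// Hypothetical reference helper, not NyaSearch's implementation:
// naive substring search restricted to the half-open range [start, end).
fn find_in_range(text : String, pattern : String, start : Int, end : Int) -> Int {
  let m = pattern.length()
  let mut i = start
  // A match starting at i must fit entirely before `end`.
  while i + m <= end {
    let mut j = 0
    while j < m && text[i + j] == pattern[j] {
      j = j + 1
    }
    if j == m {
      return i // index is relative to the whole text, not to `start`
    }
    i = i + 1
  }
  -1 // assumed "not found" marker for this sketch
}
```

With this helper, find_in_range("hello world", "o", 0, 5) gives 4 and find_in_range("hello world", "o", 5, 11) gives 7, matching the two calls above, while find_in_range("hello world", "o", 0, 4) gives -1 because index 4 is excluded.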

⚡ How "Auto" Mode Works

In auto mode, NyaSearch picks a search algorithm from measurable characteristics of the text and the pattern, rather than using one fixed algorithm.

It evaluates:

  • Pattern length
  • Text length
  • Character uniqueness
  • Repetition ratio

Based on these factors, search(..., option="auto") will choose the best algorithm dynamically.


๐Ÿ” Auto Mode Selection Logic

Condition Selected Algorithm Reason
Pattern โ‰ค 2 characters OR Text โ‰ค 10 characters brute_force Short patterns or small text are best handled by brute force.
Pattern has a high number of unique characters (โ‰ฅ 20) & is longer than 15 boyer_moore Boyer-Moore benefits from large alphabets and long patterns by skipping more characters.
Pattern has a moderate number of unique characters (โ‰ฅ 10) or is at least 8 characters long - If repetition ratio (same character repeating) is high (>30%), use kmp. - KMP handles patterns with repeating prefixes efficiently.
- If repetition is low, use rabin_karp. - Rabin-Karp benefits from hashing unique sequences.
Otherwise (small alphabet or short pattern) kmp KMP is a good general-purpose algorithm.
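
The table can be read as a chain of checks evaluated from top to bottom. The sketch below is one way to express that chain in MoonBit. The function name choose_algorithm and its parameters are hypothetical, the metrics are assumed to have been computed beforehand, and the library's actual implementation may differ.

```moonbit
// Hypothetical helper that mirrors the documented selection table.
// It is not taken from NyaSearch's source code.
fn choose_algorithm(
  pattern_len : Int,
  text_len : Int,
  unique_chars : Int, // number of distinct characters in the pattern
  repetition_ratio : Double // assumed to be a fraction between 0.0 and 1.0
) -> String {
  if pattern_len <= 2 || text_len <= 10 {
    "brute_force"
  } else if unique_chars >= 20 && pattern_len > 15 {
    "boyer_moore"
  } else if unique_chars >= 10 || pattern_len >= 8 {
    // Medium patterns: repetition decides between KMP and Rabin-Karp.
    if repetition_ratio > 0.3 {
      "kmp"
    } else {
      "rabin_karp"
    }
  } else {
    "kmp"
  }
}
```

Because the first check comes first, a one-character pattern such as choose_algorithm(1, 5, 1, 0.0) yields "brute_force" regardless of the other metrics.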

🎯 Examples

1๏ธโƒฃ Short pattern or small text โ†’ Uses brute force
search("hello", "o")  // Uses brute force
search("ab", "b")     // Uses brute force
search("abcdefgh", "d") // Uses brute force
2๏ธโƒฃ Long pattern with many unique characters โ†’ Uses Boyer-Moore
search("this is a very long text", "UNIQUEPATTERNXYZ") // Uses Boyer-Moore
search("random words here", "QWERTYASDFGHZXCVBNM") // Uses Boyer-Moore
3๏ธโƒฃ Medium pattern with high repetition โ†’ Uses KMP
search("abababababababab", "ababab") // Uses KMP
search("aaaaaaaaaaabcaaaaaaa", "aaaaaa") // Uses KMP (high repetition ratio)
4๏ธโƒฃ Medium pattern with low repetition โ†’ Uses Rabin-Karp
search("abcdefgabcdefgabcdefg", "abcdef") // Uses Rabin-Karp (low repetition)
search("random_data_here", "xyz123") // Uses Rabin-Karp

🎯 Why Auto Mode?

✅ Eliminates the need to manually select algorithms.
✅ Picks an algorithm suited to the structure of the pattern.
✅ Automatically adapts to different search scenarios.

With option="auto", NyaSearch chooses a search method at call time from the rules above, so you do not have to tune the choice by hand. 🚀


โš ๏ธ Error Handling

If something goes wrong, NyaSearch will raise a meaningful error message.

Error Reason
EmptyPatternError When the search pattern is empty
PatternTooLongError If the pattern is longer than the text
InvalidRangeError If start or end indices are invalid
OptionChooseError If an unsupported algorithm is chosen
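
Since the examples in this guide call search? and show results such as Ok(6), a caller would presumably branch on the returned Result. The following is a minimal sketch under that assumption. It presumes the package is imported under the alias @NyaSearch, and it deliberately matches any error with Err(_), because how the individual error values are constructed is not shown in this guide.

```moonbit
fn main {
  // According to the table above, an empty pattern should produce EmptyPatternError.
  match @NyaSearch.search?("hello world", "") {
    Ok(index) => println("found at \{index}")
    // Err(_) is a catch-all; narrowing to a specific error is left out on purpose.
    Err(_) => println("search failed: check the pattern, range, and option")
  }
}
```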

🛠 Full Example

let text = "The quick brown fox jumps over the lazy dog"
let pattern = "fox"

// Auto mode (default)
let index = @NyaSearch.search?(text, pattern)
println("Found at: \{index}") // Found at: Ok(16)

// Specify an algorithm
let index_kmp = @NyaSearch.search?(text, pattern, option="kmp")
println("KMP found at: \{index_kmp}") // KMP found at: Ok(16)

// Search within a range: "fox" starts at index 16, inside [10, 20)
let index_range = @NyaSearch.search?(text, pattern, start=10, end=20)
println("Range search found at: \{index_range}") // Range search found at: Ok(16)

🎉 Now you're ready to use NyaSearch for high-performance text searching! 🚀

📜 License

This project is licensed under the Apache-2.0 License. See LICENSE for details.

📢 Contact & Support

👋 If you like this project, give it a ⭐! Happy coding! 🚀
