GitHub - Ferki-git-creator/fsp: Easy data compression algorithm.

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.github		.github
test		test
CHANGELOG.txt		CHANGELOG.txt
LICENSE		LICENSE
README.txt		README.txt
index.html		index.html

Repository files navigation

FSP — Find Similar Patterns (Next Level Update 2.1)

ALGORITHM DESCRIPTION

FSP is a universal data compression algorithm for any files or byte streams. Its core idea is to find similar patterns in data and store only the differences or references, avoiding duplication. Version 2.1 adds precise byte-level storage, automatic pattern length selection, optimized reference handling, and extended max pattern lengths for long repeats.

AUTOMATIC MINIMUM PATTERN SELECTION

- The algorithm scans the text/data and automatically selects a minimum pattern length between 3–5 characters.
- The chosen length maximizes repeated patterns while avoiding overhead from too-short sequences.

MINIMIZED REF OVERHEAD

- Patterns that repeat only once are stored as LITERAL instead of creating a REF.
- This avoids unnecessary reference overhead and keeps compressed data compact.

MAXIMUM PATTERN LENGTH SELECTION

- For longer repeated blocks, the algorithm searches for patterns up to 5–6 characters long.
- Longer patterns improve compression for large repeated sequences without increasing complexity.

BYTE-EXACT STORAGE (NEW IN 2.1)

- REF and LITERAL are stored as raw bytes for exact size tracking.
- LITERAL stores 1 byte per character.
- REF stores 1 byte for position + 1 byte for length (on small test data) or more bytes depending on data size.
- Compressed data can be saved directly to a file (e.g., output.txt) and decompressed without additional metadata.

SIMPLE EXPLANATION

Think of a LEGO box analogy:

1. Pick one construction as the base.
2. Store similar constructions as base + differences only.
3. Unique constructions remain unchanged.
4. You save space by storing only new details instead of the entire construction.

EXAMPLE

Text:
Base: "Hello friend!"
Similar: "Hello fiend!"
Store:
- Base as LITERAL
- Differences: position 7 -> "i"
During decompression: apply differences to base → get original similar line.

BYTE DATA EXAMPLE

- Works for any byte file or stream.
- Split data into blocks or patterns.
- Store one base block/pattern.
- Store repeated blocks as references or differences.
- Unique blocks store fully.

NEW USE CASES

- Logs and journals with almost repeating lines
- File versions and incremental backups
- Sets of images with minor differences
- Video frames with slight changes
- Any data where repeated patterns exist
- Text or byte streams where 3–5 character pattern matching is sufficient

IMPLEMENTATION STEPS

1. Scan entire data to find repeated patterns of length 3–5 automatically.
2. Create a list of base patterns.
3. For each new pattern:
a. Check for a similar pattern in base patterns.
b. If similar, store as REF (offset + length).
c. If unique, store as LITERAL.
4. Record results:
- Base patterns
- References/differences for repeated patterns
- Unique LITERAL data
5. Save compressed data to a file (e.g., output.txt) as raw bytes.
6. Decompression:
a. Read compressed bytes from file
b. Apply references/differences to reconstruct repeated patterns
c. Add unique LITERAL data unchanged

PATTERN LENGTH GUIDELINES

- Automatic min_len = 3–5 characters
- Max pattern length = 5–6 characters for longer repeats
- Optimal min/max values adapt to data type:
* Short repetitive text → min_len=3, max_len=5
* Longer logs/technical text → min_len=4, max_len=6

FEATURES

- Automatic pattern length selection for maximum compression
- Minimized REF overhead to prevent size inflation
- Handles long repeated patterns efficiently
- Simple LITERAL + REF format stored as raw bytes
- Works on any type of data stream
- Cross-language implementable (Python, C, C++, Java, Rust, Go, etc.)
- Compressed data saved byte-for-byte for exact decompression
- Testable with speed measurement for compression and decompression

SUMMARY

Next-level FSP 2.1 is simple, universal, and powerful. It automatically adapts pattern lengths, minimizes unnecessary references, and effectively compresses data with repeated sequences. Ideal for text, byte streams, backups, logs, images, or any data with repeated patterns, providing significant file size reduction without data loss. Compressed files can be stored and transmitted as raw bytes with exact reconstruction.

LICENSE

Licensed under LGPL 3.0.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

About

Uh oh!

Releases

Packages

Languages

License

Ferki-git-creator/fsp

Folders and files

Latest commit

History

Repository files navigation

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages