A pure-Go, ultra-fast, high-compression LZ4 library for modern workloads—server, edge, WASM, GPU/accelerator offload.
v0.1 (completed):
- Complete streaming implementation with proper frame format
  - Writer for compression
  - Reader for decompression
  - Proper handling of small data
  - Support for various compression levels
- Block-level functionality (basic implementation)
  - Basic block structure
  - Frame header handling
  - Support for the LZ4 frame format
- HC (High Compression) match finding
- Comprehensive unit-test coverage

v0.2 (completed):
- Improved block compression with better match finding
  - Dual hash tables (4-byte and 3-byte) for more match opportunities
  - Skip strength optimized per compression level
  - Enhanced lazy matching for a better compression ratio
  - Proper hash table initialization for large blocks
- New v0.2 compression API that maintains backward compatibility with v0.1
- Significantly better compression ratios (up to 15-20% improvement in some cases)

v0.3 (completed):
- Parallel compression support
  - Multi-threaded block compression to utilize all CPU cores
  - Parallelized streaming API for high-throughput compression
  - Automatic fallback to single-threaded mode when needed
- Refined HC (Hash Chain) compression levels
  - Enhanced hash functions for high compression levels (5-byte hash)
  - Window size optimized per compression level
  - Improved lazy matching with early-exit strategies
  - Optimized hash table sizes for better memory usage
- Better compression ratios with faster compression on multi-core systems

v0.4 (completed):
- SIMD optimizations framework
  - CPU feature detection (SSE4.1, AVX2, AVX512, NEON)
  - Architecture-specific optimizations selected at runtime
  - Improved performance on supported hardware
- Compatibility with all previous versions
- Foundation for hardware-accelerated compression
- Initial implementation of SIMD-based match finding and copy operations

Planned for v0.5:
- Complete SIMD implementations for match searching and copy loops
  - AVX2 and AVX512 optimizations for x86_64 architectures
  - Extended NEON support for ARM64 platforms
  - Specialized copy and match-finding routines for different instruction sets
- GPU acceleration for supported hardware
  - CUDA/OpenCL backends for NVIDIA/AMD GPUs
  - Runtime detection of GPU availability
  - Fallback mechanisms for non-GPU environments

Planned for v0.6 and beyond:
- Go generics for clean, reusable match-finder and streaming APIs
  - Type-safe match-finding algorithms
  - Generic streaming interfaces for different backends
  - Pluggable backends (pure Go, assembly, GPU) selected at runtime
- First-class WASM support for browser & edge functions
  - Browser-specific optimizations
  - TypeScript definitions for JavaScript interoperability
  - Examples for browser integration
- Comprehensive benchmark suite
  - Real-world data type benchmarks
  - Comparison with other LZ4 implementations
  - Performance visualization tools
- Advanced integration options
  - io/fs compatibility for modern Go applications
  - Middleware for common web frameworks
  - Cloud storage adapters (S3, GCS)
- Extended testing capabilities
  - Fuzzing tests for robustness
  - Performance regression testing
  - Edge case validation
This is version 0.4, with the SIMD optimization framework fully implemented. The implementation includes architecture-specific match finding and copy operations for both x86-64 (SSE) and ARM64 (NEON), laying the groundwork for hardware acceleration while remaining compatible with the LZ4 format.
Basic streaming compression and decompression:

```go
package main

import (
	"bytes"
	"fmt"
	"io"

	"github.com/harriteja/GoZ4X"
)

func main() {
	// Create sample data
	data := "Hello, GoZ4X!"

	// Compress
	var buf bytes.Buffer
	w := goz4x.NewWriter(&buf)
	w.Write([]byte(data))
	w.Close()

	fmt.Printf("Original size: %d bytes\n", len(data))
	fmt.Printf("Compressed size: %d bytes\n", buf.Len())

	// Decompress
	r := goz4x.NewReader(bytes.NewReader(buf.Bytes()))
	result, _ := io.ReadAll(r)
	fmt.Printf("Decompressed: %s\n", string(result))
}
```
Block compression with the v0.2 API:

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/harriteja/GoZ4X"
)

func main() {
	// Create sample data
	data := []byte("This is a sample text that will be compressed using GoZ4X v0.2!")

	// Compress with the v0.2 algorithm
	compressedData, _ := goz4x.CompressBlockV2(data, nil)
	fmt.Printf("Original size: %d bytes\n", len(data))
	fmt.Printf("Compressed size: %d bytes\n", len(compressedData))

	// Decompress (compatible with standard LZ4 decompression)
	decompressed, _ := goz4x.DecompressBlock(compressedData, nil, len(data))
	fmt.Printf("Decompressed: %s\n", string(decompressed))

	// Streaming compression with v0.2
	var buf bytes.Buffer
	w := goz4x.NewWriterV2(&buf)
	w.Write(data)
	w.Close()
	fmt.Printf("v0.2 streaming compressed size: %d bytes\n", buf.Len())
}
```
Parallel compression (v0.3):

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"runtime"

	"github.com/harriteja/GoZ4X"
)

func main() {
	// Create or load some large data
	data := make([]byte, 100*1024*1024) // 100MB of data
	// ... fill data with actual content ...

	fmt.Printf("CPU cores available: %d\n", runtime.NumCPU())

	// Parallel block compression
	compressedData, _ := goz4x.CompressBlockV2Parallel(data, nil)
	fmt.Printf("Original size: %d bytes\n", len(data))
	fmt.Printf("Compressed size: %d bytes (%.2f%%)\n",
		len(compressedData), float64(len(compressedData))*100/float64(len(data)))

	// Parallel streaming compression
	var buf bytes.Buffer
	w := goz4x.NewParallelWriterV2(&buf)

	// Optionally configure workers and chunk size
	w.SetNumWorkers(runtime.NumCPU()) // Use all available cores
	w.SetChunkSize(4 * 1024 * 1024)   // 4MB chunks

	// Compress
	w.Write(data)
	w.Close()
	fmt.Printf("Parallel streaming compressed size: %d bytes\n", buf.Len())

	// Decompress (standard decompression works with parallel-compressed data)
	r := goz4x.NewReader(bytes.NewReader(buf.Bytes()))
	decompressed, _ := io.ReadAll(r)
	fmt.Printf("Decompressed size: %d bytes\n", len(decompressed))
}
```
SIMD-optimized compression (v0.4):

```go
package main

import (
	"fmt"
	"runtime"

	"github.com/harriteja/GoZ4X"
	"github.com/harriteja/GoZ4X/v04"
	"github.com/harriteja/GoZ4X/v04/simd"
)

func main() {
	// Create or load data
	data := make([]byte, 10*1024*1024) // 10MB of data
	// ... fill data with actual content ...

	// Check available CPU features
	features := simd.DetectFeatures()
	fmt.Printf("CPU Features: SSE4.1=%v, AVX2=%v, AVX512=%v, NEON=%v\n",
		features.HasSSE41, features.HasAVX2, features.HasAVX512, features.HasNEON)

	// Create custom compression options
	opts := v04.DefaultOptions()
	opts.Level = v04.CompressionLevel(9) // Higher compression level

	// Compress with SIMD optimizations where available
	compressedData, _ := v04.CompressBlockWithOptions(data, nil, opts)
	fmt.Printf("Original size: %d bytes\n", len(data))
	fmt.Printf("Compressed size: %d bytes (%.2f%%)\n",
		len(compressedData), float64(len(compressedData))*100/float64(len(data)))

	// Parallel SIMD-optimized compression for large data
	opts.NumWorkers = runtime.NumCPU()
	parallelCompressed, _ := v04.CompressBlockParallelWithOptions(data, nil, opts)
	fmt.Printf("Parallel compressed size: %d bytes\n", len(parallelCompressed))

	// Decompress (standard decompression works with SIMD-compressed data)
	decompressed, _ := goz4x.DecompressBlock(compressedData, nil, len(data))
	fmt.Printf("Decompressed size: %d bytes\n", len(decompressed))
}
```
Planned GPU-accelerated compression (v0.5 API preview, subject to change):

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/harriteja/GoZ4X/gpu"
)

func main() {
	// Create or load some large data
	data := make([]byte, 1024*1024*1024) // 1GB of data
	// ... fill data ...

	// Check GPU availability
	gpuInfo := gpu.DetectGPUs()
	fmt.Printf("Available GPUs: %d\n", len(gpuInfo))
	for i, info := range gpuInfo {
		fmt.Printf("GPU %d: %s with %dMB memory\n", i, info.Name, info.MemoryMB)
	}

	// Create GPU compression options
	opts := gpu.DefaultOptions()
	opts.PreferredDevice = 0    // Use first GPU
	opts.UsePinnedMemory = true // For faster memory transfers

	// Compress with GPU acceleration
	compressedData, _ := gpu.CompressBlock(data, nil, opts)
	fmt.Printf("Original size: %d MB\n", len(data)/(1024*1024))
	fmt.Printf("Compressed size: %d MB (%.2f%%)\n",
		len(compressedData)/(1024*1024),
		float64(len(compressedData))*100/float64(len(data)))

	// Stream compression with GPU acceleration
	var buf bytes.Buffer
	w := gpu.NewWriter(&buf, opts)
	w.Write(data)
	w.Close()
}
```
Planned browser usage via the WASM package (v0.6, API subject to change):

```javascript
// Browser JavaScript
import { GoZ4X } from '@harriteja/goz4x-wasm';

async function compressFile() {
  const fileInput = document.getElementById('fileInput');
  const file = fileInput.files[0];

  // Initialize the WASM module
  const goz4x = await GoZ4X.init();

  // Read the file
  const arrayBuffer = await file.arrayBuffer();
  const inputData = new Uint8Array(arrayBuffer);
  console.log(`Original size: ${inputData.length} bytes`);

  // Compress the data
  const compressedData = goz4x.compressBlock(inputData);
  console.log(`Compressed size: ${compressedData.length} bytes`);
  console.log(`Compression ratio: ${(compressedData.length * 100 / inputData.length).toFixed(2)}%`);

  // Create a download link
  const blob = new Blob([compressedData], { type: 'application/octet-stream' });
  const url = URL.createObjectURL(blob);
  const downloadLink = document.createElement('a');
  downloadLink.href = url;
  downloadLink.download = `${file.name}.lz4`;
  downloadLink.click();
}
```
Install:

```sh
go get github.com/harriteja/GoZ4X
```
Roadmap:
- v0.1: Pure-Go implementation with streaming API (completed)
- v0.2: Improved block compression with proper match finding (completed)
- v0.3: Parallelism + HC levels refinement (completed)
- v0.4: SIMD optimization framework (completed)
- v0.5: Complete SIMD implementations + GPU acceleration framework
- v0.6: WebAssembly optimization + enhanced benchmarking
- v0.7: Go generics integration + improved API design
- v1.0: Stable release with comprehensive performance optimizations and integrations
GoZ4X implements the LZ4 compression algorithm according to the official specification:
- Sliding Window: Uses a 64KB sliding window to find repeated byte sequences.
- Token Structure: Each compressed sequence begins with a token byte (decoded in the sketch below):
  - The high 4 bits encode the length of the literal run
  - The low 4 bits encode the match length (minus the 4-byte minimum)
- Length Encoding: For lengths ≥ 15, additional bytes encode the extra length.
- Match Encoding: Uses a 2-byte little-endian offset to point back into the sliding window.
- High Compression: Implements HC mode with configurable search depth for better compression.
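To make the token layout concrete, here is a minimal standalone sketch (illustrative only, not GoZ4X's actual decoder) that splits one sequence token into its literal-length and match-length fields:

```go
package main

import "fmt"

// decodeSequenceHeader reads an LZ4 sequence token and any extra
// literal-length bytes. The match-length nibble saturates at 15 too,
// but its extension bytes appear after the 2-byte offset, so they are
// not read here.
func decodeSequenceHeader(src []byte) (litLen, matchLen, n int) {
	token := src[0]
	n = 1
	litLen = int(token >> 4)       // high 4 bits: literal run length
	matchLen = int(token&0x0F) + 4 // low 4 bits: match length minus 4
	// A nibble of 15 means "read more": each extra byte adds up to 255,
	// and a byte below 255 terminates the extension.
	if litLen == 15 {
		for ; src[n] == 255; n++ {
			litLen += 255
		}
		litLen += int(src[n])
		n++
	}
	return litLen, matchLen, n
}

func main() {
	// Token 0xF2: literal nibble 15 (extended by 0x05 -> 20 literals),
	// match nibble 2 -> match length 6.
	lit, match, used := decodeSequenceHeader([]byte{0xF2, 0x05})
	fmt.Printf("literals=%d matchLen=%d headerBytes=%d\n", lit, match, used)
}
```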
v0.2 match-finding improvements:
- Dual Hash Tables: Uses both 4-byte and 3-byte hashing to find more potential matches (hash sketched below)
- Adaptive Search Depth: Adjusts search parameters based on compression level
- Smart Lazy Matching: Improved decision making for lazy match selection
- Skip Strength: Optimized skip strategy to improve compression speed at higher levels
- Hash Pre-initialization: Better handling of the initial block for improved compression
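As an illustration of the 4-byte hashing above, a multiplicative hash of the kind used throughout the LZ4 family looks like this (the exact constants and table sizes inside GoZ4X may differ):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// hash4 maps the 4 bytes at src[pos:] to an index in a table of size
// 1<<hashLog using a multiplicative (Fibonacci-style) hash.
func hash4(src []byte, pos int, hashLog uint) uint32 {
	const prime = 2654435761 // Knuth's 32-bit multiplicative constant
	v := binary.LittleEndian.Uint32(src[pos:])
	return (v * prime) >> (32 - hashLog)
}

func main() {
	data := []byte("abcdabcd")
	// Identical 4-byte windows hash to the same slot, which is how the
	// compressor discovers candidate matches to verify and extend.
	fmt.Println(hash4(data, 0, 16) == hash4(data, 4, 16)) // true
}
```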
v0.3 parallelism and HC refinements:
- Parallelism: Multi-threaded compression for better performance on multi-core systems
- Worker Pool: Dynamic worker allocation based on available CPU cores
- Chunk Processing: Efficient chunking strategy for parallel compression (pattern sketched below)
- Enhanced HC Levels: Refined compression levels with optimized window sizes
- 5-Byte Hashing: Advanced hash function for higher compression levels
- Early Exit: Smarter search termination for better performance
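The chunking pattern above can be pictured as a plain goroutine fan-out. This is a simplified sketch, not the library's internal worker pool, and compressChunk is a stand-in for the real block compressor:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// compressChunk is a placeholder for the real block compressor.
func compressChunk(chunk []byte) []byte {
	out := make([]byte, len(chunk)) // pretend-compress: copy through
	copy(out, chunk)
	return out
}

func main() {
	data := make([]byte, 32<<20) // 32MB of input
	const chunkSize = 4 << 20    // 4MB chunks, as in the streaming API

	numChunks := (len(data) + chunkSize - 1) / chunkSize
	results := make([][]byte, numChunks) // indexed so output order is preserved

	sem := make(chan struct{}, runtime.NumCPU()) // bound concurrency to core count
	var wg sync.WaitGroup
	for i := 0; i < numChunks; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a worker slot
			defer func() { <-sem }() // release it
			start := i * chunkSize
			end := min(start+chunkSize, len(data))
			results[i] = compressChunk(data[start:end])
		}(i)
	}
	wg.Wait()
	fmt.Printf("compressed %d chunks on up to %d workers\n", numChunks, runtime.NumCPU())
}
```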
v0.4 SIMD work and ongoing optimizations:
- SIMD: Vectorized implementations for match finding and copy operations (runtime dispatch sketched below)
- Hash Tables: Further optimization of hash lookup strategies
- Chain Matching: Advanced chain tracking for even better match finding
- Streaming Mode: Additional optimizations for the standard LZ4 frame format
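The runtime-dispatch idea behind the SIMD framework can be sketched with golang.org/x/sys/cpu. The function names here are hypothetical; in the real library the accelerated paths would be assembly-backed:

```go
package main

import (
	"fmt"

	"golang.org/x/sys/cpu"
)

// wideCopy is selected once at startup based on detected CPU features.
var wideCopy func(dst, src []byte)

// copyGeneric is the portable fallback.
func copyGeneric(dst, src []byte) { copy(dst, src) }

// copyAVX2 stands in for an assembly-backed 32-bytes-at-a-time copy.
// Here it falls through to the generic version so the sketch runs anywhere.
func copyAVX2(dst, src []byte) { copy(dst, src) }

func init() {
	if cpu.X86.HasAVX2 {
		wideCopy = copyAVX2
	} else {
		wideCopy = copyGeneric
	}
}

func main() {
	src := []byte("match data")
	dst := make([]byte, len(src))
	wideCopy(dst, src)
	fmt.Printf("AVX2=%v copied=%q\n", cpu.X86.HasAVX2, dst)
}
```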
Planned GPU acceleration (v0.5):
- GPU Acceleration: Offloading compression to graphics hardware
- CUDA/OpenCL: Cross-platform GPU acceleration
- Shared Memory: Efficient data transfer between CPU and GPU
- Pipeline Processing: Simultaneous CPU and GPU operations
Planned WebAssembly support (v0.6):
- Browser Optimizations: Specialized code paths for browser environments
- Memory Management: Efficient handling of browser memory constraints
- Worker Threads: Parallel processing in browser contexts
Planned advanced integrations:
- io/fs Support: Full compatibility with modern Go file systems (one possible shape sketched below)
- Middleware Adapters: Ready-to-use components for web frameworks
- Cloud Storage: Direct compression/decompression with cloud providers
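The io/fs integration could take roughly this shape. This is a hypothetical sketch layered on the existing goz4x.NewReader, not a shipped API, and the file name in main is illustrative:

```go
package main

import (
	"io"
	"io/fs"
	"os"
	"strings"

	"github.com/harriteja/GoZ4X"
)

// lz4FS wraps an fs.FS and transparently decompresses files ending in .lz4.
type lz4FS struct{ inner fs.FS }

// decompressedFile pairs the underlying fs.File (for Close; note that Stat
// still reports the compressed size in this sketch) with a reader that
// streams the decompressed bytes.
type decompressedFile struct {
	fs.File
	r io.Reader
}

func (f *decompressedFile) Read(p []byte) (int, error) { return f.r.Read(p) }

func (l lz4FS) Open(name string) (fs.File, error) {
	f, err := l.inner.Open(name)
	if err != nil || !strings.HasSuffix(name, ".lz4") {
		return f, err // pass non-.lz4 files through untouched
	}
	return &decompressedFile{File: f, r: goz4x.NewReader(f)}, nil
}

func main() {
	fsys := lz4FS{inner: os.DirFS(".")}
	data, err := fs.ReadFile(fsys, "example.txt.lz4") // hypothetical file
	if err == nil {
		os.Stdout.Write(data)
	}
}
```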
MIT License