
szhounyc/stats_lib


Statistical Calculations Library

A high-performance statistical calculations library implemented in Rust with bindings for Python, Java, and Go.

Author: Steven Zhou, March 2025

Features

Basic Statistics

  • Moving average calculation
  • Maximum value
  • Minimum value
  • Standard deviation
  • Percentile calculation
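The basic statistics above can be sketched in a few lines of plain Rust. This is a hypothetical illustration, not the library's API: the function names and the choice to return `Option` for invalid input are assumptions. Both standard deviation conventions are shown because the example outputs in this README differ: 1.41 for [1, 2, 3, 4, 5] (Rust example) is the population form, while 3.03 for 1..10 (Python, Go, and Java examples) is the sample form.

```rust
/// Simple moving average over every full window of `window` values.
/// Returns None for a zero window or one longer than the data (assumed rule).
fn moving_average(data: &[f64], window: usize) -> Option<Vec<f64>> {
    if window == 0 || window > data.len() {
        return None;
    }
    Some(
        data.windows(window)
            .map(|w| w.iter().sum::<f64>() / window as f64)
            .collect(),
    )
}

/// Largest value, or None for an empty slice.
fn max(data: &[f64]) -> Option<f64> {
    data.iter().copied().reduce(f64::max)
}

/// Smallest value, or None for an empty slice.
fn min(data: &[f64]) -> Option<f64> {
    data.iter().copied().reduce(f64::min)
}

/// Standard deviation: `ddof = 0` gives the population form, `ddof = 1` the sample form.
fn stddev(data: &[f64], ddof: usize) -> Option<f64> {
    let n = data.len();
    if n == 0 || n <= ddof {
        return None;
    }
    let mean = data.iter().sum::<f64>() / n as f64;
    let sum_sq: f64 = data.iter().map(|x| (x - mean).powi(2)).sum();
    Some((sum_sq / (n - ddof) as f64).sqrt())
}
```

With `ddof = 0` the sketch returns about 1.41 for [1, 2, 3, 4, 5]; with `ddof = 1` it returns about 3.03 for 1..10.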

Time Series Analysis

  • Outlier detection
  • Forecasting
  • Arithmetic operations (abs, log2, log10)
  • Rate calculations
  • Exponential smoothing (EWMA)
  • Time shifting and alignment
  • Interpolation
  • Exclusion operations
  • Ranking operations
  • Regression analysis
  • Rollup operations
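Several of the time series operations are element-wise transforms. The sketch below assumes a point type shaped like the `TimeSeriesPoint { timestamp, value }` values the Rust example prints; the struct definition, the `i64` timestamp type, and the function names are assumptions, and rejecting non-positive inputs to `log2` is an illustrative choice rather than documented behavior.

```rust
/// Hypothetical point type shaped like the `TimeSeriesPoint { timestamp, value }` output of the Rust example.
#[derive(Debug, Clone, Copy, PartialEq)]
struct TimeSeriesPoint {
    timestamp: i64,
    value: f64,
}

/// Applies `f` to every value while keeping each timestamp unchanged.
fn map_values(series: &[TimeSeriesPoint], f: impl Fn(f64) -> f64) -> Vec<TimeSeriesPoint> {
    series
        .iter()
        .map(|p| TimeSeriesPoint { timestamp: p.timestamp, value: f(p.value) })
        .collect()
}

/// Absolute value of every point.
fn abs(series: &[TimeSeriesPoint]) -> Vec<TimeSeriesPoint> {
    map_values(series, f64::abs)
}

/// Base-2 logarithm; rejects non-positive values, where log2 is undefined (illustrative rule).
fn log2(series: &[TimeSeriesPoint]) -> Result<Vec<TimeSeriesPoint>, String> {
    if let Some(p) = series.iter().find(|p| p.value <= 0.0) {
        return Err(format!("log2 undefined for {} at t={}", p.value, p.timestamp));
    }
    Ok(map_values(series, f64::log2))
}

/// Adds `offset` to every timestamp (3600 is one hour if timestamps are in seconds).
fn timeshift(series: &[TimeSeriesPoint], offset: i64) -> Vec<TimeSeriesPoint> {
    series
        .iter()
        .map(|p| TimeSeriesPoint { timestamp: p.timestamp + offset, value: p.value })
        .collect()
}
```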

Building and Testing

Prerequisites

  • Rust (latest stable version)
  • Cargo

Running the Rust Library Tests

  1. Navigate to the library directory:
cd stats_lib
  2. Build the library:
cargo build
  3. Run the tests:
cargo test -- --nocapture

Expected test output should show all tests passing:

running 14 tests
test tests::test_max ... ok
test tests::test_moving_average ... ok
test tests::test_min ... ok
test tests::test_percentile ... ok
test timeseries::tests::test_abs ... ok
test timeseries::tests::test_align_timestamp ... ok
test tests::test_stddev ... ok
test timeseries::tests::test_detect_outliers ... ok
test timeseries::tests::test_ewma ... ok
test timeseries::tests::test_invalid_inputs ... ok
test timeseries::tests::test_forecast ... ok
test timeseries::tests::test_rate ... ok
test timeseries::tests::test_log2 ... ok
test timeseries::tests::test_timeshift ... ok

test result: ok. 14 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
  4. Run specific tests:
# Run tests with a specific name pattern
cargo test test_moving_average

# Run tests in a specific module
cargo test timeseries::tests::

# Run tests with output
cargo test -- --nocapture
  5. Build for release:
cargo build --release

The release build will create optimized library files:

  • macOS: target/release/libstats_lib.dylib
  • Linux: target/release/libstats_lib.so
  • Windows: target/release/stats_lib.dll

Running the Rust Example

  1. Navigate to the Rust example directory:
cd examples/rust_example
  2. Run the example:
cargo run

Expected output:

Basic Statistics Example:
------------------------
Dataset: [1.0, 2.0, 3.0, 4.0, 5.0]
Maximum: 5.00
Minimum: 1.00
Standard Deviation: 1.41
Moving Average (window size = 3): [2.0, 3.0, 4.0]

Time Series Analysis Example:
---------------------------
Outliers detected: [TimeSeriesPoint { timestamp: 3, value: 10.0 }]
Absolute values: [TimeSeriesPoint { timestamp: 0, value: 1.0 }, TimeSeriesPoint { timestamp: 1, value: 2.0 }, TimeSeriesPoint { timestamp: 2, value: 3.0 }, TimeSeriesPoint { timestamp: 3, value: 10.0 }, TimeSeriesPoint { timestamp: 4, value: 4.0 }, TimeSeriesPoint { timestamp: 5, value: 5.0 }]
Rates of change: [TimeSeriesPoint { timestamp: 1, value: 1.0 }, TimeSeriesPoint { timestamp: 2, value: 1.0 }, TimeSeriesPoint { timestamp: 3, value: 7.0 }, TimeSeriesPoint { timestamp: 4, value: -6.0 }, TimeSeriesPoint { timestamp: 5, value: 1.0 }]
EWMA smoothed values: [TimeSeriesPoint { timestamp: 0, value: 1.0 }, TimeSeriesPoint { timestamp: 1, value: 1.2999999999999998 }, TimeSeriesPoint { timestamp: 2, value: 1.8099999999999996 }, TimeSeriesPoint { timestamp: 3, value: 4.266999999999999 }, TimeSeriesPoint { timestamp: 4, value: 4.1869 }, TimeSeriesPoint { timestamp: 5, value: 4.430829999999999 }]
Error generating forecast: InvalidInput("Need at least two complete seasons of data")
  3. Run the Rust example tests:
cd examples/rust_example
cargo test

Expected test output:

running 4 tests
test tests::test_moving_average ... ok
test tests::test_basic_stats ... ok
test tests::test_invalid_window ... ok
test tests::test_time_series ... ok

test result: ok. 4 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
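The rates and EWMA values printed by the Rust example can be reproduced with the sketch below. It assumes the example series is [1.0, 2.0, 3.0, 10.0, 4.0, 5.0] at timestamps 0 to 5 (as the absolute-value output suggests), strictly increasing timestamps, and an EWMA seeded with the first value and smoothing factor alpha = 0.3; the type and function names are hypothetical.

```rust
/// Hypothetical point type mirroring the printed `TimeSeriesPoint { timestamp, value }`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct TimeSeriesPoint {
    timestamp: i64,
    value: f64,
}

/// Rate of change between consecutive points, stamped with the later timestamp.
/// Assumes strictly increasing timestamps (a zero gap would divide by zero).
fn rate(series: &[TimeSeriesPoint]) -> Vec<TimeSeriesPoint> {
    series
        .windows(2)
        .map(|w| {
            let dt = (w[1].timestamp - w[0].timestamp) as f64;
            TimeSeriesPoint { timestamp: w[1].timestamp, value: (w[1].value - w[0].value) / dt }
        })
        .collect()
}

/// EWMA: s_0 = x_0, then s_t = alpha * x_t + (1 - alpha) * s_{t-1}.
fn ewma(series: &[TimeSeriesPoint], alpha: f64) -> Vec<TimeSeriesPoint> {
    let mut out = Vec::with_capacity(series.len());
    let mut prev: Option<f64> = None;
    for p in series {
        let s = match prev {
            None => p.value,
            Some(s_prev) => alpha * p.value + (1.0 - alpha) * s_prev,
        };
        out.push(TimeSeriesPoint { timestamp: p.timestamp, value: s });
        prev = Some(s);
    }
    out
}
```

With those inputs the sketch yields rates [1, 1, 7, -6, 1] and EWMA values 1.0, 1.3, 1.81, 4.267, 4.1869, 4.43083, matching the output above up to floating-point formatting.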

Running Python Examples and Tests

  1. Navigate to the Python examples directory:
cd examples/python
  2. Ensure you have the required Python packages:
pip install numpy pytest
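  3. Build the Rust library in release mode:
# From the stats_lib directory. On Apple Silicon with a native toolchain
# (`rustc -vV` reports host: aarch64-apple-darwin), this builds an arm64 library
# in target/release/, where the Python wrapper looks for it. Setting
# CARGO_BUILD_TARGET=aarch64-apple-darwin instead writes to target/aarch64-apple-darwin/release/.
cargo build --release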
  4. Run the Python tests:
# Run all tests with detailed output
python3 -m pytest test_stats_lib.py -v

# Run a specific test
python3 -m pytest test_stats_lib.py -v -k "test_moving_average"

# Run the simple test script
python3 test_stats.py
  5. Run the example script:
python3 example.py

Expected test output should show all tests passing:

test_stats_lib.py::TestBasicStats::test_max PASSED
test_stats_lib.py::TestBasicStats::test_min PASSED
test_stats_lib.py::TestBasicStats::test_moving_average PASSED
test_stats_lib.py::TestBasicStats::test_stddev PASSED
test_stats_lib.py::TestTimeSeries::test_creation PASSED
test_stats_lib.py::TestTimeSeries::test_properties PASSED

Troubleshooting Python Tests

  1. If you see an error about not finding the library:

    • Ensure you've built the Rust library in release mode
    • Check that the library file exists in ../../target/release/
    • Verify the library architecture matches your system
  2. Common Python-specific issues:

    • Missing NumPy: Install with pip install numpy
    • Missing pytest: Install with pip install pytest
    • Library load errors: Check the library path in stats_lib.py

Running Go Examples and Tests

  1. First, build the Rust library in release mode:
# From the stats_lib directory
cargo build --release
  2. Make sure the library is built correctly:
# Check if the library exists
ls -l target/release/libstats_lib.dylib
  3. Navigate to the Go examples directory:
cd examples/go
  4. Set the library path environment variables:
# For macOS
export DYLD_LIBRARY_PATH="$(pwd)/../../target/release:$DYLD_LIBRARY_PATH"
export LIBRARY_PATH="$(pwd)/../../target/release:$LIBRARY_PATH"
  5. Run the example program:
# Run with CGO enabled for Apple Silicon
GOARCH=arm64 CGO_ENABLED=1 go run cmd/example/main.go

If you encounter errors about undefined symbols or missing functions, you may need to check that:

  • The Rust library is built correctly for your architecture
  • The library path is set correctly
  • The Go code is using the correct function signatures

For troubleshooting:

# Check the library architecture
file ../../target/release/libstats_lib.dylib

# Check the library symbols
nm -g ../../target/release/libstats_lib.dylib
  6. Run the Go tests:
# Navigate to the stats package directory
cd pkg/stats

# Run tests with verbose output
GOARCH=arm64 CGO_ENABLED=1 go test -v

Expected test output:

=== RUN   TestNewTimeSeries
=== RUN   TestNewTimeSeries/valid_series
=== RUN   TestNewTimeSeries/mismatched_lengths
--- PASS: TestNewTimeSeries
=== RUN   TestMovingAverage
=== RUN   TestMovingAverage/valid_window
=== RUN   TestMovingAverage/window_too_large
=== RUN   TestMovingAverage/window_zero
--- PASS: TestMovingAverage
=== RUN   TestMax
=== RUN   TestMax/valid_data
=== RUN   TestMax/empty_data
--- PASS: TestMax
=== RUN   TestMin
=== RUN   TestMin/valid_data
=== RUN   TestMin/empty_data
--- PASS: TestMin
=== RUN   TestStdDev
=== RUN   TestStdDev/valid_data
=== RUN   TestStdDev/single_point
=== RUN   TestStdDev/empty_data
--- PASS: TestStdDev
PASS

Troubleshooting Go Tests

  1. Common Go-specific issues:

    • CGO not enabled: Make sure to set CGO_ENABLED=1
    • Architecture mismatch: Set GOARCH=arm64 for Apple Silicon
    • Library not found: Ensure Rust library is built in release mode
    • Linker warnings about LC_DYSYMTAB: These can be safely ignored
  2. If tests fail:

    • Check that the Rust library is built correctly
    • Verify Go environment variables are set properly
    • Ensure you're in the correct directory for running tests

Development Notes

Memory Management

  • The Rust library handles memory allocation and deallocation
  • Python, Java, and Go bindings properly manage memory through their respective FFI mechanisms
  • Memory leaks are prevented by proper cleanup in each language binding
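A common way to keep allocation and deallocation on the Rust side is to hand the caller an owned buffer plus a matching free function. The sketch below is hypothetical (`stats_moving_average` and `stats_free_buffer` are not taken from the library's `src/ffi.rs`); a real export would also carry `#[no_mangle]` (`#[unsafe(no_mangle)]` in the Rust 2024 edition) so the symbols keep their C names.

```rust
use std::ptr;
use std::slice;

/// Computes a moving average and transfers ownership of the result buffer to the caller,
/// who must later pass (pointer, length) back to `stats_free_buffer` exactly once.
/// Returns null for null pointers or an invalid window.
pub extern "C" fn stats_moving_average(
    data: *const f64,
    len: usize,
    window: usize,
    out_len: *mut usize,
) -> *mut f64 {
    if data.is_null() || out_len.is_null() || window == 0 || window > len {
        return ptr::null_mut();
    }
    // SAFETY: the caller guarantees `data` points to `len` initialised f64 values.
    let input = unsafe { slice::from_raw_parts(data, len) };
    let result: Box<[f64]> = input
        .windows(window)
        .map(|w| w.iter().sum::<f64>() / window as f64)
        .collect();
    // SAFETY: `out_len` was null-checked above.
    unsafe { *out_len = result.len() };
    // Give up Rust ownership; the allocation lives until `stats_free_buffer` reclaims it.
    Box::into_raw(result) as *mut f64
}

/// Frees a buffer returned by `stats_moving_average`; a null pointer is ignored.
pub extern "C" fn stats_free_buffer(buf: *mut f64, len: usize) {
    if buf.is_null() {
        return;
    }
    // SAFETY: (buf, len) must come from `stats_moving_average` and be freed only once.
    unsafe { drop(Box::from_raw(ptr::slice_from_raw_parts_mut(buf, len))) };
}
```

Each binding would call the free function once per returned buffer, for example from a Python `finally` block, a Go `defer`, or a Java `try`/`finally`.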

Thread Safety

  • The core statistical functions are thread-safe
  • Each language binding handles concurrent access appropriately
  • No global mutable state is maintained
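Functions that only read their inputs and keep no global mutable state can be called from several threads at once without locks. The sketch below illustrates this with a hypothetical `mean` function and scoped threads (`std::thread::scope`, Rust 1.63 or later); it is not the library's API.

```rust
use std::thread;

/// A pure function: it borrows its input immutably and touches no shared state.
fn mean(data: &[f64]) -> Option<f64> {
    if data.is_empty() {
        None
    } else {
        Some(data.iter().sum::<f64>() / data.len() as f64)
    }
}

/// Computes the mean of each series on its own thread. Because `mean` only reads
/// immutable borrows, the compiler accepts sharing `series` across scoped threads.
fn parallel_means(series: &[Vec<f64>]) -> Vec<Option<f64>> {
    thread::scope(|s| {
        // Spawn every worker first so they run concurrently, then join them in order.
        let handles: Vec<_> = series.iter().map(|data| s.spawn(move || mean(data))).collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```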

Performance Considerations

  • FFI calls have overhead; batch operations when possible
  • Large datasets should be processed in chunks
  • Consider using the native Rust interface for performance-critical applications
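One way to follow the chunking advice without losing accuracy is to summarise each chunk (count, mean, and sum of squared deviations) and merge the summaries with Chan et al.'s pairwise update. The `Summary` type and `chunked_summary` function below are hypothetical; the library may process chunks differently.

```rust
/// Per-chunk summary: count, mean, and M2 (sum of squared deviations from the mean).
#[derive(Debug, Clone, Copy, Default)]
struct Summary {
    count: u64,
    mean: f64,
    m2: f64,
}

impl Summary {
    /// Summarises one chunk with Welford's single-pass update.
    fn from_chunk(chunk: &[f64]) -> Summary {
        let mut s = Summary::default();
        for &x in chunk {
            s.count += 1;
            let delta = x - s.mean;
            s.mean += delta / s.count as f64;
            s.m2 += delta * (x - s.mean);
        }
        s
    }

    /// Combines two summaries using Chan et al.'s pairwise formula.
    fn merge(self, other: Summary) -> Summary {
        if self.count == 0 {
            return other;
        }
        if other.count == 0 {
            return self;
        }
        let count = self.count + other.count;
        let delta = other.mean - self.mean;
        let mean = self.mean + delta * other.count as f64 / count as f64;
        let m2 = self.m2
            + other.m2
            + delta * delta * self.count as f64 * other.count as f64 / count as f64;
        Summary { count, mean, m2 }
    }

    /// Sample standard deviation (n - 1 denominator); None with fewer than two values.
    fn sample_stddev(&self) -> Option<f64> {
        if self.count < 2 {
            None
        } else {
            Some((self.m2 / (self.count - 1) as f64).sqrt())
        }
    }
}

/// Processes `data` in fixed-size chunks and merges the partial summaries.
fn chunked_summary(data: &[f64], chunk_size: usize) -> Summary {
    data.chunks(chunk_size.max(1))
        .map(Summary::from_chunk)
        .fold(Summary::default(), Summary::merge)
}
```

For 1..10 in chunks of 3, this gives the same sample standard deviation (about 3.03) as a single pass; each chunk could equally come from a separate FFI call or thread.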

License

[Your License Here]

Contributing

[Contributing Guidelines]

Architecture-Specific Notes (Apple Silicon)

If you're using an Apple Silicon (M1/M2) Mac, follow these steps:

  1. Build the library for ARM64:
# Check your architecture
uname -m  # Should show 'arm64'

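# Clean and rebuild from the stats_lib root (two levels up from an examples/<lang> directory).
# With a native arm64 toolchain (`rustc -vV` reports host: aarch64-apple-darwin), a plain
# release build targets arm64 and writes to target/release/, the path used below.
# Setting CARGO_BUILD_TARGET=aarch64-apple-darwin would write to target/aarch64-apple-darwin/release/ instead.
cd ../..
cargo clean
cargo build --release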

# Verify the library architecture
file target/release/libstats_lib.dylib  # Should show 'arm64'
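  2. Return to the example directory (for example examples/java) and set the correct library path and architecture:
# Set both library paths (absolute, so they do not depend on the working directory at load time)
export DYLD_LIBRARY_PATH="$(pwd)/../../target/release:$DYLD_LIBRARY_PATH"
export DYLD_FALLBACK_LIBRARY_PATH="$(pwd)/../../target/release:$DYLD_FALLBACK_LIBRARY_PATH"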

# Run with architecture-specific options
java -cp .:jna-5.12.1.jar \
    -Djna.library.path="$(pwd)/../../target/release" \
    -Djna.platform.library.path="$(pwd)/../../target/release" \
    -Djna.debug_load=true \
    TestStats
  3. Common Apple Silicon issues:
    • If you see darwin-x86-64 in error messages, JNA is trying to load an x86-64 library, which usually means the JVM itself is an x86-64 build running under Rosetta
    • If you see aarch64 or arm64 in messages, it's correctly detecting Apple Silicon
    • Use otool -L libstats_lib.dylib to verify library dependencies

Testing Guide

Running Tests

# Run all tests
cargo test

# Run tests with output
cargo test -- --nocapture

# Run specific test categories
cargo test -- --skip timeseries  # Run basic statistical tests (everything outside the timeseries module)
cargo test timeseries::          # Run time series tests
cargo test test_invalid          # Run invalid input tests

# Run tests with coverage (requires cargo-tarpaulin)
cargo install cargo-tarpaulin
cargo tarpaulin

# Run benchmarks (requires nightly Rust)
cargo bench

Test Categories

  1. Basic Statistical Tests

    • Moving average calculation
    • Maximum/minimum value detection
    • Standard deviation computation
    • Percentile calculation
  2. Time Series Tests

    • Outlier detection
    • Forecasting
    • Arithmetic operations
    • Rate calculations
    • EWMA smoothing
    • Time shifting
    • Timestamp alignment
  3. Invalid Input Tests

    • Empty series handling
    • Invalid parameter validation
    • Error message verification
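The invalid input tests check that bad parameters produce descriptive errors rather than panics. The sketch below assumes an error enum with an `InvalidInput(String)` variant, modeled on the `InvalidInput("...")` value the Rust example prints; the enum name `StatsError`, the messages, and the validation rules are illustrative assumptions.

```rust
use std::fmt;

/// Hypothetical error type with an `InvalidInput` variant like the one the Rust example prints.
#[derive(Debug, Clone, PartialEq)]
enum StatsError {
    InvalidInput(String),
}

impl fmt::Display for StatsError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            StatsError::InvalidInput(msg) => write!(f, "invalid input: {}", msg),
        }
    }
}

impl std::error::Error for StatsError {}

/// Moving average that reports empty input and out-of-range windows as errors.
fn moving_average(data: &[f64], window: usize) -> Result<Vec<f64>, StatsError> {
    if data.is_empty() {
        return Err(StatsError::InvalidInput("empty series".to_string()));
    }
    if window == 0 || window > data.len() {
        return Err(StatsError::InvalidInput(format!(
            "window must be in 1..={}, got {}",
            data.len(),
            window
        )));
    }
    Ok(data.windows(window).map(|w| w.iter().sum::<f64>() / window as f64).collect())
}

/// Sample standard deviation, which needs at least two points.
fn sample_stddev(data: &[f64]) -> Result<f64, StatsError> {
    if data.len() < 2 {
        return Err(StatsError::InvalidInput(format!(
            "need at least 2 points, got {}",
            data.len()
        )));
    }
    let n = data.len() as f64;
    let mean = data.iter().sum::<f64>() / n;
    let sum_sq: f64 = data.iter().map(|x| (x - mean).powi(2)).sum();
    Ok((sum_sq / (n - 1.0)).sqrt())
}
```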

Expected Test Output

When running cargo test -- --nocapture, you should see output similar to:

running 14 tests
test tests::test_moving_average ... ok
test tests::test_max ... ok
test tests::test_min ... ok
test tests::test_stddev ... ok
test tests::test_percentile ... ok
test timeseries::tests::test_detect_outliers ... ok
test timeseries::tests::test_forecast ... ok
test timeseries::tests::test_abs ... ok
test timeseries::tests::test_log2 ... ok
test timeseries::tests::test_rate ... ok
test timeseries::tests::test_ewma ... ok
test timeseries::tests::test_timeshift ... ok
test timeseries::tests::test_align_timestamp ... ok
test timeseries::tests::test_invalid_inputs ... ok

test result: ok. 14 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Test Coverage

To check test coverage:

cargo tarpaulin --out Html
# Writes tarpaulin-report.html to the current directory; open it in your browser

Expected coverage metrics:

  • Basic Statistics: >95% coverage
  • Time Series Operations: >90% coverage
  • Error Handling: >85% coverage

Benchmarking

# Run all benchmarks
cargo bench

# Run specific benchmark
cargo bench --bench timeseries_benchmarks

Python Integration

Prerequisites

  1. Python Requirements:

    • Python 3.6 or later
    • NumPy library
    python3 --version  # Should be 3.6 or higher
    pip3 install numpy
  2. Build the Rust Library:

    # From the stats_lib directory
    cargo build --release

Directory Structure

stats_lib/
├── src/                    # Rust source code
├── target/
│   └── release/           # Contains compiled library
│       └── libstats_lib.dylib  # macOS
│       # or libstats_lib.so    # Linux
│       # or stats_lib.dll      # Windows
└── examples/
    └── python/
        ├── stats_lib.py       # Python wrapper
        ├── test_stats_lib.py  # Test suite
        └── example.py         # Usage example

Running the Python Example

  1. Set up the library path:

    # For macOS:
    export DYLD_LIBRARY_PATH="$(pwd)/target/release:$DYLD_LIBRARY_PATH"
    
    # For Linux:
    export LD_LIBRARY_PATH="$(pwd)/target/release:$LD_LIBRARY_PATH"
    
    # For Windows, add the directory to PATH
    # set PATH=%PATH%;%CD%\target\release
  2. Run the example program:

    cd examples/python
    python3 example.py

    Expected output:

    Statistical Calculations Library Example
    ======================================
    
    Basic Statistics:
    ----------------
    Moving average (window=3): [2. 3. 4. 5. 6. 7. 8. 9.]
    Maximum value: 10.0
    Minimum value: 1.0
    Standard deviation: 3.0276503540974917
    Median (50th percentile): 6.0
    
    Time Series Analysis:
    -------------------
    Time Series Data:
      t=0: 0.000
      t=1: 0.588
      t=2: 0.951
      t=3: 0.951
      t=4: 0.588
      t=5: 0.000
      t=6: -0.588
      t=7: -0.951
      t=8: -0.951
      t=9: -0.588
    

Running Python Tests

  1. Run all tests with detailed output:

    cd examples/python
    python3 -m unittest test_stats_lib.py -v
  2. Run specific test classes:

    # Run only basic stats tests
    python3 -m unittest test_stats_lib.TestBasicStats -v
    
    # Run only time series tests
    python3 -m unittest test_stats_lib.TestTimeSeries -v
  3. Run individual test methods:

    # Run specific test method
    python3 -m unittest test_stats_lib.TestBasicStats.test_stddev -v

Expected test output:

test_max (test_stats_lib.TestBasicStats)
Test maximum value calculation. ... ok

test_min (test_stats_lib.TestBasicStats)
Test minimum value calculation. ... ok

test_moving_average (test_stats_lib.TestBasicStats)
Test moving average calculation. ... ok

test_percentile (test_stats_lib.TestBasicStats)
Test percentile calculation. ... ok

test_stddev (test_stats_lib.TestBasicStats)
Test standard deviation calculation. ... ok

test_creation (test_stats_lib.TestTimeSeries)
Test time series creation. ... ok

test_properties (test_stats_lib.TestTimeSeries)
Test time series properties. ... ok

----------------------------------------------------------------------
Ran 7 tests in 0.004s

OK

Test Coverage

The test suite covers:

  1. Basic Statistical Functions:

    • Moving Average

      • Regular calculation with window size 3
      • Input validation for window sizes
    • Maximum/Minimum Values

      • Basic number sequences
      • Negative numbers
      • NumPy array support
    • Standard Deviation

      • Simple sequences with known stddev
      • Complex sequences
      • Input validation (minimum 2 points)
    • Percentile Calculation

      • Median (50th percentile)
      • Min/Max (0th/100th percentiles)
      • Quartiles (25th/75th percentiles)
      • Input validation for percentile range
  2. Time Series Functionality:

    • Creation and Validation

      • Basic initialization
      • Length validation
      • Error handling for mismatched lengths
    • Property Access

      • Timestamp array access
      • Value array access
      • Type verification (NumPy arrays)
      • Shape consistency
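The percentile cases listed above (median, 0th/100th, quartiles, range validation) can be sketched with linear interpolation between closest ranks, which is NumPy's default method. This is an assumption about the method: the Python example reports 6.0 as the median of 1..10, whereas linear interpolation gives 5.5, so the library appears to use a different rounding rule. The function name is hypothetical.

```rust
/// Percentile `p` (0 to 100) by linear interpolation between closest ranks.
/// Returns None for empty input or `p` outside [0, 100] (including NaN).
/// Panics if the data contains NaN, since such values cannot be ordered.
fn percentile(data: &[f64], p: f64) -> Option<f64> {
    if data.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = data.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("NaN in input"));
    // Fractional rank on the 0..=(n - 1) index scale.
    let rank = p / 100.0 * (sorted.len() - 1) as f64;
    let lo = rank.floor() as usize;
    let hi = rank.ceil() as usize;
    let frac = rank - lo as f64;
    Some(sorted[lo] + (sorted[hi] - sorted[lo]) * frac)
}
```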

Troubleshooting

  1. Library Not Found:

    # Verify library exists
    ls -l target/release/libstats_lib*
    
    # Check library dependencies
    # macOS:
    otool -L target/release/libstats_lib.dylib
    # Linux:
    ldd target/release/libstats_lib.so
  2. Python Import Errors:

    • Ensure NumPy is installed: pip3 list | grep numpy
    • Verify Python version: python3 --version
    • Check library path is set correctly
  3. Test Failures:

    • Run tests with increased verbosity: python3 -m unittest -v test_stats_lib.py
    • Check library is built in release mode
    • Verify library path environment variables
  4. Common Issues:

    • "Library not found" - Check library path and build status
    • "ImportError" - Verify NumPy installation and Python version
    • "TypeError" - Ensure correct data types in function calls
    • "ValueError" - Check input validation requirements


Go Examples

Prerequisites

  • Go 1.16 or later

Directory Structure

stats_lib/examples/go/
├── cmd/
│   └── example/
│       ├── main.go      # Example usage program with full API implementation
│       └── main_test.go # Comprehensive test suite
└── pkg/
    └── stats/
        ├── stats.go      # Go wrapper for Rust library (incomplete)
        └── stats_test.go # Test suite

Building and Running

The Go example reimplements the core API functions of the Rust library in pure Go (in cmd/example/main.go), so it runs without requiring FFI bindings.

To run the example program:

cd stats_lib/examples/go/cmd/example
go run main.go

Expected output:

Basic Statistical Calculations:
Data: [1 2 3 4 5 6 7 8 9 10]
Moving Average (window=3): [2 3 4 5 6 7 8 9]
Maximum value: 10.00
Minimum value: 1.00
Standard deviation: 3.03

Time Series Analysis:
Original Time Series:
  t=1: 1.000
  t=2: 2.000
  t=3: 3.000
  t=4: 4.000
  t=5: 5.000
  ...

Outlier Detection:
  Found 1 outliers
  t=3: 10.000

Absolute Values:
  t=1: 1.000
  t=2: 2.000
  t=3: 3.000
  t=4: 4.000
  t=5: 5.000

Log2 Values:
  t=1: 0.000
  t=2: 1.000
  t=3: 1.585
  t=4: 2.000
  t=5: 2.322
  ...

Rate Values:
  t=2: 1.000
  t=3: 1.000
  t=4: 1.000
  t=5: 1.000
  ...

EWMA Values (alpha=0.3):
  t=1: 1.000
  t=2: 1.300
  t=3: 1.810
  t=4: 2.467
  t=5: 3.227
  ...

Timeshifted Values (offset=3600):
  t=3601: 1.000
  t=3602: 2.000
  t=3603: 3.000
  t=3604: 4.000
  t=3605: 5.000
  ...

Aligned Timestamps (interval=2):
  t=0: 1.000
  t=2: 2.500
  t=4: 4.500
  t=6: 6.500
  t=8: 8.500
  ...

Forecast Values (horizon=12):
  t=100: 0.975
  t=101: 1.070
  t=102: 1.096
  t=103: 1.050
  t=104: 0.935
  ...
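The aligned output above is consistent with snapping each timestamp down to a multiple of the interval and averaging the values in each bucket. The sketch below implements that rule; the averaging choice, the point type, and the function name are assumptions rather than the documented behavior of `main.go`.

```rust
use std::collections::BTreeMap;

/// Hypothetical point type with an integer timestamp.
#[derive(Debug, Clone, Copy, PartialEq)]
struct TimeSeriesPoint {
    timestamp: i64,
    value: f64,
}

/// Snaps each timestamp down to a multiple of `interval` and averages values per bucket.
/// Returns None for a non-positive interval. Output is ordered by bucket timestamp.
fn align_timestamps(series: &[TimeSeriesPoint], interval: i64) -> Option<Vec<TimeSeriesPoint>> {
    if interval <= 0 {
        return None;
    }
    // bucket start -> (sum of values, number of points)
    let mut buckets: BTreeMap<i64, (f64, u32)> = BTreeMap::new();
    for p in series {
        // div_euclid floors toward negative infinity, so negative timestamps bucket correctly.
        let bucket = p.timestamp.div_euclid(interval) * interval;
        let entry = buckets.entry(bucket).or_insert((0.0, 0));
        entry.0 += p.value;
        entry.1 += 1;
    }
    Some(
        buckets
            .into_iter()
            .map(|(t, (sum, n))| TimeSeriesPoint { timestamp: t, value: sum / n as f64 })
            .collect(),
    )
}
```

For timestamps 1 to 10 with values equal to the timestamp and interval = 2, it returns t=0: 1.0, t=2: 2.5, t=4: 4.5, t=6: 6.5, t=8: 8.5, and t=10: 10.0, which agrees with the printed values.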

Running Tests

The Go example includes comprehensive tests for all implemented functions. To run the tests:

cd stats_lib/examples/go/cmd/example
go test -v

Expected test output:

=== RUN   TestNewTimeSeries
=== RUN   TestNewTimeSeries/valid_series
=== RUN   TestNewTimeSeries/mismatched_lengths
--- PASS: TestNewTimeSeries (0.00s)
=== RUN   TestMovingAverage
=== RUN   TestMovingAverage/valid_window
=== RUN   TestMovingAverage/window_too_large
=== RUN   TestMovingAverage/window_zero
--- PASS: TestMovingAverage (0.00s)
=== RUN   TestMax
=== RUN   TestMax/valid_data
=== RUN   TestMax/negative_values
=== RUN   TestMax/empty_data
--- PASS: TestMax (0.00s)
...
PASS
ok      stats_lib/examples/go/cmd/example       0.336s

To run tests with coverage:

cd stats_lib/examples/go/cmd/example
go test -cover

Implemented Functions

The Go implementation covers the following functions from the Rust library's API:

  1. Basic Statistics:

    • Moving Average
    • Maximum/Minimum Values
    • Standard Deviation
  2. Time Series Analysis:

    • Outlier Detection
    • Forecasting
    • Absolute Values
    • Log2 Transformation
    • Rate Calculation
    • Exponentially Weighted Moving Average (EWMA)
    • Timeshift
    • Timestamp Alignment

Note on FFI Implementation

The Rust FFI layer for the Go bindings (src/ffi.rs and pkg/stats/stats.go) is currently incomplete. The pure Go implementation provides the functions listed above without it, which also makes it easier to use and extend.

If you want to use the Rust FFI bindings in the future:

  1. Complete the implementation of the functions in src/ffi.rs
  2. Update the Go code in pkg/stats/stats.go to use the implemented functions
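As a hypothetical shape for one of the functions in `src/ffi.rs`, the sketch below reports errors through a status code and writes the result through an out-pointer, so the Go side can map non-zero codes to `error` values without any panic unwinding across the FFI boundary. The function name and status codes are assumptions, and a real export would need `#[no_mangle]` (`#[unsafe(no_mangle)]` in the Rust 2024 edition).

```rust
use std::os::raw::c_int;
use std::slice;

/// Status codes returned across the FFI boundary (hypothetical convention).
const STATS_OK: c_int = 0;
const STATS_ERR_NULL: c_int = -1;
const STATS_ERR_EMPTY: c_int = -2;

/// Writes the maximum of `data[..len]` to `*out` and returns a status code.
pub extern "C" fn stats_max(data: *const f64, len: usize, out: *mut f64) -> c_int {
    // Check emptiness first: an empty Go slice may arrive as a null pointer.
    if len == 0 {
        return STATS_ERR_EMPTY;
    }
    if data.is_null() || out.is_null() {
        return STATS_ERR_NULL;
    }
    // SAFETY: the caller guarantees `data` points to `len` initialised f64 values.
    let values = unsafe { slice::from_raw_parts(data, len) };
    let max = values.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    // SAFETY: `out` was null-checked above and must be valid for writes.
    unsafe { *out = max };
    STATS_OK
}
```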

Troubleshooting

  1. If you encounter any issues with the Go example:
    • Make sure you're using Go 1.16 or later
    • Check that you're running the commands from the correct directory
    • Verify that the math package is available


Rust Example

To run the Rust example:

cd examples/rust_example
cargo run

To run the tests:

cargo test

Java Integration

Prerequisites

  1. Java Requirements:
    • Java 11 or later
    java --version  # Should be 11 or higher

Directory Structure

stats_lib/
├── src/                    # Rust source code
├── target/
│   └── release/           # Contains compiled library
│       └── libstats_lib.dylib  # macOS
│       # or libstats_lib.so    # Linux
│       # or stats_lib.dll      # Windows
└── examples/
    └── java/
        ├── src/
        │   ├── main/java/com/statslib/
        │   │   ├── Example.java         # Usage example
        │   │   └── MockStatsLib.java    # Java implementation
        │   └── test/java/com/statslib/
        │       ├── StatsLibTest.java    # Test suite
        │       └── TestRunner.java      # Test runner
        ├── build.sh                     # Build script
        └── lib/                         # Dependencies

Running the Java Example

  1. Navigate to the Java example directory:

    cd examples/java
  2. Run the build script:

    ./build.sh

    Expected output:

    Statistical Calculations Library Example
    ======================================
    
    Basic Statistics:
    ----------------
    Moving average (window=3): [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
    Maximum value: 10.00
    Minimum value: 1.00
    Standard deviation: 3.03
    
    Time Series Analysis:
    -------------------
    Original Time Series:
      t=0: 1.00
      t=1: 2.00
      t=2: 3.00
      t=3: 10.00
      t=4: 4.00
      t=5: 5.00
      t=6: 6.00
      t=7: 7.00
      t=8: 8.00
      t=9: 9.00
    
    Outliers (threshold=2.0):
      t=3: 10.00
    

Running Java Tests

The build script also runs the tests automatically. You can run them separately with:

cd examples/java
javac -d target/classes src/main/java/com/statslib/*.java
javac -d target/test-classes -cp "target/classes:lib/*" src/test/java/com/statslib/*.java
java -cp "target/classes:target/test-classes:lib/*" com.statslib.TestRunner

Expected test output:

Running tests...

Test: Moving Average

Test: Moving Average Invalid Window

Test: Max

Test: Max Negative

Test: Max Empty

Test: Min

Test: Min Negative

Test: StdDev

Test: StdDev Insufficient Data

Test: Detect Outliers

All tests passed!

Java Implementation Details

The Java example includes a pure Java implementation of the statistical functions:

  1. Basic Statistical Functions:

    • Moving Average

      • Regular calculation with window size 3
      • Input validation for window sizes
    • Maximum/Minimum Values

      • Basic number sequences
      • Negative numbers
      • Empty array validation
    • Standard Deviation

      • Simple sequences with known stddev
      • Complex sequences
      • Input validation (minimum 2 points)
  2. Time Series Functionality:

    • Outlier Detection
      • Z-score based outlier detection
      • Threshold configuration
      • Timestamp and value pairing
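Z-score outlier detection flags a point when |x - mean| / stddev exceeds the threshold. The sketch below assumes the population standard deviation and a strict `>` comparison; the type and function names are hypothetical, and the Java implementation may differ in both choices.

```rust
/// Hypothetical point type pairing a timestamp with a value.
#[derive(Debug, Clone, Copy, PartialEq)]
struct TimeSeriesPoint {
    timestamp: i64,
    value: f64,
}

/// Returns the points whose z-score magnitude exceeds `threshold`.
/// Uses the population standard deviation (an assumption).
fn detect_outliers(series: &[TimeSeriesPoint], threshold: f64) -> Vec<TimeSeriesPoint> {
    let n = series.len();
    if n < 2 {
        return Vec::new();
    }
    let mean = series.iter().map(|p| p.value).sum::<f64>() / n as f64;
    let var = series.iter().map(|p| (p.value - mean).powi(2)).sum::<f64>() / n as f64;
    let sd = var.sqrt();
    if sd == 0.0 {
        return Vec::new(); // constant series: no point deviates from the mean
    }
    series
        .iter()
        .filter(|p| ((p.value - mean) / sd).abs() > threshold)
        .copied()
        .collect()
}
```

On the six-point series implied by the Rust example's output ([1, 2, 3, 10, 4, 5] at timestamps 0 to 5) with threshold 2.0, it flags only t=3, with a z-score of about 2.004. The margin is thin: with the sample standard deviation the same point scores about 1.83 and would not be flagged.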

Troubleshooting

  1. Java Version Issues:

    • Ensure you're using Java 11 or later: java --version
    • If using an older version, update Java or modify the code to be compatible
  2. Test Failures:

    • Check that the test data matches the expected values
    • Verify the implementation of the statistical functions
  3. Common Issues:

    • "ClassNotFoundException" - Check your classpath and directory structure
    • "NoClassDefFoundError" - Ensure all dependencies are downloaded correctly


About

testing rust lib with different bindings
