A Ruby gem that extends Enumerable with statistical methods, making it easy to calculate descriptive statistics and detect outliers in your data collections.
Add this line to your application's Gemfile:
gem 'enumerable-stats'
And then execute:
bundle install
Or install it yourself as:
gem install enumerable-stats
Simply require the gem and all Enumerable objects (Arrays, Ranges, Sets, etc.) will have the statistical methods available.
require 'enumerable-stats'
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
puts data.mean # => 5.5
puts data.median # => 5.5
puts data.variance # => 9.17
Calculates the arithmetic mean (average) of the collection.
[1, 2, 3, 4, 5].mean # => 3.0
[10, 20, 30].mean # => 20.0
[-1, 0, 1].mean # => 0.0
Calculates the median (middle value) of the collection.
[1, 2, 3, 4, 5].median # => 3 (odd number of elements)
[1, 2, 3, 4].median # => 2.5 (even number of elements)
[5, 1, 3, 2, 4].median # => 3 (automatically sorts)
[].median # => nil (empty collection)
Calculates the specified percentile of the collection using linear interpolation. This is equivalent to the "linear" method used by many statistical software packages (R-7/Excel method).
# Basic percentile calculations
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
data.percentile(0) # => 1 (minimum value)
data.percentile(25) # => 3.25 (first quartile)
data.percentile(50) # => 5.5 (median)
data.percentile(75) # => 7.75 (third quartile)
data.percentile(100) # => 10 (maximum value)
# Performance monitoring percentiles
response_times = [45, 52, 48, 51, 49, 47, 53, 46, 50, 54]
p50 = response_times.percentile(50) # => 49.5ms (median response time)
p95 = response_times.percentile(95) # => 53.55ms (95th percentile - outlier threshold)
p99 = response_times.percentile(99) # => 53.91ms (99th percentile - extreme outliers)
puts "50% of requests complete within #{p50}ms"
puts "95% of requests complete within #{p95}ms"
puts "99% of requests complete within #{p99}ms"
# Works with any numeric data
scores = [78, 85, 92, 88, 76, 94, 82, 89, 91, 87]
puts "Top 10% threshold: #{scores.percentile(90)}" # => 92.4
puts "Bottom 25% cutoff: #{scores.percentile(25)}" # => 80.5
Calculates the sample variance of the collection.
[1, 2, 3, 4, 5].variance # => 2.5
[5, 5, 5, 5].variance # => 0.0 (no variation)
Calculates the sample standard deviation (square root of variance).
[1, 2, 3, 4, 5].standard_deviation # => 1.58
[5, 5, 5, 5].standard_deviation # => 0.0
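For reference, the n-1 (sample) denominator both methods use can be sketched in plain Ruby (sample_variance is a hypothetical helper, not part of the gem):
# Sample variance with the n-1 denominator (illustrative sketch)
def sample_variance(values)
  m = values.sum.to_f / values.size
  values.sum { |x| (x - m)**2 } / (values.size - 1)
end
sample_variance([1, 2, 3, 4, 5])            # => 2.5
Math.sqrt(sample_variance([1, 2, 3, 4, 5])) # => 1.5811...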
Calculates the t-statistic for comparing the means of two samples using Welch's t-test formula, which doesn't assume equal variances. Used in hypothesis testing to determine if two groups have significantly different means.
# A/B test: comparing conversion rates
control_group = [0.12, 0.11, 0.13, 0.12, 0.14, 0.11, 0.12] # mean ≈ 0.121
test_group = [0.15, 0.16, 0.14, 0.17, 0.15, 0.18, 0.16] # mean ≈ 0.159
t_stat = control_group.t_value(test_group)
puts t_stat # => -5.72 (negative means test_group > control_group)
# Performance comparison: API response times
baseline = [100, 120, 110, 105, 115, 108, 112] # Slower responses
optimized = [85, 95, 90, 88, 92, 87, 89] # Faster responses
t_stat = baseline.t_value(optimized)
puts t_stat # => 7.41 (positive means baseline > optimized, which is bad for response times)
# The larger the absolute t-value, the more significant the difference
puts "Significant difference!" if t_stat.abs > 2.0 # Rule of thumb threshold
Calculates the degrees of freedom for statistical testing using Welch's formula. This accounts for different sample sizes and variances between groups and is used alongside the t-statistic for hypothesis testing.
# Calculate degrees of freedom for the same samples
control = [0.12, 0.11, 0.13, 0.12, 0.14, 0.11, 0.12]
test = [0.15, 0.16, 0.14, 0.17, 0.15, 0.18, 0.16]
df = control.degrees_of_freedom(test)
puts df # => ~11.4 (used to look up critical t-values in statistical tables)
# With equal variances, approaches n1 + n2 - 2
equal_var_a = [10, 11, 12, 13, 14] # variance = 2.5
equal_var_b = [15, 16, 17, 18, 19] # variance = 2.5
df_equal = equal_var_a.degrees_of_freedom(equal_var_b)
puts df_equal # => ~8.0 (close to 5 + 5 - 2 = 8)
# With unequal variances, will be less than pooled degrees of freedom
unequal_a = [10, 10, 10, 10, 10] # very low variance
unequal_b = [5, 15, 8, 20, 12, 25, 18] # high variance
df_unequal = unequal_a.degrees_of_freedom(unequal_b)
puts df_unequal # => ~6.0 (much less than 5 + 7 - 2 = 10)
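The Welch-Satterthwaite formula behind degrees_of_freedom can be sketched the same way (welch_df is a hypothetical helper):
# df = (s1^2/n1 + s2^2/n2)^2 / [(s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1)] (sketch)
def welch_df(a, b)
  va = a.variance / a.count
  vb = b.variance / b.count
  (va + vb)**2 / (va**2 / (a.count - 1) + vb**2 / (b.count - 1))
end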
Tests if this collection's mean is significantly greater than another collection's mean using a one-tailed Welch's t-test. Returns true if the difference is statistically significant at the specified alpha level.
# A/B testing: is the new feature performing better?
control_conversion = [0.118, 0.124, 0.116, 0.121, 0.119, 0.122, 0.117] # ~12.0% avg
variant_conversion = [0.135, 0.142, 0.138, 0.144, 0.140, 0.136, 0.139] # ~13.9% avg
# Is the variant significantly better than control?
puts variant_conversion.greater_than?(control_conversion) # => true (significant improvement)
puts control_conversion.greater_than?(variant_conversion) # => false
# Performance testing: is new optimization significantly faster?
old_response_times = [145, 152, 148, 159, 143, 156, 147, 151, 149, 154] # ~150ms avg
new_response_times = [125, 128, 122, 131, 124, 129, 126, 130, 123, 127] # ~126ms avg
# Are old times significantly greater (worse) than new times?
puts old_response_times.greater_than?(new_response_times) # => true (significant improvement)
# Custom significance level for more conservative testing
puts variant_conversion.greater_than?(control_conversion, alpha: 0.01) # 99% confidence
puts variant_conversion.greater_than?(control_conversion, alpha: 0.10) # 90% confidence
# Check with similar groups (should be false)
similar_a = [10, 11, 12, 13, 14]
similar_b = [10.5, 11.5, 12.5, 13.5, 14.5]
puts similar_b.greater_than?(similar_a) # => false (difference not significant)
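Conceptually, the decision combines the two primitives above; roughly (illustrative only, the exact critical-value lookup happens inside the gem):
# Rough shape of the decision (sketch, not the gem's exact internals)
t  = variant_conversion.t_value(control_conversion)
df = variant_conversion.degrees_of_freedom(control_conversion)
# significant when t exceeds the one-tailed critical t for (alpha, df)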
Tests if this collection's mean is significantly less than another collection's mean using a one-tailed Welch's t-test. Returns true if the difference is statistically significant at the specified alpha level.
# Response time improvement: are new times significantly lower?
baseline_times = [150, 165, 155, 170, 160, 145, 175, 152, 158, 163] # ~159ms avg
optimized_times = [120, 125, 115, 130, 118, 122, 128, 124, 119, 126] # ~123ms avg
# Are optimized times significantly less (better) than baseline?
puts optimized_times.less_than?(baseline_times) # => true (significant improvement)
puts baseline_times.less_than?(optimized_times) # => false
# Error rate reduction: is new implementation significantly better?
old_error_rates = [0.025, 0.028, 0.024, 0.030, 0.026, 0.027, 0.029] # ~2.7% avg
new_error_rates = [0.012, 0.015, 0.013, 0.016, 0.014, 0.011, 0.013] # ~1.3% avg
puts new_error_rates.less_than?(old_error_rates) # => true (significantly fewer errors)
# Memory usage optimization
before_optimization = [245, 250, 242, 255, 248, 253, 247] # ~248MB avg
after_optimization = [198, 205, 195, 210, 200, 202, 197] # ~201MB avg
puts after_optimization.less_than?(before_optimization) # => true (significant reduction)
# Custom alpha levels
puts optimized_times.less_than?(baseline_times, alpha: 0.01) # More stringent test
puts optimized_times.less_than?(baseline_times, alpha: 0.10) # More lenient test
The gem provides convenient operator shortcuts for statistical comparisons: > and < are shorthand operators for greater_than? and less_than?, respectively.
# Performance comparison using operators
baseline_times = [150, 165, 155, 170, 160, 145, 175]
optimized_times = [120, 125, 115, 130, 118, 122, 128]
# These are equivalent:
puts baseline_times.greater_than?(optimized_times) # => true
puts baseline_times > optimized_times # => true
puts optimized_times.less_than?(baseline_times) # => true
puts optimized_times < baseline_times # => true
# With custom alpha levels (use explicit method syntax for parameters)
puts baseline_times.>(optimized_times, alpha: 0.01) # More stringent
puts optimized_times.<(baseline_times, alpha: 0.10) # More lenient
The spaceship operator provides three-way statistical comparison, returning:
- 1 if this collection's mean is significantly greater than the other's
- -1 if this collection's mean is significantly less than the other's
- 0 if there's no significant statistical difference
This is particularly useful for sorting collections by statistical significance or implementing custom comparison logic.
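One way to picture the operator is as a thin wrapper over the two one-tailed tests (a sketch, not necessarily the gem's internals; statistical_compare is a hypothetical name):
# Hypothetical equivalent of a <=> b with a custom alpha (sketch)
def statistical_compare(a, b, alpha: 0.05)
  return 1 if a.greater_than?(b, alpha: alpha)
  return -1 if a.less_than?(b, alpha: alpha)
  0
end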
# Three-way statistical comparison
high_performance = [200, 210, 205, 215, 220] # mean = 210
medium_performance = [150, 160, 155, 165, 170] # mean = 160
low_performance = [50, 60, 55, 65, 70] # mean = 60
puts high_performance <=> medium_performance # => 1 (significantly greater)
puts medium_performance <=> high_performance # => -1 (significantly less)
puts high_performance <=> high_performance # => 0 (no significant difference)
# Sorting datasets by statistical significance
datasets = [
[10, 15, 12, 18, 11], # mean = 13.2
[30, 35, 32, 38, 31], # mean = 33.2
[5, 8, 6, 9, 7], # mean = 7.0
[20, 25, 22, 28, 21] # mean = 23.2
]
# Sort datasets ascending; statistically indistinguishable pairs compare as equal
sorted_datasets = datasets.sort { |a, b| a <=> b }
puts sorted_datasets.map(&:mean) # => [7.0, 13.2, 23.2, 33.2] (ascending by mean)
# A/B testing - sort variants by conversion performance
control = [0.12, 0.11, 0.13, 0.12, 0.10] # 11.6% conversion
variant_a = [0.14, 0.13, 0.15, 0.14, 0.12] # 13.6% conversion
variant_b = [0.16, 0.15, 0.17, 0.16, 0.14] # 15.6% conversion
variants = [control, variant_a, variant_b]
best_to_worst = variants.sort { |a, b| b <=> a } # Descending order
puts "Performance ranking:"
best_to_worst.each_with_index do |variant, index|
puts "#{index + 1}. Mean conversion: #{variant.mean.round(3)}"
end
# Custom alpha levels (default is 0.05)
borderline_a = [100, 102, 104, 106, 108] # mean = 104
borderline_b = [95, 97, 99, 101, 103] # mean = 99
# Standard significance test (95% confidence)
result_standard = borderline_a <=> borderline_b
puts "Standard test (α=0.05): #{result_standard}"
# More lenient test (90% confidence)
# Note: Use method call syntax for custom parameters
result_lenient = borderline_a.public_send(:<=>, borderline_b, alpha: 0.10)
puts "Lenient test (α=0.10): #{result_lenient}"
Calculates the absolute percentage difference between this collection's mean and another value or collection's mean using the symmetric percentage difference formula.
# Comparing two datasets
control_group = [85, 90, 88, 92, 85] # mean = 88
test_group = [95, 98, 94, 96, 97] # mean = 96
diff = control_group.percentage_difference(test_group)
puts diff # => 8.7% (always positive)
# Comparing collection to single value
response_times = [120, 135, 125, 130, 140] # mean = 130
target = 120
diff = response_times.percentage_difference(target)
puts diff # => 8.0%
# Same result regardless of order
puts control_group.percentage_difference(test_group) # => 8.7%
puts test_group.percentage_difference(control_group) # => 8.7%
Calculates the signed percentage difference, preserving the direction of change. Positive values indicate this collection's mean is higher than the comparison; negative values indicate it's lower.
# Performance monitoring - lower is better
baseline = [100, 110, 105, 115, 95] # mean = 105ms
optimized = [85, 95, 90, 100, 80] # mean = 90ms
improvement = optimized.signed_percentage_difference(baseline)
puts improvement # => -15.38% (negative = improvement for response times)
regression = baseline.signed_percentage_difference(optimized)
puts regression # => 15.38% (positive = regression)
# A/B testing - higher is better
control_conversions = [0.12, 0.11, 0.13, 0.12] # mean = 0.12 (12%)
variant_conversions = [0.14, 0.13, 0.15, 0.14] # mean = 0.14 (14%)
lift = variant_conversions.signed_percentage_difference(control_conversions)
puts lift # => 15.38% (positive = improvement for conversion rates)
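Both methods use the symmetric formula, which divides by the average of the two magnitudes rather than by either value alone. A minimal sketch (signed_pct_diff is a hypothetical helper):
# signed = (a - b) / ((|a| + |b|) / 2) * 100; absolute = signed.abs (sketch)
def signed_pct_diff(a_mean, b_mean)
  (a_mean - b_mean) / ((a_mean.abs + b_mean.abs) / 2.0) * 100
end
signed_pct_diff(90.0, 105.0) # => -15.38... (matches the example above)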
Removes outliers using the IQR (Interquartile Range) method. This is particularly effective for performance data which often has extreme values due to network issues, CPU scheduling, GC pauses, etc.
# Basic usage
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100] # 100 is an outlier
clean_data = data.remove_outliers
# => [1, 2, 3, 4, 5, 6, 7, 8, 9] (outlier removed)
# Custom multiplier (more conservative = fewer outliers removed)
data.remove_outliers(multiplier: 2.0) # Less aggressive
data.remove_outliers(multiplier: 1.0) # More aggressive
# Performance data example
response_times = [45, 52, 48, 51, 49, 47, 53, 46, 2000, 48] # 2000ms is an outlier
clean_times = response_times.remove_outliers
# => [45, 52, 48, 51, 49, 47, 53, 46, 48]
Note: Collections with fewer than 4 elements are returned unchanged since quartile calculation requires at least 4 data points.
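For intuition, the IQR fences can be reproduced with the gem's own percentile method (a sketch, assuming quartiles are computed via percentile(25) and percentile(75)):
# Keep x where Q1 - multiplier*IQR <= x <= Q3 + multiplier*IQR (sketch)
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
q1  = data.percentile(25) # => 3.25
q3  = data.percentile(75) # => 7.75
iqr = q3 - q1             # => 4.5
data.select { |x| x.between?(q1 - 1.5 * iqr, q3 + 1.5 * iqr) }
# => [1, 2, 3, 4, 5, 6, 7, 8, 9]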
Returns detailed statistics about outlier removal for debugging and logging purposes.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
stats = data.outlier_stats
puts stats
# => {
# original_count: 10,
# filtered_count: 9,
# outliers_removed: 1,
# outlier_percentage: 10.0
# }
The gem works with any Enumerable object:
# Arrays
[1, 2, 3, 4, 5].mean # => 3.0
# Ranges
(1..10).median # => 5.5
# Sets
require 'set'
Set.new([1, 2, 3, 3, 4]).variance # => 1.67 (duplicates ignored)
# Custom Enumerable objects
class DataSet
include Enumerable
def initialize(data)
@data = data
end
def each(&block)
@data.each(&block)
end
end
dataset = DataSet.new([10, 20, 30, 40, 50])
dataset.standard_deviation # => 15.81
# Analyzing API response times
response_times = [120, 145, 133, 128, 142, 136, 5000, 125, 139, 131]
puts "Original mean: #{response_times.mean.round(2)}ms"
# => "Original mean: 619.9ms" (skewed by the 5000ms outlier)
clean_times = response_times.remove_outliers
puts "Clean mean: #{clean_times.mean.round(2)}ms"
# => "Clean mean: 133.22ms" (more representative)
# Use percentiles for SLA monitoring (industry standard approach)
p50 = clean_times.percentile(50) # Median response time
p95 = clean_times.percentile(95) # 95% of requests complete within this time
p99 = clean_times.percentile(99) # 99% of requests complete within this time
puts "Response Time SLAs:"
puts " p50 (median): #{p50.round(1)}ms"
puts " p95: #{p95.round(1)}ms"
puts " p99: #{p99.round(1)}ms"
# Set alerting thresholds based on percentiles
sla_p95_threshold = 200 # ms
if p95 > sla_p95_threshold
puts "🚨 SLA BREACH: 95th percentile (#{p95.round(1)}ms) exceeds #{sla_p95_threshold}ms"
else
puts "✅ SLA OK: 95th percentile within acceptable limits"
end
# Get outlier statistics for monitoring
stats = response_times.outlier_stats
puts "Removed #{stats[:outliers_removed]} outliers (#{stats[:outlier_percentage]}%)"
# => "Removed 1 outliers (10.0%)"
# Analyzing sensor readings
temperatures = [22.1, 22.3, 22.0, 22.2, 89.5, 22.1, 22.4] # 89.5 is likely an error
puts "Raw data statistics:"
puts " Mean: #{temperatures.mean.round(2)}°C"
puts " Std Dev: #{temperatures.standard_deviation.round(2)}°C"
clean_temps = temperatures.remove_outliers
puts "\nCleaned data statistics:"
puts " Mean: #{clean_temps.mean.round(2)}°C"
puts " Std Dev: #{clean_temps.standard_deviation.round(2)}°C"
puts " Sample size: #{clean_temps.size}/#{temperatures.size}"
# Conversion rates for multiple variants
control = [0.12, 0.15, 0.11, 0.14, 0.13, 0.16, 0.12, 0.15] # 13.5% avg conversion
variant_a = [0.18, 0.19, 0.17, 0.20, 0.18, 0.21, 0.19, 0.18] # 18.75% avg conversion
variant_b = [0.16, 0.17, 0.15, 0.18, 0.16, 0.19, 0.17, 0.16] # 16.75% avg conversion
variant_c = [0.22, 0.24, 0.21, 0.25, 0.23, 0.26, 0.22, 0.24] # 23.4% avg conversion
variants = [
{ name: "Control", data: control },
{ name: "Variant A", data: variant_a },
{ name: "Variant B", data: variant_b },
{ name: "Variant C", data: variant_c }
]
# Display individual performance
variants.each do |variant|
mean_pct = (variant[:data].mean * 100).round(1)
std_pct = (variant[:data].standard_deviation * 100).round(1)
puts "#{variant[:name]}: #{mean_pct}% ± #{std_pct}%"
end
# Sort variants by statistical performance using spaceship operator
sorted_variants = variants.sort { |a, b| b[:data] <=> a[:data] } # Descending order
puts "\nPerformance Ranking (statistically significant):"
sorted_variants.each_with_index do |variant, index|
conversion_rate = (variant[:data].mean * 100).round(1)
puts "#{index + 1}. #{variant[:name]}: #{conversion_rate}%"
# Compare to control using statistical significance
if variant[:name] != "Control"
is_significantly_better = variant[:data] > control
puts " #{is_significantly_better ? '✅ Significantly better' : '❌ Not significantly different'} than control"
end
end
# Check for outliers that might skew results
puts "\nOutlier Analysis:"
variants.each do |variant|
outlier_count = variant[:data].outlier_stats[:outliers_removed]
puts "#{variant[:name]} outliers: #{outlier_count}"
end
# Before and after optimization comparison
before_optimization = [150, 165, 155, 170, 160, 145, 175] # API response times (ms)
after_optimization = [120, 125, 115, 130, 118, 122, 128]
puts "Before: #{before_optimization.mean.round(1)}ms ± #{before_optimization.standard_deviation.round(1)}ms"
puts "After: #{after_optimization.mean.round(1)}ms ± #{after_optimization.standard_deviation.round(1)}ms"
# Calculate improvement (negative is good for response times)
improvement = after_optimization.signed_percentage_difference(before_optimization)
puts "Performance improvement: #{improvement.round(1)}%" # => "Performance improvement: -23.2%"
# Or use absolute difference for reporting
abs_diff = after_optimization.percentage_difference(before_optimization)
puts "Total performance change: #{abs_diff.round(1)}%" # => "Total performance change: 23.2%"
# Comparing two datasets for meaningful differences
dataset_a = [45, 50, 48, 52, 49, 47, 51]
dataset_b = [48, 53, 50, 55, 52, 49, 54]
# Basic comparison
difference = dataset_b.signed_percentage_difference(dataset_a)
puts "Dataset B is #{difference.round(1)}% different from Dataset A"
# Check if difference is large enough to be meaningful
abs_difference = dataset_b.percentage_difference(dataset_a)
if abs_difference > 5.0 # 5% threshold
puts "Difference of #{abs_difference.round(1)}% may be statistically significant"
else
puts "Difference of #{abs_difference.round(1)}% is likely not significant"
end
# Consider variability
a_cv = (dataset_a.standard_deviation / dataset_a.mean) * 100 # Coefficient of variation
b_cv = (dataset_b.standard_deviation / dataset_b.mean) * 100
puts "Dataset A variability: #{a_cv.round(1)}%"
puts "Dataset B variability: #{b_cv.round(1)}%"
# Complete example: A/B testing with proper statistical analysis
# Testing whether a new checkout flow improves conversion rates
# Conversion rate data (percentages converted to decimals)
control_conversions = [0.118, 0.124, 0.116, 0.121, 0.119, 0.122, 0.117, 0.120, 0.115, 0.123]
variant_conversions = [0.135, 0.142, 0.138, 0.144, 0.140, 0.136, 0.139, 0.141, 0.137, 0.143]
puts "=== A/B Test Statistical Analysis ==="
puts "Control group (n=#{control_conversions.count}):"
puts " Mean: #{(control_conversions.mean * 100).round(2)}%"
puts " Std Dev: #{(control_conversions.standard_deviation * 100).round(3)}%"
puts "Variant group (n=#{variant_conversions.count}):"
puts " Mean: #{(variant_conversions.mean * 100).round(2)}%"
puts " Std Dev: #{(variant_conversions.standard_deviation * 100).round(3)}%"
# Calculate effect size
lift = variant_conversions.signed_percentage_difference(control_conversions)
puts "\nEffect size: #{lift.round(2)}% lift"
# Perform statistical test
t_statistic = control_conversions.t_value(variant_conversions)
degrees_freedom = control_conversions.degrees_of_freedom(variant_conversions)
puts "\nStatistical test results:"
puts " t-statistic: #{t_statistic.round(3)}"
puts " Degrees of freedom: #{degrees_freedom.round(1)}"
puts " |t| = #{t_statistic.abs.round(3)}"
# Interpret results (simplified - in real analysis, use proper p-value lookup)
if t_statistic.abs > 2.0 # Rough threshold for significance
significance = t_statistic.abs > 3.0 ? "highly significant" : "significant"
direction = t_statistic < 0 ? "Variant is better" : "Control is better"
puts " Result: #{significance} difference detected"
puts " Conclusion: #{direction}"
else
puts " Result: No significant difference detected"
puts " Conclusion: Insufficient evidence for a difference"
end
# Data quality checks
control_outliers = control_conversions.outlier_stats
variant_outliers = variant_conversions.outlier_stats
puts "\nData quality:"
puts " Control outliers: #{control_outliers[:outliers_removed]}/#{control_outliers[:original_count]}"
puts " Variant outliers: #{variant_outliers[:outliers_removed]}/#{variant_outliers[:original_count]}"
if control_outliers[:outliers_removed] > 0 || variant_outliers[:outliers_removed] > 0
puts " ⚠️ Consider investigating outliers before concluding"
end
# Monitor API performance changes after deployment
# Compare response times before and after optimization
before_deploy = [145, 152, 148, 159, 143, 156, 147, 151, 149, 154,
146, 158, 150, 153, 144, 157, 148, 152, 147, 155]
after_deploy = [132, 128, 135, 130, 133, 129, 131, 134, 127, 136,
133, 130, 128, 135, 132, 129, 134, 131, 130, 133]
puts "=== Performance Monitoring Analysis ==="
# Remove outliers for more accurate comparison
before_clean = before_deploy.remove_outliers
after_clean = after_deploy.remove_outliers
puts "Before deployment (cleaned): #{before_clean.mean.round(1)}ms ± #{before_clean.standard_deviation.round(1)}ms"
puts "After deployment (cleaned): #{after_clean.mean.round(1)}ms ± #{after_clean.standard_deviation.round(1)}ms"
# Calculate improvement
improvement_pct = after_clean.signed_percentage_difference(before_clean)
improvement_abs = before_clean.mean - after_clean.mean
puts "Improvement: #{improvement_pct.round(1)}% (#{improvement_abs.round(1)}ms faster)"
# Statistical significance test
t_stat = before_clean.t_value(after_clean)
df = before_clean.degrees_of_freedom(after_clean)
puts "Statistical test: t(#{df.round(1)}) = #{t_stat.round(3)}"
if t_stat.abs > 2.5 # Conservative threshold for production changes
puts "✅ Statistically significant improvement confirmed"
puts " Safe to keep the optimization"
else
puts "⚠️ Improvement not statistically significant"
puts " Consider longer observation period"
end
# Monitor for performance regression alerts
alert_threshold = 2.0 # t-statistic threshold for alerts
if t_stat < -alert_threshold # Negative means after > before (regression)
puts "🚨 PERFORMANCE REGRESSION DETECTED!"
puts " Immediate investigation recommended"
end
Method | Description | Returns | Notes |
---|---|---|---|
mean | Arithmetic mean | Float | Works with any numeric collection |
median | Middle value | Numeric or nil | Returns nil for empty collections |
percentile(percentile) | Value at specified percentile (0-100) | Numeric or nil | Uses linear interpolation, R-7/Excel method |
variance | Sample variance | Float | Uses n-1 denominator (sample variance) |
standard_deviation | Sample standard deviation | Float | Square root of variance |
t_value(other) | T-statistic for hypothesis testing | Float | Uses Welch's t-test, handles unequal variances |
degrees_of_freedom(other) | Degrees of freedom for t-test | Float | Uses Welch's formula, accounts for unequal variances |
greater_than?(other, alpha: 0.05) | Test if mean is significantly greater | Boolean | One-tailed t-test, customizable alpha level |
less_than?(other, alpha: 0.05) | Test if mean is significantly less | Boolean | One-tailed t-test, customizable alpha level |
>(other, alpha: 0.05) | Alias for greater_than? | Boolean | Shorthand operator for statistical comparison |
<(other, alpha: 0.05) | Alias for less_than? | Boolean | Shorthand operator for statistical comparison |
<=>(other, alpha: 0.05) | Three-way statistical comparison | Integer (-1, 0, 1) | Returns 1 if greater, -1 if less, 0 if no significant difference |
percentage_difference(other) | Absolute percentage difference | Float | Always positive, symmetric comparison |
signed_percentage_difference(other) | Signed percentage difference | Float | Preserves direction, useful for A/B tests |
remove_outliers(multiplier: 1.5) | Remove outliers using IQR method | Array | Returns new array, original unchanged |
outlier_stats(multiplier: 1.5) | Outlier removal statistics | Hash | Useful for monitoring and debugging |
- Ruby >= 3.1.0
- No external dependencies
After checking out the repo, run:
bundle install
bundle exec rspec # Run the tests
Bug reports and pull requests are welcome on GitHub at https://github.com/binarycleric/enumerable-stats.
To cut a release, run the following, which tags a new version and pushes it to GitHub:
bundle exec rake release
The gem is available as open source under the terms of the MIT License.