Releases: awslabs/deequ
2.0.12
What's Changed
- Added Implementation of DQDL Rules and Execution
  - Add implementation of DQDL rule execution by @happy-coral in #620
  - Add implementation of outcome mapping in DeequOutcomeTranslator by @happy-coral in #621
  - Add implementation for DQDL rules: CompletenessRule, IsCompleteRule, UniquenessRule, IsUniqueRule, ColumnCorrelationRule by @happy-coral in #622
  - Add implementation for DQDL rules: DistinctValuesCount, Entropy, Mean, StandardDeviation, Sum, UniqueValueRatio by @happy-coral in #624
  - Update README to describe DQDL support and add Java & Scala DQDL examples by @happy-coral in #634
  - Add support for DQDL IsPrimaryKey rule by @happy-coral in #635
  - Add support for DQDL ColumnLength rule by @eycho-am in #636
- Modify Histogram to be in descending frequency by @kyraman in #630
- Introduce HistogramBase for common histogram behavior by @kyraman in #631
- Modify maven publishing to use central portal by @eycho-am in #633
- Add support for DQDL CustomSql rule & Deequ CustomSql check by @happy-coral in #632
- fix(kll): Add SerDe Implementation for KLLSketch by @mdrakiburrahman in #628
- Updated version in pom.xml to 2.0.12-spark-3.5 by @eycho-am in #637
New Contributors
- @kyraman made their first contribution in #630
- @mdrakiburrahman made their first contribution in #628
Full Changelog: 2.0.11...2.0.12
2.0.11
What's Changed
- Add AnalyzerOptions to Analyzer serialize / deserialize logic by @kchaturvedi in #597
- Refine row count retrieval to skip redundant Size() scans by @lawofcycles in #605
- Updated version in pom.xml to 2.0.11-spark-3.5 by @eycho-am in #615
New Contributors
- @kchaturvedi made their first contribution in #597
- @lawofcycles made their first contribution in #605
Full Changelog: 2.0.10...2.0.11
2.0.10
New Features
- Are unique check by @eycho-am in #599
- add DQDL parser dependency by @happy-coral in #603
- scaffolding for checking data quality against DQDL rulesets by @happy-coral in #604
- Implement translation of rules and add converter for RowCount rule by @happy-coral in #606
Maintenance / Fixes
- feature/replace-rdd by @shriyavanvari in #586
- Add a test to verify that Deequ's isContainedIn constraint correctly handles string values containing single quotes by @D-Minor in #602
New Contributors
- @shriyavanvari made their first contribution in #586
- @D-Minor made their first contribution in #602
- @happy-coral made their first contribution in #603
Full Changelog: 2.0.9...2.0.10
2.0.9
2.0.8
New Features
- Configurable RetainCompletenessRule by @zeotuan in #564
- Optional specification of instance name in CustomSQL analyzer metric by @tylermcdaniel0 in #569
- Adding Wilson Score Confidence Interval Strategy by @zeotuan in #567
- CustomAggregator by @joshuazexter in #572
- Add commits from master branch to release/2.0.8-spark-3.5 by @eycho-am in #587
Maintenance / Fixes
- fix typo by @bojackli in #574
- Fix performance of building row-level results by @marcantony in #577
New Contributors
- @joshuazexter made their first contribution in #572
- @bojackli made their first contribution in #574
Full Changelog: 2.0.7...2.0.8
2.0.7
What's Changed
Upgrades
New Features
- New type of MetricsRepository by @VenkataKarthikP:
  - Using Spark tables as the data source in #518
- Row Level Result Treatment Options by @eycho-am:
- Anomaly Detection Changes by @zeotuan:
  - Add Daily Season with Hourly Interval to HoltWinter in #546
- New analyzers:
  - RatioOfSums by @scott-gunn in #552
  - Column Count Analyzer and Check by @mentekid in #555
Maintenance/Fixes
- Fix Breeze dependency conflict in Anomaly Detection Spark 3.4+ by @zeotuan in #545
- Data Sync / DatasetMatch changes by @VenkataKarthikP:
- Row level results fixes:
  - Add analyzerOption to add filteredRowOutcome for isPrimaryKey Check by @eycho-am in #537
  - Fix bug in MinLength and MaxLength when NullBehavior.EmptyString by @eycho-am in #538
  - [Min/Max] Apply filtered row behavior at the row level evaluation by @rdsharma26 in #543
  - [MinLength/MaxLength] Apply filtered row behavior at the row level evaluation by @rdsharma26 in #547
  - Fix for satisfies row level results bug by @rdsharma26 in #553
New Contributors
- @VenkataKarthikP made their first contribution in #518
- @scott-gunn made their first contribution in #552
Full Changelog: 2.0.6...2.0.7
2.0.6
What's Changed
- NEW: Exact Quantile Check
  - Creation of Exact Quantile Check by @jmilis2000 in #512
- Data Synchronization/Matching fixes
  - Delegate to Spark for checking existence of columns in the given dataframes by @rdsharma26 in #515
  - Verify that non key columns exist in each dataset by @rdsharma26 in #517
- Addition of tests
  - Test that exceptions within a check's constraints do not affect other… by @tylermcdaniel0 in #516
New Contributors
- @jmilis2000 made their first contribution in #512
- @tylermcdaniel0 made their first contribution in #516
Full Changelog: 2.0.5...2.0.6
2.0.5
What's Changed
- Spark 3.4 Update
- NEW: Custom SQL analyzer
- Analyzer Improvements
New Contributors
Full Changelog: 2.0.4...2.0.5
2.0.4
What's Changed
- Row-Level Results:
  - MinLength by @eycho-am in #465
  - Uniqueness by @eycho-am in #471
  - ColumnValues by @zixianzh1 in #476
  - ReferentialIntegrity by @rdsharma26 in #466
  - [Experimental] DataSynchronization by @rdsharma26 in #473
- Referential Integrity:
  - Updated Referential Integrity to support multiple columns by @rdsharma26 in #463
- Constraints and Condition Changes:
  - Add population stability index (PSI) to distance methods by @bevhanno in #480
  - Fix chi-square test conditions by @bevhanno in #482
  - Missing Column Precondition for Compliance Check - issue fix 467 by @samarth-c1 in #478
  - Addition of HasMax/HasMin/HasStandardDeviation/HasMean constraint suggestions by @rdsharma26 in #489
  - Alternative aggregate functions to calculate histogram values by @akalotkin in #475
New Contributors
- @zixianzh1 made their first contribution in #476
- @samarth-c1 made their first contribution in #478
- @akalotkin made their first contribution in #475
Full Changelog: 2.0.3...2.0.4
2.0.3
What's Changed
- Adding chi-square distance method for categorical variables by @bevhanno in #444
- [WIP] Row Level Results by @mentekid in #451
- [Experimental] Addition of dataset comparison utilities by @rdsharma26 in #449
New Contributors
- @rdsharma26 made their first contribution in #447
- @bevhanno made their first contribution in #444
- @mentekid made their first contribution in #451
Full Changelog: 2.0.2...2.0.3