Releases: datahub-project/datahub
v1.1.0rc4
What's Changed
- dataset cli - add support for schema, round-tripping to yaml by @chakru-r in #12764
- feat(ingestion/superset): ownership info for charts, dashboards and datasets by @PeteMango in #12750
- feat(ingest): allowdenypattern for dashboard, chart, dataset in superset by @kevinkarchacryl in #12782
- feat(models): adds subtypes to most entities in the model by @shirshanka in #12783
- fix: fixes mypy complaints about pkgresources by @sgomezvillamor in #12790
- fix(ingestion): fixes producing some URNs with reserved characters by @sgomezvillamor in #12772
- feat(okta): custom properties for okta user by @sgomezvillamor in #12773
- feat(mssql): adds subtypes aspect for dataflow and datajobs by @sgomezvillamor in #12775
- feat(searchBarAutocomplete): add feature flag for search bar's autocomplete redesign by @v-tarasevich-blitz-brain in #12690
- fix(ingest): enable fuzzy case resolution for oracle sql by @hsheth2 in #12778
- style: update azure.md by removing extra word by @alexbransky in #12780
- fix(ui): change tags to properties in ml model view by @yoonhyejin in #12789
- fix(ui) Fix changing color and icon for domains in UI by @chriscollins3456 in #12792
- Support container in ML Model Group, Model and Deployment by @ryota-cloud in #12793
- docs: update mlflow ingestion docs to include new concept mappings by @yoonhyejin in #12791
- fix(web) move form entity sidebar to right to align with cloud by @jayacryl in #12796
- doc(iceberg): iceberg doc updates by @chakru-r in #12787
- docs: add exporting from source to write mcp guide by @yoonhyejin in #12800
- feat(ingest/redshift): support for datashares lineage by @mayurinehate in #12660
- feat(ingestion/business-glossary): Automatically generate predictable glossary term and node URNs when incompatible URL characters are specified in term and node names. by @acrylJonny in #12673
- fix(ingestion/oracle): Improved foreign key handling by @acrylJonny in #11867
- feat(ingest/iceberg): Introduce network problems resiliency for Iceberg source by @skrydal in #12804
- chore(postgres): bump version by @david-leifker in #12808
- chore(aws): bump aws libraries by @david-leifker in #12809
- feat(api): URN, Entity, and Aspect name Async Validation by @david-leifker in #12797
- feat(ingest): improve extract-sql-agg-log command by @hsheth2 in #12803
- fix(UI): Showing platform instances only once by @sakethvarma397 in #12806
- fix: search cache invalidation for iceberg entities by @chakru-r in #12805
- feat(docs): Release for DataHub Cloud 0.3.8.2 by @pedro93 in #12811
- refactor(graphql): simplify getLastIngestionRun method by @trialiya in #12706
- docs(ingest): update metadata-ingestion dev guide by @hsheth2 in #12779
- fix(ingest/oracle): refresh golden files by @hsheth2 in #12818
- fix(openapi): fix openapi timeseries async ingestion by @david-leifker in #12812
- docs(ingest/mode): update mode workspace docs by @hsheth2 in #12774
- fix(ingestion/superset): fixed iterate over int error for building urns by @PeteMango in #12807
- fix(doc): Disable Algolia search by @treff7es in #12831
- fix(build): build improvements to help with incremental builds by @chakru-r in #12823
- feat(docs) add perms req to ai docs by @jayacryl in #12819
- Add variable to show full title in lineage by default by @Blize in #12078
- fix(doc): re-enable Algolia search by @hsheth2 in #12834
- feat(ui): support all entities with display names in browse paths v2 by @Masterchen09 in #11657
- feat(ingestion/mlflow): improve mlflow connector to pull run and experiments by @yoonhyejin in #12587
- fix(workflows): Update pr-labeler by @asikowitz in #12835
- chore(ruff): enable some ignored rules by @sgomezvillamor in #12815
- feat(ingest/redshift): lineage for external schema created from redshift by @mayurinehate in #12826
- feat(openapi-ingestion): implement openapi ingestion by @david-leifker in #12757
- fix(ui) Hide default filters we want to hide from impact analysis by @chriscollins3456 in #12843
- fix(ui) Fix submitting when selecting replacement in deprecation modal by @chriscollins3456 in #12842
- fix(UI): Multiple data product delete modals by @sakethvarma397 in #12781
- fix(graphql/search): Remove schema field and data process instance from default search types by @asikowitz in #12845
- docs: clear remote executor docs by @anshbansal in #12839
- build(deps): bump @babel/runtime from 7.24.4 to 7.26.10 in /docs-website by @dependabot in #12844
- build(deps): bump @babel/runtime-corejs3 from 7.24.4 to 7.26.10 in /docs-website by @dependabot in #12846
- build(deps): bump @babel/helpers from 7.24.4 to 7.26.10 in /docs-website by @dependabot in #12847
- fix(jaas): fix jaas login by @david-leifker in #12848
- feat(gql) allow unsetting optional incident fields by @jayacryl in #12801
- fix(ingest/dynamodb): pass env to dataset urn function by @anshbansal in #12853
- feat(models): Add edges fields to data process instance relationship aspects by @asikowitz in #12860
- feat(ui): Update ExternalUrlButton to include self-hosted gitlab URLs by @k7ragav in #12734
- fix(ui) Support glossary nodes in autocomplete by @chriscollins3456 in #12858
- feat(ingest/mlflow): update dpi to use edge for lineage by @yoonhyejin in #12861
- fix(ge-profiler): catch TimeoutError by @sgomezvillamor in #12855
- fix(databricks): fixes profile median by @sgomezvillamor in #12856
- fix(ingest): fix error in deploy command by @hsheth2 in #12820
- docs(ingest): custom transformer remote executor by @anshbansal in #12864
- feat(restore-indices): createDefaultAspects argument by @david-leifker in #12859
- ci(tests):show cypress smoke tests in junit format for better reporting by @chakru-r in #12865
- feat(ingest/salesforce): include formula in in field description by @mayurinehate in #12840
- feat(ingestion-tracing): implement ingestion with tracing api by @david-leifker in #12714
- hotfix(ui): Addressing assertions hotfixes by @jjoyce0510 in #12785
- feat(ingestion) Adding vertexAI ingestion source (v1 - model group and model) by @ryota-cloud in #12632
- feat(ingest/hive): identify partition columns in hive tables by @deepgarg-visa in #12833
- fix(api-tracing): handle corner case for historic by @david-leifker in #12870
- docs(website) update docusaurus config by @maggiehays in #12862
- feat(system-metrics): track api usage by user, client, api by @david-leifker in #12872
- fix(ingest/snowfla...
v1.1.0rc3
v1.1.0rc2
v1.1.0rc1
v1.0.0
DataHub v1.0.0
Release Highlights
DataHub v1.0.0 is packed with exciting updates, including:
- A completely redesigned user experience focused on simplified navigation and a visually stunning interface.
- Unified support for Data & AI, including AI Model Group Versions, AI Model Lineage, Model Stats, and Experiment/Run ingestion.
- DataHub Iceberg Catalog, allowing users to manage Iceberg tables directly from DataHub.
Read the blog post here!
Changelog
New User Interface: Putting Usability First
With a completely redesigned user interface, DataHub v1.0 represents a fundamental rethinking of how users interact with their metadata and data assets. The new experience includes:
- Intuitive Platform-Based Navigation - Hierarchically browse data by database and schema in Snowflake, BigQuery, Redshift, Databricks, and more. Combine hierarchical navigation with filtering by data owners, domain, tags, and glossary terms to find the right data fast.
- Seamless Lineage Exploration - Our reimagined lineage view features multi-level expansion, name-based search, and column-level visibility, making it easier than ever to understand data relationships and impact.
- Integrated Data Quality - Make confident decisions with deeply integrated quality signals throughout the platform, helping you quickly identify and trust reliable data assets.
DataHub Admins can enable the new UI for all users by setting the THEME_V2_DEFAULT environment variable to true; until then, users can opt into the new experience by navigating to Settings > Appearance > Try New User Experience.
Comprehensive AI Asset Support: Unifying Data and AI
DataHub v1.0 treats AI assets as first-class citizens within the data ecosystem, allowing users to track their entire data-to-AI pipeline in one place.
- Unified Search and Discovery: Seamlessly search across models, model groups, and traditional data assets in one unified interface.
- Advanced Versioning System: Track multiple versions of datasets and ML models with detailed performance metrics and clear linkages between versions.
- Rich Model Statistics: Monitor key metrics across versions, understand performance trends, and make data-driven decisions about model deployment.
- End-to-End Lineage: Trace data flows from raw inputs through models to final outputs, with complete versioning support.
DataHub Iceberg REST Catalog Beta: Simplifying Data Lake Management
This release introduces an integration with Apache Iceberg that lets users manage Iceberg tables directly through DataHub. With it, users can:
- Create and manage Iceberg tables through DataHub
- Maintain consistent metadata across DataHub and Iceberg
- Facilitate data discovery by exposing Iceberg table metadata in DataHub
- Enable secure access to Iceberg tables through DataHub's permissions model
Read the docs here!
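As a rough illustration of what a REST catalog means for clients, the sketch below builds connection properties for PyIceberg, a third-party library that can talk to any Iceberg REST catalog. The `/iceberg/` endpoint path, the warehouse name, and the helper names `datahub_iceberg_properties` and `connect` are assumptions made for this example and are not taken from these release notes; check the linked docs for the actual values.

```python
from typing import Dict


def datahub_iceberg_properties(gms_url: str, warehouse: str, token: str) -> Dict[str, str]:
    """Build REST-catalog properties for a PyIceberg client.

    Assumption: the catalog is served under `<gms_url>/iceberg/`.
    """
    return {
        "type": "rest",
        "uri": gms_url.rstrip("/") + "/iceberg/",
        "warehouse": warehouse,
        "token": token,
    }


def connect(gms_url: str, warehouse: str, token: str):
    """Open the catalog with PyIceberg (third-party: `pip install pyiceberg`)."""
    # Imported lazily so the helper above can be used without PyIceberg installed.
    from pyiceberg.catalog import load_catalog

    return load_catalog("datahub", **datahub_iceberg_properties(gms_url, warehouse, token))


# Hypothetical values, for illustration only.
props = datahub_iceberg_properties(
    "http://localhost:8080", "my_warehouse", "<personal-access-token>"
)
print(props["uri"])  # http://localhost:8080/iceberg/
```

Calling `connect(...)` needs a running DataHub instance and a valid token; the properties helper can be checked on its own.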
DataHub CLI
This release introduces the following improvements to our CLI:
- Added container command to apply tags, terms, and owners on all assets within the container. [#12418, #12436]
- Improved delete command to optionally reference a file with a list of URNs to be deleted. [#12247]
- Expanded ingest command to support ingesting MCPs from S3. [#12649]
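For the file-based delete mentioned above, a small helper like the following could prepare the input file. It is a sketch under assumptions: the one-URN-per-line layout and the helper name `write_urn_file` are illustrative and not confirmed by these notes, so check `datahub delete --help` for the expected format and the exact option name.

```python
import tempfile
from pathlib import Path
from typing import Iterable


def write_urn_file(urns: Iterable[str], path: Path) -> int:
    """Write unique, non-empty URNs to `path`, one per line; return how many were written.

    Assumption: the CLI reads one URN per line from the file.
    """
    unique = []
    for urn in urns:
        urn = urn.strip()
        if not urn:
            continue  # skip blank entries
        if not urn.startswith("urn:li:"):
            raise ValueError(f"not a DataHub URN: {urn!r}")
        if urn not in unique:
            unique.append(urn)  # keep first occurrence, preserve order
    Path(path).write_text("".join(u + "\n" for u in unique), encoding="utf-8")
    return len(unique)


# Hypothetical URNs, for illustration only.
target = Path(tempfile.mkdtemp()) / "urns_to_delete.txt"
count = write_urn_file(
    [
        "urn:li:dataset:(urn:li:dataPlatform:hive,db.old_table,PROD)",
        "",
        "urn:li:dataset:(urn:li:dataPlatform:hive,db.old_table,PROD)",
        "urn:li:corpuser:jdoe",
    ],
    target,
)
print(count)  # 2
```

Validating and de-duplicating before handing the file to the CLI keeps a malformed line from surfacing halfway through a bulk delete.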
Metadata Ingestion
We’re continuously improving our integrations to add new capabilities and squash bugs.
- dbt: Added the parameter include_database_name to support including the database name in URN generation. [#12411]
- Iceberg: Alongside our new Iceberg Catalog API, we’ve made various improvements to our Iceberg integration. [#12744]
- MLFlow: Significantly revamped our MLFlow connector, adding support for tracking Model Group Versions and Model Stats; tracking Model lineage to underlying datasets; and capturing Experiments and Runs.
- MSSQL: Improved support for extracting stored procedures from MS SQL. [#12244, #12563]
- Oracle: Improved the accuracy of column-level lineage resolution.
- PowerBI: Improved lineage mapping so PowerBI Reports can now contain PowerBI Dashboards. [#12451]
- Redshift: Added support for data shares and external schemas, including automatic lineage resolution across Redshift namespaces.
- S3: Added functionality to the S3 ingestion process to ignore paths that do not match the specified depth, resolving warning messages triggered by mismatched paths. [#12326]
- Snowflake: Added support for Snowflake Streams and Hybrid Tables, and fixed a bug with lineage resolution across table renames. [#12318]
- Superset (community contribution!): Added support for Superset virtual datasets and lineage. [#12679]
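To make the dbt item concrete, the sketch below shows how a flag like include_database_name could change the dataset name embedded in a URN. The helpers `dbt_dataset_name` and `dataset_urn` are hypothetical and written only for this illustration; they are not the connector's code, and the default value shown is an assumption. Only the general dataset URN layout, urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>), is standard DataHub.

```python
from typing import Optional


def dbt_dataset_name(
    database: Optional[str],
    schema: str,
    table: str,
    include_database_name: bool = True,  # assumed default, for illustration
) -> str:
    """Join the name parts, optionally leaving out the database."""
    parts = [schema, table]
    if include_database_name and database:
        parts.insert(0, database)
    return ".".join(parts)


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Format a dataset URN in the standard DataHub layout."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


with_db = dataset_urn("dbt", dbt_dataset_name("analytics", "public", "orders", True))
without_db = dataset_urn("dbt", dbt_dataset_name("analytics", "public", "orders", False))
print(with_db)     # urn:li:dataset:(urn:li:dataPlatform:dbt,analytics.public.orders,PROD)
print(without_db)  # urn:li:dataset:(urn:li:dataPlatform:dbt,public.orders,PROD)
```

Because the name is part of the URN, toggling such a setting on an existing deployment would produce different URNs for the same models, which is worth planning for before changing it.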
Additionally, we’re working on a new integration with Vertex AI. Please reach out if you’re interested in joining the beta.
Of course, this only scratches the surface of changes. This release contains 100+ improvements across 25 different integrations.
Thank You to our Contributors!
First-Time Contributors
@Bhadhri03 @brock-acryl @cccs-cat001 @davidebriscese @Deepalijain13 @dougbot01 @Haebuk @haon85 @josges @mihai103 @rajatgl17 @Rasnar @rharisi @samanthafigueredo5 @ttekampe
Repeat Contributors
@bda618 @deepgarg-visa @eagle-25 @jayasimhankv @ksrinath @llance @Masterchen09 @mayurinehate @mkamalas @PeteMango @pinakipb2 @remisalmon @sagar-salvi-apptware @svdimchenko @v-tarasevich-blitz-brain
Project Maintainers
@anshbansal @asikowitz @chakru-r @chriscollins3456 @david-leifker @gabe-lyons @hsheth2 @jayacryl @jjoyce0510 @kevinkarchacryl @pedro93 @RyanHolstien @ryota-cloud @sakethvarma397 @sgomezvillamor @shirshanka @skrydal @treff7es @yoonhyejin
View the full changelog: v0.15.0.1...v1.0.0
v1.0.0rc5
Full Changelog: v1.0.0rc4...v1.0.0rc5
v1.0.0rc4
Full Changelog: v1.0.0rc3...v1.0.0rc4
v1.0.0rc3
What's Changed
- fix(filters) Fix autocomplete for platforms and improve advanced search builder by @chriscollins3456 in #12560
- fix(ingest): handle groups in pattern_cleanup_ownership transformer by @cccs-cat001 in #12536
- tests(druid): integration tests for druid ingestion by @sgomezvillamor in #12717
- feat(api): let admins use granted privileges for actors by @anshbansal in #12718
- feat(build): use pull_request_target for datahub-wheels by @hsheth2 in #12722
- feat(ui): access management docs by @kevinkarchacryl in #12719
- fix(lineage): error message for edit lineage by @anshbansal in #12724
- docs: clarify limits on AI docs by @hsheth2 in #12728
- fix(urn-validation): additional test cases for urn validation by @david-leifker in #12727
- fix(ui) Fix NPE in pluralize function by @chriscollins3456 in #12629
- Fix platform instance support on Druid ingestion by @Rasnar in #12716
- ci(coverage): update patch coverage threshold by @chakru-r in #12733
- fix(ui) Fix bug with date dropdown in deprecation modal by @chriscollins3456 in #12633
- fix(ui) Fix group membership inconsistencies on group page by @chriscollins3456 in #12704
- fix(ui) Properly get display name when downloading search results by @chriscollins3456 in #12720
- fix(ingest): bump avro dep by @hsheth2 in #12729
- fix(ui) Filter healthy assets out of unhealthy upstreams component by @chriscollins3456 in #12705
- docs: update slack link by @hsheth2 in #12731
- fix(build): support datahub-wheels from forked PRs by @hsheth2 in #12730
- docs: add scarf integration by @hsheth2 in #12739
- fix(iceberg-cli): add missing filter for iceberg dataplatform by @chakru-r in #12732
- dev: immutable args remove by @anshbansal in #12735
- build(deps): bump dompurify from 2.5.4 to 3.2.4 in /datahub-web-react by @dependabot in #12643
- refactor(ui): Migrate to use the new Button component consistently by @jjoyce0510 in #12597
- docs(restore-indices): added best practices by @david-leifker in #12741
- feat(ui/lineageV2): Show version pill in lineage sidebar and node by @asikowitz in #12599
- chore(bump): Bump kafka-setup base by @david-leifker in #12743
- dev: enable ruff rule by @anshbansal in #12742
- revert(ci): revert datahub-wheel build changes by @hsheth2 in #12747
- feat: API key support in Metabase source by @rajatgl17 in #12711
- dev: enable ruff rule by @anshbansal in #12749
- refactor(ingest/s3): enhance readability by @eagle-25 in #12686
- feat(ingestion/superset): superset dataset lineage for metadata ingestion by @PeteMango in #12679
- chore(ci): avoid dep on confluent-kafka 2.8.1 by @hsheth2 in #12753
- feat(graphql): implement sort and facet for scroll by @david-leifker in #12746
- feat(ingest): improve error messages for unknown metadata objects by @hsheth2 in #12745
- fix(web) accurate error message for embeddedlistsearch by @jayacryl in #12622
- feat(ingestion/iceberg): Several improvements to iceberg connector by @skrydal in #12744
- fix(ingest): support pydantic v2 in file-based lineage by @hsheth2 in #12723
- feat(iceberg): improve concurrency control and resilience by @ksrinath in #12664
- docs(users+groups): show that you can set title via users YAML by @gabe-lyons in #12767
- feat(sdk): add search client by @hsheth2 in #12754
- feat(operations): ES and Kafka Operations Endpoints by @david-leifker in #12756
- feat(auth): support guest access by @chakru-r in #12619
- fix(iceberg): listnamespaces includes warehouse name as root by @chakru-r in #12761
- feat(UI): make searchbar centered and wider by @v-tarasevich-blitz-brain in #12666
- fix(ui) Fix order of parent containers on v2 autocomplete item by @chriscollins3456 in #12721
- fix(test): handle empty log by @david-leifker in #12768
- fix(lineage) Support views and sorting in impact analysis by @chriscollins3456 in #12769
- feat(versioning): Support entity versioning ingestion by @asikowitz in #12755
- fix(ui): add overflow wrap for dpi / model summary tab & add custom properties in mlmodelgroup queries by @yoonhyejin in #12771
- feat(sdk): add support for institutional memory links by @hsheth2 in #12770
New Contributors
- @Rasnar made their first contribution in #12716
- @rajatgl17 made their first contribution in #12711
- @PeteMango made their first contribution in #12679
- @v-tarasevich-blitz-brain made their first contribution in #12666
Full Changelog: v1.0.0rc2...v1.0.0rc3
v1.0.0rc2
What's Changed
- docs(ingest/mode): add details on authentication/permissions for mode by @hsheth2 in #12508
- fix(ingest/snowflake): Create all structured propery templates before assignation by @treff7es in #12469
- docs: fix token to be not required in sample script by @yoonhyejin in #12511
- fix(mssql): adds missing containers and browsepathsv2 for dataflow and datajob by @sgomezvillamor in #12483
- fix(ingest/glue): change to warning on access denied by @anshbansal in #12519
- fix(ingest/mode): remove unused field by @anshbansal in #12520
- docs: fix link to executor helm chart by @anshbansal in #12522
- fix(ingest): add missing dep for gcs by @hsheth2 in #12505
- docs(entity-change-events): add docs for action request events by @gabe-lyons in #12493
- docs(ingest): script to add ERModelRelationship Entity by @sagar-salvi-apptware in #12473
- refactor(trace-model): refactor trace model package by @david-leifker in #12510
- fix(ci): run smoke tests on release by @chakru-r in #12518
- chore(bump): bump jmx version by @david-leifker in #12524
- fix(cli): avoid false positive cli upgrade suggestions by @hsheth2 in #12497
- fix(ingest/azure-ad): limit the size of the ingestion report by @hsheth2 in #12498
- feat(metadata-io): enable rollback transaction support by @david-leifker in #12509
- feat(snowflake): add missing pushdown_deny_usernames config to be used when use_queries_v2 by @sgomezvillamor in #12527
- fix(model): fixes DashboardContainsDashboard relationship in DashboardInfo aspect by @sgomezvillamor in #12433
- feat(restoreIndices): update restore indices args and docs by @RyanHolstien in #12529
- fix(businessAttribute): fix business Attribute related entities by @deepgarg-visa in #12537
- fix(ui): make data process instance visible in container in V2& fix model/modelgroup names by @yoonhyejin in #12513
- fix(ingest): avoid multiprocessing "fork" start method by @hsheth2 in #12543
- fix(ui): revert backend breaking changes to mau by @kevinkarchacryl in #12461
- tests(kafka-connect): fixes integration tests setup by @sgomezvillamor in #12531
- fix(ingest/unity): add row count in table profile of delta tables by @mayurinehate in #12480
- fix(ingest): use lossy collections by @anshbansal in #12523
- fix(misc-openapi): fix openlineage, platform events & swagger by @david-leifker in #12539
- fix(test): move reading env variable inside method by @anshbansal in #12549
- feat(versioning): Add V2 UI; make backend more synchronous; add to component library by @asikowitz in #12542
- docs(iceberg): add iceberg user guide by @chakru-r in #12533
- feat(ingestion/snowflake):adds streams as a new dataset with lineage and properties. by @brock-acryl in #12318
- feat(powerbi): Report to Dashboard lineage by @sgomezvillamor in #12451
- fix(no-rows-updated): fix no rows updated by @david-leifker in #12530
- ci(smoke): report smoke test results to codecov by @hsheth2 in #12556
- feat(UI): Confirmation before deleting Link by @pinakipb2 in #12162
- feat(ingestion/s3): ignore depth mismatched path by @eagle-25 in #12326
- feat(docs-site) adding case studies and updating banner by @jayacryl in #12525
- feat(ingestion/mongodb) re-order aggregation logic by @Haebuk in #12428
- docs(salesforce): add missing salesforce source to cli doc by @remisalmon in #12550
- feat(openapi): precondition exceptions return 412 by @david-leifker in #12552
- feat(openapi): point in time parameter (elasticsearch only) by @david-leifker in #12553
- fix(openapi-spec): fix openapi spec oneOf schema by @david-leifker in #12561
- fix(autocomplete): fix autocomplete duplicate field by @david-leifker in #12558
- build(deps): bump black from 23.7.0 to 24.3.0 in /metadata-service/iceberg-catalog by @dependabot in #12502
- feat(sdk): add scaffolding for sdk v2 by @hsheth2 in #12554
- doc(dbt): Add missing dbt extra requirement to cli doc by @remisalmon in #12568
- feat(docs): Add live secret reload docs to k8s remote executor page by @pedro93 in #12541
- fix(ingest): remove duplicate mcps,more typing by @mayurinehate in #12557
- doc: update doc of first release by @anshbansal in #12574
- fix(docs) explain need to restore indices when adding @searchable by @jayacryl in #12576
- fix(sdk): fix platform instance generation in the sdk by @hsheth2 in #12573
- fix(looker): sort user mapping for consistency by @hsheth2 in #12569
- fix(ingestion/teradata): teradata profiling fix for pooling by @brock-acryl in #12507
- fix(structuredProps) Add validation for allowedTypes and harden API for invalid types by @chriscollins3456 in #12578
- fix(ui): better experience for analytics charts by @kevinkarchacryl in #12462
- feat(ingest/mssql): improve stored procedure splitting by @hsheth2 in #12563
- docs: add page on metadata standards by @hsheth2 in #12584
- feat(gh-workflows) adding jayacryl to pr-labeler by @jayacryl in #12579
- fix(iceberg): delete associated platform resources when deleting warehouse by @chakru-r in #12564
- feat(ingest): add display name for dynamodb tables by @mayurinehate in #12534
- fix(ui) Show editable field info for fields based on exact fieldPath version by @chriscollins3456 in #12570
- fix(openapi-schema): fix openapi schema generator by @david-leifker in #12590
- feat(ingestion/dbt): Add include_database_name parameter for dbt core by @svdimchenko in #12411
- fix(web) ingestion page resets when filter updated by @jayacryl in #12589
- dev: update pre-commit config by @anshbansal in #12592
- feat(UI): add user location to user profile page by @samanthafigueredo5 in #12016
- fix(graphql): Skip schema fields with empty fieldPaths to prevent the dataset mapper from erroring out by @jayasimhankv in #12562
- feat(graphql,ui): Update ML system V2 UI by @asikowitz in #12598
- fix(url-encoding): fix regression in url encoding by @david-leifker in #12601
- fix(ingest/snowflake): order queries for queries_v2 by @hsheth2 in #12551
- feat(ci): add pytest hooks for updating golden files by @hsheth2 in #12581
- fix(ingest): pick topics from config for sink connector by @mayurinehate in #12535
- doc: add note for subscription by @anshbansal in #12607
- feat(okta): adds ingest_groups_users config parameter by @sgomezvillamor in #12371
- feat(urn-validation): Add UrnValidation PDL annotation by @david-leifker in #12572
- feat(search): include timestamp for entity metadata change by @deepgarg-visa in https://github.com/d...
v1.0.0rc1
fix(ci): disable ci telemetry modelDocUpload (#12504)