Skip to content

datahai/GenMLV

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

43 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

GenMLV

Project to manage MLVs in Microsoft Fabric Lakehouse

  • Keeps your Fabric Lakehouse MVLs in sync with .sql files in a Lakehouse Files directory.
  • Automatically creates or updates Materialized Lake Views based on the .sql files.
  • Cleans up any MLVs that no longer have a corresponding .sql file.
  • Maintains a metadata file to track changes and avoid unnecessary recreation.

Usage Video

There's a video here showing the usage of the GenMLV process: YouTube


๐Ÿ”ง 1. Setup and Initialization

  • sql_root_path: Root directory containing subfolders with .sql files.
  • metadata_file_path: Path to a JSON file that stores metadata about the .sql files (e.g., last modified time).

๐Ÿ“ 2. Load or Initialize Metadata

If the metadata file doesn't exist:

  • Create the directory (if needed).
  • Create an empty JSON file to store metadata.

If it exists:

  • Try to load the JSON content.

๐Ÿ“„ 3. Discover .sql Files

  • Traverse all subdirectories under sql_root_path.
  • For each .sql file found:
    • Record its full path.
    • Get its last modified timestamp.
    • Store this info in a dictionary sql_files keyed by the table name (filename without .sql).

๐Ÿ“‹ 4. Get Existing Materialized Lake Views (MLVs)

  • Use Spark SQL to list all existing MLVs.
  • Store their names in a set called existing_mlvs.

๐Ÿงน 5. Drop Obsolete MLVs

  • Identify MLVs that exist in Spark but no longer have a corresponding .sql file.
  • Drop each of these MLVs using DROP MATERIALIZED LAKE VIEW.
  • Remove their entries from the metadata.

๐Ÿ”„ 6. Create or Recreate MLVs

  • For each .sql file:
    • Compare its last modified time with the timestamp in the metadata.
    • If the file is newer:
      • Read the SQL SELECT statement from the file.
      • Drop the existing MLV (if it exists).
      • Create a new MLV using CREATE MATERIALIZED LAKE VIEW.
      • Schemas will be created automatically based on file names
      • Update the metadata with the new timestamp.

๐Ÿ’พ 7. Save Updated Metadata

  • Write the updated metadata dictionary back to the JSON file.

About

Project to manage MLVs in Microsoft Fabric Lakehouse

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages