Katachi is a Python framework for validating and processing hierarchical directory structures using YAML-based schemas. It ensures your folders and files follow expected shapes, naming rules, and relationships—before any processing begins. Use it to enforce structure, catch issues early, and keep your data pipelines reliable.

katachi

Katachi is a Python package for validating, processing, and parsing directory structures against defined schemas.

Note: Katachi is currently under active development and should be considered a work in progress. APIs may change in future releases.

Features

  • 📐 Schema-based validation - Define expected directory structures using YAML
  • 🧩 Extensible architecture - Create custom validators and actions
  • 🔄 Relationship validation - Validate relationships between files (e.g., paired image/metadata files)
  • 🚀 Command-line interface - Easy to use CLI with rich formatting
  • 📋 Detailed reports - Get comprehensive validation reports

Installation

Install from PyPI:

pip install katachi

For development:

git clone https://github.com/nmicovic/katachi.git
cd katachi
make install

Quick Start

Define a schema (schema.yaml)

semantical_name: data
type: directory
pattern_name: data
children:
  - semantical_name: image
    pattern_name: "img\\d+"
    type: file
    extension: .jpg
    description: "Image files with numeric identifiers"
  - semantical_name: metadata
    pattern_name: "img\\d+"
    type: file
    extension: .json
    description: "Metadata for image files"
  - semantical_name: file_pairs_check
    type: predicate
    predicate_type: pair_comparison
    description: "Check if images have matching metadata files"
    elements:
      - image
      - metadata
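
The `pattern_name` fields are regular expressions matched against file names, with the extension handled separately by the `extension` field. As a quick sanity check with plain Python `re` (independent of Katachi, and assuming the whole file stem must match):

```python
import re

# The schema's pattern_name for image files, as a plain regex.
pattern = re.compile(r"img\d+")

# fullmatch requires the entire stem to match the pattern.
print(bool(pattern.fullmatch("img001")))    # True
print(bool(pattern.fullmatch("photo001")))  # False
```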

Validate a directory structure

katachi validate schema.yaml target_directory
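
For a quick local try-out, a tree that should satisfy the schema above can be generated with plain `pathlib`. This is an illustrative sketch, not part of Katachi; it assumes the validation target is the `data` directory itself, with file names chosen to match the `img\d+` patterns:

```python
from pathlib import Path

# Build a "data" directory of paired image/metadata files, matching
# the img\d+ pattern and the .jpg/.json extensions in the schema.
root = Path("data")
root.mkdir(exist_ok=True)
for i in range(1, 4):
    (root / f"img{i:03d}.jpg").touch()            # image file
    (root / f"img{i:03d}.json").write_text("{}")  # paired metadata

```

Because every `.jpg` gets a matching `.json`, the `pair_comparison` predicate in the schema should pass for this tree.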

Command-Line Examples

Validate a simple directory structure:

katachi validate "tests/schema_tests/test_sanity/schema.yaml" "tests/schema_tests/test_sanity/dataset"

Validate a nested directory structure:

katachi validate "tests/schema_tests/test_depth_1/schema.yaml" "tests/schema_tests/test_depth_1/dataset"

Validate paired files (e.g., ensure each .jpg has a matching .json file):

katachi validate "tests/schema_tests/test_paired_files/schema.yaml" "tests/schema_tests/test_paired_files/data"

Validate Azure Blob Storage:

# Set Azure credentials in environment variables
export AZURE_STORAGE_ACCOUNT="your_storage_account"
export AZURE_STORAGE_ACCESS_KEY="your_access_key"
# Or use SAS token
export AZURE_STORAGE_SAS_TOKEN="your_sas_token"

# Validate local schema against Azure Blob Storage
katachi validate "schema.yaml" "abfs://container/path"

# Validate schema in Azure Blob Storage against another Azure Blob Storage path
katachi validate "abfs://container/schema.yaml" "abfs://container/path"

Python API

from pathlib import Path
from katachi.schema.importer import load_yaml
from katachi.schema.validate import validate_schema

# Load schema from YAML
schema = load_yaml(Path("schema.yaml"), Path("data_directory"))

# Validate directory against schema
report = validate_schema(schema, Path("data_directory"))

# Check if validation passed
if report.is_valid():
    print("Validation successful!")
else:
    print("Validation failed with the following issues:")
    for result in report.results:
        if not result.is_valid:
            print(f"- {result.path}: {result.message}")

Using Azure Blob Storage

import os
from katachi.schema.importer import load_yaml
from katachi.schema.validate import validate_schema
from katachi.utils.fs_utils import get_filesystem

# Set Azure credentials
os.environ["AZURE_STORAGE_ACCOUNT"] = "your_storage_account"
os.environ["AZURE_STORAGE_ACCESS_KEY"] = "your_access_key"
# Or use SAS token
# os.environ["AZURE_STORAGE_SAS_TOKEN"] = "your_sas_token"

# Get filesystem for Azure Blob Storage
target_fs = get_filesystem("abfs://container/path")
schema_fs = get_filesystem("abfs://container/schema.yaml")

# Load schema from Azure Blob Storage
schema = load_yaml("schema.yaml", "path", schema_fs, target_fs)

# Validate Azure Blob Storage path against schema
report = validate_schema(schema, "path", target_fs)

# Check validation results
if report.is_valid():
    print("Validation successful!")
else:
    print("Validation failed with the following issues:")
    for result in report.results:
        if not result.is_valid:
            print(f"- {result.path}: {result.message}")

Extending Katachi

Custom validators

from pathlib import Path
from katachi.schema.schema_node import SchemaNode
from katachi.validation.core import ValidationResult, ValidatorRegistry

def my_custom_validator(node: SchemaNode, path: Path) -> ValidationResult:
    # Custom validation logic
    return ValidationResult(
        is_valid=True,
        message="Custom validation passed",
        path=path,
        validator_name="custom_validator"
    )

# Register the validator
ValidatorRegistry.register("custom_validator", my_custom_validator)

Custom file processing

from pathlib import Path
from typing import Any
from katachi.schema.actions import register_action, NodeContext
from katachi.schema.schema_node import SchemaNode

def process_image(node: SchemaNode, path: Path, parent_contexts: list[NodeContext], context: dict[str, Any]) -> None:
    # Custom image processing logic
    print(f"Processing image: {path}")
    # Access parent context if needed
    for parent_node, parent_path in parent_contexts:
        if parent_node.semantical_name == "timestamp":
            print(f"Image from date: {parent_path.name}")
            break

# Register the action
register_action("image", process_image)

Contributing

Contributions are welcome! See CONTRIBUTING.md for details.

License

This project is licensed under the terms of the MIT License.
