Katachi is a Python package for validating, processing, and parsing directory structures against defined schemas.
Note: Katachi is currently under active development and should be considered a work in progress. APIs may change in future releases.
- GitHub repository: https://github.com/nmicovic/katachi/
- Documentation: https://nmicovic.github.io/katachi/
- 📐 Schema-based validation - Define expected directory structures using YAML
- 🧩 Extensible architecture - Create custom validators and actions
- 🔄 Relationship validation - Validate relationships between files (like paired files)
- 🚀 Command-line interface - Easy-to-use CLI with rich formatting
- 📋 Detailed reports - Get comprehensive validation reports
Install from PyPI:
pip install katachi
For development:
git clone https://github.com/nmicovic/katachi.git
cd katachi
make install
semantical_name: data
type: directory
pattern_name: data
children:
  - semantical_name: image
    pattern_name: "img\\d+"
    type: file
    extension: .jpg
    description: "Image files with numeric identifiers"
  - semantical_name: metadata
    pattern_name: "img\\d+"
    type: file
    extension: .json
    description: "Metadata for image files"
  - semantical_name: file_pairs_check
    type: predicate
    predicate_type: pair_comparison
    description: "Check if images have matching metadata files"
    elements:
      - image
      - metadata
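To make the `pattern_name` and `extension` fields concrete, the sketch below checks a few file names against the `image` entry using only the standard library. It assumes the pattern must match the whole file stem and that the extension is compared exactly; Katachi's actual matching rules may differ, and `matches_entry` is a hypothetical helper, not part of the package.

```python
import re
from pathlib import PurePath


def matches_entry(filename: str, pattern: str, extension: str) -> bool:
    """Return True if the stem fully matches `pattern` and the suffix equals `extension`.

    Hypothetical helper: full-match semantics are an assumption, not Katachi's documented behavior.
    """
    path = PurePath(filename)
    return path.suffix == extension and re.fullmatch(pattern, path.stem) is not None


# The double-quoted YAML string "img\\d+" decodes to the regex img\d+
IMAGE_PATTERN = r"img\d+"

for name in ["img001.jpg", "img001.json", "image.jpg", "img12a.jpg"]:
    print(name, matches_entry(name, IMAGE_PATTERN, ".jpg"))
```

Under these assumptions only `img001.jpg` satisfies the `image` entry; `img001.json` would instead be a candidate for the `metadata` entry.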
katachi validate schema.yaml target_directory
Validate a simple directory structure:
katachi validate "tests/schema_tests/test_sanity/schema.yaml" "tests/schema_tests/test_sanity/dataset"
Validate a nested directory structure:
katachi validate "tests/schema_tests/test_depth_1/schema.yaml" "tests/schema_tests/test_depth_1/dataset"
Validate paired files (e.g., ensure each .jpg has a matching .json file):
katachi validate "tests/schema_tests/test_paired_files/schema.yaml" "tests/schema_tests/test_paired_files/data"
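The idea behind a paired-files check can be sketched independently of Katachi: collect the file stems for each extension and compare the two sets. The function below is a minimal illustration, not the package's `pair_comparison` implementation; `find_unpaired` is a hypothetical name.

```python
import tempfile
from pathlib import Path


def find_unpaired(directory: Path, ext_a: str = ".jpg", ext_b: str = ".json") -> tuple[list[str], list[str]]:
    """Return (stems that only have ext_a, stems that only have ext_b)."""
    files = [p for p in directory.iterdir() if p.is_file()]
    stems_a = {p.stem for p in files if p.suffix == ext_a}
    stems_b = {p.stem for p in files if p.suffix == ext_b}
    return sorted(stems_a - stems_b), sorted(stems_b - stems_a)


# Build a throwaway directory where img2.jpg has no matching .json file
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ["img1.jpg", "img1.json", "img2.jpg"]:
        (root / name).touch()
    only_jpg, only_json = find_unpaired(root)
    print("Images without metadata:", only_jpg)
    print("Metadata without images:", only_json)
```

Comparing sets of stems keeps the check symmetric, so it reports both images without metadata and metadata without images.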
Validate a directory in Azure Blob Storage:
# Set Azure credentials in environment variables
export AZURE_STORAGE_ACCOUNT="your_storage_account"
export AZURE_STORAGE_ACCESS_KEY="your_access_key"
# Or use SAS token
export AZURE_STORAGE_SAS_TOKEN="your_sas_token"
# Validate local schema against Azure Blob Storage
katachi validate "schema.yaml" "abfs://container/path"
# Validate schema in Azure Blob Storage against another Azure Blob Storage path
katachi validate "abfs://container/schema.yaml" "abfs://container/path"
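The targets above follow the form `abfs://container/path`. As a rough illustration of how such a target breaks down, the sketch below splits it into scheme, container, and path with the standard library. `split_target` is a hypothetical helper and says nothing about how Katachi itself resolves these URIs; Windows drive letters are not handled.

```python
from urllib.parse import urlsplit


def split_target(target: str) -> tuple[str, str, str]:
    """Split a target into (scheme, container, path).

    Targets without a scheme are treated as local paths (assumption for this sketch).
    """
    parts = urlsplit(target)
    if not parts.scheme:
        return ("file", "", target)
    # For abfs://container/path, the container sits in the network-location slot
    return (parts.scheme, parts.netloc, parts.path.lstrip("/"))


for target in ["schema.yaml", "abfs://container/path", "abfs://container/schema.yaml"]:
    print(target, "->", split_target(target))
```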
from pathlib import Path
from katachi.schema.importer import load_yaml
from katachi.schema.validate import validate_schema
# Load schema from YAML
schema = load_yaml(Path("schema.yaml"), Path("data_directory"))
# Validate directory against schema
report = validate_schema(schema, Path("data_directory"))
# Check if validation passed
if report.is_valid():
    print("Validation successful!")
else:
    print("Validation failed with the following issues:")
    for result in report.results:
        if not result.is_valid:
            print(f"- {result.path}: {result.message}")
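When a report contains many failures, grouping the messages by path can make the output easier to scan. The sketch below relies only on the `is_valid`, `path`, and `message` attributes used in the example above; `FakeResult` is a hypothetical stand-in so the snippet runs without Katachi installed, and `failures_by_path` is not part of the package.

```python
from collections import defaultdict
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FakeResult:
    """Stand-in exposing the three attributes read from each result (hypothetical)."""
    is_valid: bool
    path: Path
    message: str


def failures_by_path(results) -> dict[Path, list[str]]:
    """Group the messages of failed results by the path they refer to."""
    grouped: dict[Path, list[str]] = defaultdict(list)
    for result in results:
        if not result.is_valid:
            grouped[result.path].append(result.message)
    return dict(grouped)


sample = [
    FakeResult(True, Path("data/img1.jpg"), "ok"),
    FakeResult(False, Path("data/notes.txt"), "unexpected extension"),
    FakeResult(False, Path("data/notes.txt"), "name does not match pattern"),
]
for path, messages in failures_by_path(sample).items():
    print(f"{path}: {'; '.join(messages)}")
```

With a real report, passing `report.results` in place of `sample` should work as long as the results expose the same three attributes.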
import os
from katachi.schema.importer import load_yaml
from katachi.schema.validate import validate_schema
from katachi.utils.fs_utils import get_filesystem
# Set Azure credentials
os.environ["AZURE_STORAGE_ACCOUNT"] = "your_storage_account"
os.environ["AZURE_STORAGE_ACCESS_KEY"] = "your_access_key"
# Or use SAS token
# os.environ["AZURE_STORAGE_SAS_TOKEN"] = "your_sas_token"
# Get filesystem for Azure Blob Storage
target_fs = get_filesystem("abfs://container/path")
schema_fs = get_filesystem("abfs://container/schema.yaml")
# Load schema from Azure Blob Storage
schema = load_yaml("schema.yaml", "path", schema_fs, target_fs)
# Validate Azure Blob Storage path against schema
report = validate_schema(schema, "path", target_fs)
# Check validation results
if report.is_valid():
    print("Validation successful!")
else:
    print("Validation failed with the following issues:")
    for result in report.results:
        if not result.is_valid:
            print(f"- {result.path}: {result.message}")
from pathlib import Path
from katachi.schema.schema_node import SchemaNode
from katachi.validation.core import ValidationResult, ValidatorRegistry
def my_custom_validator(node: SchemaNode, path: Path) -> ValidationResult:
    # Custom validation logic
    return ValidationResult(
        is_valid=True,
        message="Custom validation passed",
        path=path,
        validator_name="custom_validator"
    )
# Register the validator
ValidatorRegistry.register("custom_validator", my_custom_validator)
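The "custom validation logic" placeholder could hold any check on the path. As one possibility, the sketch below tests that a file exists and is non-empty, returning an `(is_valid, message)` pair that could feed the `is_valid` and `message` fields of `ValidationResult`. `check_non_empty` is a hypothetical helper written against the standard library only, so it runs without Katachi installed.

```python
import tempfile
from pathlib import Path


def check_non_empty(path: Path) -> tuple[bool, str]:
    """Return (is_valid, message) describing whether `path` is a non-empty file."""
    if not path.is_file():
        return False, f"{path} is not a file"
    if path.stat().st_size == 0:
        return False, f"{path} is empty"
    return True, "File is non-empty"


# Try the check on an empty file and on a file with content
with tempfile.TemporaryDirectory() as tmp:
    empty = Path(tmp) / "img1.jpg"
    empty.touch()
    filled = Path(tmp) / "img2.jpg"
    filled.write_bytes(b"\xff\xd8")
    print(check_non_empty(empty)[0])
    print(check_non_empty(filled)[0])
```

Keeping the check as a plain function separates the file-system logic from the registration step, which makes it straightforward to unit-test on its own.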
from pathlib import Path
from typing import Any
from katachi.schema.actions import register_action, NodeContext
def process_image(node, path: Path, parent_contexts: list[NodeContext], context: dict[str, Any]) -> None:
    # Custom image processing logic
    print(f"Processing image: {path}")
    # Access parent context if needed
    for parent_node, parent_path in parent_contexts:
        if parent_node.semantical_name == "timestamp":
            print(f"Image from date: {parent_path.name}")
            break
# Register the action
register_action("image", process_image)
Contributions are welcome! See CONTRIBUTING.md for details.
This project is licensed under the terms of the MIT License.