Skip to content

Audio processing tool focused on metadata and context generation via off-the-shelf components. Leverages several local ML/AI tools to accomplish transcription, context clues, and audio processing tasks. Designed with extensibility in mind.

License

Notifications You must be signed in to change notification settings

akspa0/The-Machine

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

49 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

The-Machine 🧠🔊

Dedicated to the memory of Carlito Cross Madhouse Live

A context-driven, privacy-first, modular pipeline for understanding, transforming, and building on top of audio recordings.


🚀 What is The-Machine?

The-Machine is a powerful, extensible toolkit for:

  • Adding rich context to audio recordings (calls, music, podcasts, etc.)
  • Preparing audio and metadata for dataset use, research, and creative projects
  • Building new tools and workflows on top of audio context and transcriptions
  • Enabling privacy-first, traceable, and reproducible audio processing

Why?

Audio is more than just sound — it is context, story, and data. The-Machine helps you unlock, organize, and use that context for anything from dataset curation to creative AI workflows.


✨ Features

  • 🎙️ Audio Ingestion & PII Removal: Ingests audio, removes PII from filenames, and anonymizes all logs/outputs.
  • 🗂️ Context-Driven Processing: Every file is tracked, indexed, and processed with full lineage and manifesting.
  • 🧩 Extension System: Modular, plug-and-play extensions for everything—transcription, CLAP annotation, LLM tasks, remixing, show creation, and more.
  • 🦾 LLM Integration: Local LLM support (LM Studio, etc.) for titles, summaries, image prompts, and more—fully privacy-safe.
  • 🗣️ Speaker Diarization & Transcription: Segments audio by speaker, transcribes with Parakeet/Whisper, and aligns with context.
  • 🥁 CLAP Annotation & Segmentation: Detects events (e.g., ringing, hang-up) and segments calls using CLAP.
  • 🎚️ Normalization & Remixing: Loudness normalization, true peak, and creative remixing for dataset or show use.
  • 🖼️ Image/Video Generation: Extensions for SDXL/ComfyUI image and video generation from transcripts and personas.
  • 📜 Manifest & Traceability: Every output is tracked in a manifest—no lost context, ever.
  • 🔒 Privacy-First: No PII in logs, outputs, or manifests. All processing is anonymized by design.
  • 🧠 Memory Bank: Project context, progress, and system patterns are tracked for robust, extension-driven development.
  • 🛠️ Workflow-Driven: All logic and configuration is defined in JSON workflows—easy to extend, modify, and share.
  • 🏗️ Ready for Dataset Prep: Designed to help you build, clean, and annotate audio datasets for ML/AI.
  • 🔄 Resume & Robustness: Pipeline can resume from any stage, with full error recovery and validation.
  • 🧬 Designed for Extensibility: Build your own extensions to add new context, analysis, or creative outputs.
  • Persona builder audio samples are now lossless, using numpy+soundfile to concatenate original .wav files (not _16k.wav), with no resampling or pydub, guaranteeing high fidelity for all persona samples.
  • System prompt for persona generation now instructs the LLM to be concise, allow for absurdity, and keep responses below 300 tokens.
  • All LLM chunking/continuation logic is removed; only direct responses are used for persona and all LLM tasks.
  • Logging and debug output is robust and clear for all pipeline and extension stages.

🧩 Extension System

All new features are implemented as modular extensions in the extensions/ folder. Extensions can:

  • Run after the main pipeline or independently
  • Use all context, transcripts, and outputs
  • Add new analysis, creative outputs, or integrations

See extensions/README.md for a full catalog and authoring guide.


🛠️ Example Usage

Ingest and Process Audio

python pipeline_orchestrator.py input_audio/

Run an Extension (e.g., Persona Builder)

python extensions/character_persona_builder.py outputs/run-YYYYMMDD-HHMMSS --llm-config workflows/llm_tasks.json

Generate Avatars/Images

python extensions/avatar/sdxl_avatar_generator.py \
  --persona-manifest outputs/run-YYYYMMDD-HHMMSS/characters/persona_manifest.json \
  --output-root outputs/run-YYYYMMSS

Use the LLM Utilities (chunking, summarization, etc.)

python extensions/llm_utils.py --help

📚 Project Structure

  • extensions/ — All modular extensions (see README inside)
  • workflows/ — JSON configs for pipeline, CLAP, LLM, etc.
  • memory-bank/ — Project context, progress, and system patterns
  • outputs/ — All run outputs (timestamped folders)
  • specification/ — System and node documentation

🧠 How to Build Your Own Extensions

  1. Copy extension_base.py and inherit from ExtensionBase.
  2. Use context, transcripts, and outputs from any run folder.
  3. Add your logic—analysis, creative output, new integrations, etc.
  4. Log only anonymized, PII-free information.
  5. Document your extension and add it to the catalog!

See extensions/README.md for more.


🌟 Vision & Future

  • Context Everywhere: Audio is just the start — The-Machine is designed to add, use, and build on context for any data.
  • Multimodal Workflows: Future extensions will support image→text→audio pipelines, creative AI, and dataset generation in all directions.
  • Reverse Pipelines: Imagine describing an image with a local LLM, then generating audio or music from that description—The-Machine will make it possible.
  • Open, Extensible, and Privacy-First: Built for researchers, creators, and anyone who wants to understand and use audio context.

📝 Documentation & Resources


🤝 Contributing

  • Contributions, new extensions, and feedback are welcome!
  • Please see the extension authoring guide and open an issue or PR.

Built for context, privacy, and creativity.

About

Audio processing tool focused on metadata and context generation via off-the-shelf components. Leverages several local ML/AI tools to accomplish transcription, context clues, and audio processing tasks. Designed with extensibility in mind.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages