A comprehensive knowledge management system with AI integration that processes org-mode files, creates vector embeddings for semantic search, and provides a command-line interface for managing your personal knowledge base.
This system follows a local-first architecture that:
- Processes collections of org-mode files containing personal knowledge (notes, journals, creative writing, etc.)
- Creates vector embeddings for semantic search using ChromaDB
- Is designed so that multiple AI agents can query and synthesize knowledge across domains (agent integration is a planned extension)
- Maintains a modular, testable architecture for future AI agent integration
- Org-mode File Processing: Parse org files with support for headers, properties, filetags, and content extraction
- Semantic Search: Vector-based search using ChromaDB for finding relevant content across your knowledge base
- Content Chunking: Intelligent splitting of content into chunks with configurable size and overlap
- Metadata Extraction: Automatic extraction of titles, tags, IDs, and other metadata from org files
- Command-Line Interface: Full CLI for indexing, searching, and managing your knowledge base
The system supports a hierarchical directory structure with a 3-axis tagging system:
- [domain] - Subject area (e.g., mathematics, cooking, philosophy)
- [form] - Content type (e.g., reference, journal, creative)
- [granularity] - Detail level (e.g., overview, detailed, specific-topic)
Example: #mathematics #reference #set-theory
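As a rough illustration of the three axes, the sketch below splits an org filetags value into domain, form, and granularity. It assumes tags always appear in that order and that there are exactly three of them; the function name `split_axes` is hypothetical and not part of this project.

```python
def split_axes(filetags: str) -> dict:
    """Map a value like ':mathematics:reference:set-theory:' to the three axes.

    Assumption: exactly three tags, given in domain/form/granularity order.
    The real system may be more lenient than this.
    """
    # Org filetags are colon-delimited; drop the empty strings at both ends.
    tags = [tag for tag in filetags.strip().split(":") if tag]
    if len(tags) != 3:
        raise ValueError(f"expected 3 tags, got {len(tags)}: {tags}")
    domain, form, granularity = tags
    return {"domain": domain, "form": form, "granularity": granularity}


axes = split_axes(":mathematics:reference:set-theory:")
print(axes)
```

Rejecting anything other than three tags keeps the sketch simple; a more forgiving version could treat missing axes as unset instead.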
- Python 3.8+
- ChromaDB (install via pip)
- Clone the repository
- Install ChromaDB:
pip install --user chromadb
- Verify installation:
python cli.py config
knowledge_system/
├── src/ # Core modules
│ ├── config.py # Configuration management
│ ├── scanner.py # File discovery
│ ├── parser.py # Org-mode parsing
│ ├── chunking.py # Content chunking
│ └── chroma_manager.py # Vector database operations
├── tests/ # Test suite
│ ├── fixtures/ # Test data
│ └── test_*.py # Test files
├── config/
│ └── default.json # Default configuration
├── knowledge_base/ # Your org files go here
├── cli.py # Command-line interface
└── pytest.ini # Test configuration
The system uses config/default.json for configuration:
{
"knowledge_base_root": "./knowledge_base",
"chroma_db_path": "./local_cache/chromadb",
"chunk_size": 1000,
"chunk_overlap": 200
}
- knowledge_base_root: Directory containing your org files
- chroma_db_path: Where ChromaDB stores vector embeddings
- chunk_size: Maximum characters per content chunk
- chunk_overlap: Characters of overlap between chunks
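To make the interaction between chunk_size and chunk_overlap concrete, here is a minimal character-based sketch. It is not the project's ChunkingEngine, which may split on headings or sentence boundaries; the function name `chunk_text` is an assumption made for illustration.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters. This is a plain
    sliding window, shown only to illustrate the two settings.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(sample, chunk_size=1000, chunk_overlap=200)
print([len(c) for c in chunks])
```

With the default values the window advances 800 characters at a time, so a 2,500-character note yields three chunks, and the last 200 characters of each chunk reappear at the start of the next.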
The CLI provides four main commands:
python cli.py config
Shows current configuration settings and path status.
python cli.py status
Displays:
- Number of org files found
- ChromaDB collections and document counts
- System health information
python cli.py index
Processes all org files in your knowledge base:
- Scans for .org files
- Extracts headers and content
- Chunks content appropriately
- Stores in ChromaDB with metadata
Example output:
Loading config from config/default.json
Found 4 org files to process
Creating/clearing collection: knowledge_base
Processing 1/4: machine-learning.org
Stored 3 chunks from machine-learning.org
Processing 2/4: cooking-recipes.org
Stored 4 chunks from cooking-recipes.org
...
Indexing complete! Stored 15 total chunks in collection 'knowledge_base'
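The file discovery step can be sketched with the standard library alone. This is an assumption about how a scanner could work, not the contents of src/scanner.py; `find_org_files` is a hypothetical name, and the demo builds a throwaway directory so the snippet runs on its own.

```python
import tempfile
from pathlib import Path


def find_org_files(root) -> list:
    """Return every .org file under root, searched recursively and sorted."""
    return sorted(p for p in Path(root).rglob("*.org") if p.is_file())


# Demo on a temporary directory so nothing in a real knowledge base is touched.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "notes").mkdir()
    (base / "notes" / "machine-learning.org").write_text("* ML\n")
    (base / "cooking-recipes.org").write_text("* Recipes\n")
    (base / "readme.txt").write_text("not an org file\n")
    found = [p.relative_to(base).as_posix() for p in find_org_files(base)]

print(f"Found {len(found)} org files to process")
```

Sorting the paths gives a stable processing order, which makes progress lines such as "Processing 1/4" reproducible between runs.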
python cli.py search "your query here"
Search options:
- --results N: Number of results to return (default: 5)
- --collection NAME: Search specific collection (default: knowledge_base)
Example:
python cli.py search "machine learning algorithms" --results 3
All commands support:
- --config PATH: Use custom configuration file
- --db-path PATH: Override ChromaDB path (useful for testing)
Example:
python cli.py index --config my-config.json --db-path /tmp/test-db
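One way the global options could be wired up is sketched below with argparse. The structure is an assumption (the real cli.py may use subparsers and additional arguments such as the search query), and `build_parser` and `resolve_settings` are hypothetical names. The point is only that a --db-path given on the command line takes precedence over chroma_db_path from the configuration file.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser for the four commands plus the two global options (simplified)."""
    parser = argparse.ArgumentParser(prog="cli.py")
    parser.add_argument("command", choices=["config", "status", "index", "search"])
    parser.add_argument("--config", default="config/default.json",
                        help="path to a JSON configuration file")
    parser.add_argument("--db-path", default=None,
                        help="override the ChromaDB path from the config file")
    return parser


def resolve_settings(args: argparse.Namespace, file_settings: dict) -> dict:
    """Merge file settings with command-line overrides (the command line wins)."""
    settings = dict(file_settings)
    if args.db_path is not None:
        settings["chroma_db_path"] = args.db_path
    return settings


args = build_parser().parse_args(
    ["index", "--config", "my-config.json", "--db-path", "/tmp/test-db"]
)
# Stand-in for the values that would be read from the JSON file.
file_settings = {
    "knowledge_base_root": "./knowledge_base",
    "chroma_db_path": "./local_cache/chromadb",
    "chunk_size": 1000,
    "chunk_overlap": 200,
}
settings = resolve_settings(args, file_settings)
print(settings["chroma_db_path"])  # prints /tmp/test-db
```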
Your org files should follow this structure:
:PROPERTIES:
:ID: unique-uuid-here
:END:
#+TITLE: Your Note Title
#+filetags: :domain:form:granularity:
* Main Heading
Your content goes here. The system will extract:
- The title from #+TITLE
- Tags from #+filetags
- The unique ID from PROPERTIES
- All content (excluding headers and properties)
** Subheading
More content with [[file:other-note.org][links to other notes]].
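The extraction described in the example can be approximated with a small line-by-line parser. This is a sketch rather than the project's src/parser.py: `parse_org` is a hypothetical name, and it assumes that "headers" covers both #+KEYWORD lines and heading lines, so both are kept out of the content.

```python
import re

SAMPLE = """:PROPERTIES:
:ID: unique-uuid-here
:END:
#+TITLE: Your Note Title
#+filetags: :domain:form:granularity:
* Main Heading
Your content goes here.
** Subheading
More content.
"""


def parse_org(text: str) -> dict:
    """Extract ID, title, tags, headings, and body content from org text."""
    meta = {"id": None, "title": None, "tags": [], "headings": []}
    content_lines = []
    in_properties = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == ":PROPERTIES:":
            in_properties = True
            continue
        if in_properties:
            if stripped == ":END:":
                in_properties = False
            else:
                id_match = re.match(r":ID:\s*(.+)", stripped)
                if id_match:
                    meta["id"] = id_match.group(1).strip()
            continue  # property drawer lines never count as content
        keyword = re.match(r"#\+(\w+):\s*(.*)", stripped)
        if keyword:
            key, value = keyword.group(1).lower(), keyword.group(2).strip()
            if key == "title":
                meta["title"] = value
            elif key == "filetags":
                meta["tags"] = [tag for tag in value.split(":") if tag]
            continue
        heading = re.match(r"\*+\s+(.*)", line)
        if heading:
            meta["headings"].append(heading.group(1).strip())
            continue
        content_lines.append(line)
    meta["content"] = "\n".join(content_lines).strip()
    return meta


parsed = parse_org(SAMPLE)
print(parsed["title"], parsed["tags"])
```

Keyword names are lowercased before comparison, so #+TITLE and #+title are treated alike, as are #+FILETAGS and #+filetags.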
- Setup: Place your org files in the knowledge_base/ directory
- Index: Run python cli.py index to process all files
- Search: Use python cli.py search "topic" to find relevant content
- Monitor: Check python cli.py status to see system state
pytest tests/
The test suite includes:
- Unit tests for all core components
- Integration tests for the full pipeline
- CLI command testing with temporary databases
- Automated cleanup and isolation
The system follows a modular architecture:
- Config: JSON-based configuration management
- Scanner: Discovers org files in directory trees
- Parser: Extracts headers, metadata, and content from org files
- ChunkingEngine: Splits content into overlapping chunks
- ChromaManager: Handles vector database operations
- CLI: Command-line interface tying everything together
No results when searching:
- Ensure you've run python cli.py index first
- Check that your org files are in the configured knowledge_base_root
- Verify ChromaDB path is accessible
Import errors:
- Make sure ChromaDB is installed: pip install --user chromadb
- Check that all source files are present in the src/ directory
Configuration problems:
- Run python cli.py config to verify settings
- Ensure config/default.json exists and is valid JSON
- Check system status:
python cli.py status
- Verify configuration:
python cli.py config
- Run tests to ensure system integrity:
pytest tests/
This system is designed for extensibility. Possible future directions include:
- AI agent integration for automated knowledge synthesis
- Multi-device synchronization
- Advanced search filters and ranking
- Web interface
- Integration with other knowledge management tools
The modular architecture and comprehensive test suite support rapid iteration and feature development.