A Python application that transcribes voice recordings to text and processes them into structured notes using AI.
- Transcribe audio files to text using local Whisper model
- Process transcripts using Ollama (local LLM) to generate:
  - Key decisions and commitments
  - Current blockers and dependencies
  - Potential opportunities and ideas
  - Identified risks and concerns
  - Actionable items with priorities and due dates
- Generate consolidated daily to-do lists from multiple transcripts
- Save processed transcripts in structured JSON format
The application currently consists of separate scripts for each step of the process. Future versions will integrate these into a more cohesive workflow.
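The transcript-processing step above relies on structured output from a local model. As a minimal sketch, assuming the `ollama` Python client (0.4 or later) and Ollama's structured-output support (a JSON schema passed via `format=`); the project's actual prompts and models may differ:

```python
# Illustrative sketch only; the project's real prompts, models, and field
# names live in src/voice_to_notes and may differ from this example.
from typing import List

import ollama
from pydantic import BaseModel


class TranscriptSummary(BaseModel):
    key_decisions: List[str]
    blockers: List[str]
    opportunities: List[str]
    risks: List[str]


def summarize(transcript: str) -> TranscriptSummary:
    # Ask the local model to reply in a shape that matches the Pydantic schema.
    response = ollama.chat(
        model="gemma3:27b",
        messages=[{
            "role": "user",
            "content": "Extract key decisions, blockers, opportunities, and "
                       "risks from this transcript:\n\n" + transcript,
        }],
        format=TranscriptSummary.model_json_schema(),
    )
    # Validate the model's JSON reply against the schema.
    return TranscriptSummary.model_validate_json(response.message.content)
```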
- Python 3.8 or higher
- Ollama installed and running locally
- CUDA-capable GPU recommended for faster transcription
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/voice-to-notes.git
  cd voice-to-notes
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -e .
  ```

- Install and start Ollama:
  - Follow instructions at https://ollama.ai/
  - Pull the Gemma model:

    ```bash
    ollama pull gemma3:27b
    ```
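After installation, an optional sanity check can confirm that Ollama is reachable, that the model has been pulled, and whether a GPU is visible to PyTorch. This is an illustrative snippet, not part of the project; it assumes Ollama's default local API on port 11434:

```python
# Optional post-install sanity check (illustrative, not part of the project scripts).
import json
from urllib.request import urlopen

# Ollama serves a local HTTP API on port 11434; /api/tags lists the pulled models.
with urlopen("http://localhost:11434/api/tags", timeout=5) as resp:
    models = [m["name"] for m in json.load(resp).get("models", [])]

print("Ollama models:", models)
print("gemma3:27b pulled:", any(name.startswith("gemma3:27b") for name in models))

# If PyTorch is installed (Whisper depends on it), check whether a CUDA GPU is visible.
try:
    import torch
    print("CUDA available:", torch.cuda.is_available())
except ImportError:
    print("PyTorch is not installed yet")
```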
- Place your audio files in the `recordings` directory.
- Transcribe the audio files:

  ```bash
  python scripts/transcribe_recording.py
  ```

- Process the transcripts:

  ```bash
  python scripts/process_transcript.py
  ```

- Generate daily to-do lists:

  ```bash
  python scripts/generate_daily_todos.py
  ```

The processed transcripts will be saved as JSON files in the `processed_transcripts` directory, and the daily to-do lists will be saved in the `daily_todos` directory.
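The transcription script wraps the core functionality in `src/voice_to_notes/transcribe.py`. If you want to transcribe a single file directly from Python, a minimal sketch using the openai-whisper package (an assumption about the underlying library; the project may pick a different model size or options, and the file name below is a placeholder) looks like this:

```python
# Minimal sketch using the openai-whisper package; transcribe_recording.py
# may use a different model size or options.
import whisper

model = whisper.load_model("base")  # other sizes: "small", "medium", "large"
result = model.transcribe("recordings/example_meeting.m4a")  # placeholder file name
print(result["text"])
```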
```
voice-to-notes/
├── recordings/                 # Directory for audio files
├── transcripts/                # Directory for raw transcript text files
├── processed_transcripts/      # Directory for processed transcript JSON files
├── daily_todos/                # Directory for consolidated daily to-do lists
├── scripts/
│   ├── transcribe_recording.py # Script for audio transcription
│   ├── process_transcript.py   # Script for transcript processing
│   └── generate_daily_todos.py # Script for generating daily to-do lists
├── src/
│   └── voice_to_notes/
│       ├── __init__.py
│       ├── transcribe.py       # Core transcription functionality
│       └── models.py           # Pydantic models for structured output
├── pyproject.toml
└── README.md
```
Processed transcripts are saved as JSON files with the following structure:
```json
{
  "summary": {
    "key_decisions": ["Decision 1", "Decision 2"],
    "blockers": ["Blocker 1", "Blocker 2"],
    "opportunities": ["Opportunity 1", "Opportunity 2"],
    "risks": ["Risk 1", "Risk 2"]
  },
  "action_items": [
    {
      "description": "Action item description",
      "priority": 3,
      "due_date": "2024-03-20T10:00:00",
      "blockers": ["Dependency 1"],
      "status": "pending"
    }
  ],
  "metadata": {
    "processed_at": "2024-03-19T15:30:00",
    "model_used": "ollama",
    "version": "1.0"
  }
}
```
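`src/voice_to_notes/models.py` contains the Pydantic models behind this structure. The exact definitions are not reproduced here, but a sketch consistent with the JSON above could look like:

```python
# Illustrative models matching the JSON above; the actual definitions in
# src/voice_to_notes/models.py may differ in names and validation details.
from datetime import datetime
from typing import List, Optional

from pydantic import BaseModel, ConfigDict


class ActionItem(BaseModel):
    description: str
    priority: int
    due_date: Optional[datetime] = None
    blockers: List[str] = []
    status: str = "pending"


class Summary(BaseModel):
    key_decisions: List[str]
    blockers: List[str]
    opportunities: List[str]
    risks: List[str]


class Metadata(BaseModel):
    # "model_used" starts with "model_", which Pydantic v2 reserves by default;
    # clearing protected_namespaces avoids the resulting warning.
    model_config = ConfigDict(protected_namespaces=())

    processed_at: datetime
    model_used: str
    version: str


class ProcessedTranscript(BaseModel):
    summary: Summary
    action_items: List[ActionItem]
    metadata: Metadata
```

With Pydantic v2, a saved file can then be loaded with `ProcessedTranscript.model_validate_json(path.read_text())`.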
Daily to-do lists are saved as JSON files with the following structure:
```json
{
  "date": "2024-03-20",
  "action_items": [
    {
      "description": "Action item description",
      "priority": 3,
      "due_date": "2024-03-20T10:00:00",
      "blockers": ["Dependency 1"],
      "status": "pending"
    }
  ],
  "key_decisions": ["Decision 1", "Decision 2"],
  "blockers": ["Blocker 1", "Blocker 2"],
  "opportunities": ["Opportunity 1", "Opportunity 2"],
  "risks": ["Risk 1", "Risk 2"]
}
```
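To illustrate the consolidation step, the grouping performed before generating daily lists could look roughly like the sketch below. This is illustrative only: the actual `generate_daily_todos.py` also uses Ollama to consolidate the grouped items, and the ordering shown assumes higher priority numbers mean more important items.

```python
# Illustrative only: collect action items from processed transcripts and
# group them by due date.
import json
from collections import defaultdict
from pathlib import Path

items_by_date = defaultdict(list)

for path in Path("processed_transcripts").glob("*.json"):
    data = json.loads(path.read_text())
    for item in data.get("action_items", []):
        due = item.get("due_date") or ""
        day = due[:10] if due else "undated"  # "2024-03-20T10:00:00" -> "2024-03-20"
        items_by_date[day].append(item)

for day, items in sorted(items_by_date.items()):
    # Highest-priority items first (assuming larger numbers mean higher priority).
    items.sort(key=lambda i: i.get("priority", 0), reverse=True)
    print(day, [i["description"] for i in items])
```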
MIT License
- `scripts/transcribe_recording.py`: Transcribes audio recordings into text files.
- `scripts/process_transcript.py`: Processes transcript text files to extract actionable insights using Ollama.
- `scripts/generate_daily_todos.py`: Aggregates processed transcripts by date and generates consolidated daily to-do lists using Ollama.
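Until the scripts are integrated into a single workflow, they can be chained manually; a convenience sketch (not part of the repository):

```python
# Convenience sketch (not part of the repository): run the full pipeline in order.
import subprocess
import sys

for script in (
    "scripts/transcribe_recording.py",
    "scripts/process_transcript.py",
    "scripts/generate_daily_todos.py",
):
    subprocess.run([sys.executable, script], check=True)
```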