Wave Splitter

A web application for visualizing audio files and splitting them by speaker. It renders an interactive waveform of the audio and lets you download each speaker's segments as a separate file.

Features

🎵 Interactive waveform visualization
👥 Speaker-based audio segmentation
🎨 Unique color coding for each speaker
⬇️ Download individual speaker segments
🌓 Dark/Light theme support
⏯️ Click-to-play segments
📱 Responsive design

Technology Stack

Frontend

Next.js 14 with App Router
TypeScript
Tailwind CSS
WaveSurfer.js
Material Design Color System

Backend

FastAPI
Python 3.8+
pydub for audio processing
Pydantic for data validation

Getting Started

You can run the application either using Docker or by setting up the development environment locally.

Option 1: Docker Setup (Recommended)

Clone the repository:

```
git clone https://github.com/yourusername/audio-splitter.git
cd audio-splitter
```

Make sure Docker and Docker Compose are installed on your system.

Start the application:

```
docker-compose up
```

The application will be available at:

Frontend: http://localhost:3000
Backend API: http://localhost:8000

Option 2: Local Development Setup

Prerequisites

Frontend:

Node.js 18.17 or later
npm or yarn

Backend:

Python 3.8 or later
pip
FFmpeg (for audio processing)

Frontend Setup

Navigate to the frontend directory:

```
cd frontend
```

Install the frontend dependencies:

```
npm install
```

Create a `.env.local` file with the following content:

```
NEXT_PUBLIC_API_URL=http://localhost:8000
```

Run the development server:

```
npm run dev
```

Backend Setup

Navigate to the backend directory:

```
cd backend
```

Create and activate a virtual environment:

```
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install the dependencies:

```
pip install fastapi uvicorn pydub requests python-multipart
```

Run the backend server:

```
uvicorn main:app --reload
```

Project Structure

```
audio-splitter/
├── frontend/
│   ├── src/
│   │   ├── app/
│   │   │   └── page.tsx
│   │   ├── components/
│   │   │   ├── AudioSplitter.tsx
│   │   │   └── WaveformPlayer.tsx
│   │   └── contexts/
│   │       └── ThemeContext.tsx
│   └── public/
│       └── screenshot.png
│
├── backend/
│   ├── main.py
│   ├── audio_processor.py
│   └── models.py
│
└── README.md
```

Backend Components

Models (`models.py`)

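```python
from typing import List

from pydantic import BaseModel


class Segment(BaseModel):
    """One speaker turn from the transcription JSON."""
    start: float  # segment start timestamp
    end: float    # segment end timestamp
    speaker: str  # speaker label, e.g. "A"
    text: str     # transcribed text of the segment


class TranscriptionRequest(BaseModel):
    """Request body for POST /split-audio/{speaker}."""
    audio_url: str           # URL of the source audio file
    segments: List[Segment]  # all segments from the transcription
```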

Audio Processor (`audio_processor.py`)

Handles the audio processing:

- Downloading the audio from a URL
- Splitting the audio into speaker segments
- Combining each speaker's segments into a single track
- Converting the result to MP3 format
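The four steps above can be sketched with pydub roughly as follows. This is a hypothetical outline, not the project's actual `audio_processor.py`: the function names `speaker_ranges_ms` and `split_speaker` are illustrative, the download uses the standard library rather than `requests`, and it assumes segment timestamps are in seconds (pydub slices audio in milliseconds).

```python
import os
import tempfile
import urllib.request
from typing import Dict, List, Tuple


def speaker_ranges_ms(segments: List[Dict], speaker: str) -> List[Tuple[int, int]]:
    """Return sorted (start_ms, end_ms) pairs for one speaker.

    Assumes "start"/"end" are in seconds; this is an assumption, not
    something the project documents.
    """
    return sorted(
        (round(seg["start"] * 1000), round(seg["end"] * 1000))
        for seg in segments
        if seg["speaker"] == speaker
    )


def split_speaker(audio_url: str, segments: List[Dict], speaker: str, out_path: str) -> str:
    """Download audio, keep one speaker's segments, join them, and export MP3."""
    # Imported here so speaker_ranges_ms() works without pydub installed.
    # pydub relies on FFmpeg (on the PATH) to decode the source and encode MP3.
    from pydub import AudioSegment

    with tempfile.TemporaryDirectory() as tmp_dir:
        # 1. Download the audio from the URL into a temporary file.
        src_path = os.path.join(tmp_dir, "source_audio")
        urllib.request.urlretrieve(audio_url, src_path)
        audio = AudioSegment.from_file(src_path)  # decoded fully into memory

    # 2-3. Slice out each of the speaker's segments and append them
    #      into a single track.
    combined = AudioSegment.empty()
    for start_ms, end_ms in speaker_ranges_ms(segments, speaker):
        combined += audio[start_ms:end_ms]

    # 4. Convert to MP3; export() returns an open file handle, so close it.
    combined.export(out_path, format="mp3").close()
    return out_path
```

Keeping the segment filtering in a pure helper makes the timing logic testable without FFmpeg or network access.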

FastAPI Server (`main.py`)

Provides the REST API endpoint:

- `POST /split-audio/{speaker}`: returns the given speaker's segments as a single MP3 file

API Endpoints

Split Audio by Speaker

```
POST /split-audio/{speaker}
Content-Type: application/json

{
  "audio_url": "https://example.com/audio.mp3",
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "speaker": "A",
      "text": "Hello, how are you?"
    }
  ]
}
```

Response: an MP3 file containing the requested speaker's segments.
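A request to this endpoint can be built with only the Python standard library. This is a hypothetical client sketch, not part of the project: it assumes the backend is running at `http://localhost:8000` as in the setup instructions, and the names `API_URL`, `build_split_request`, and `download_speaker` are illustrative.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

API_URL = "http://localhost:8000"  # assumed default backend address


def build_split_request(
    audio_url: str, segments: List[Dict], speaker: str, api_url: str = API_URL
) -> urllib.request.Request:
    """Build (but do not send) the JSON POST request for one speaker."""
    body = json.dumps({"audio_url": audio_url, "segments": segments}).encode("utf-8")
    # Percent-encode the speaker label so labels with spaces form a valid URL.
    path = "/split-audio/" + urllib.parse.quote(speaker, safe="")
    return urllib.request.Request(
        api_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def download_speaker(
    audio_url: str, segments: List[Dict], speaker: str, out_path: str, api_url: str = API_URL
) -> None:
    """Send the request and save the returned MP3 bytes to out_path."""
    request = build_split_request(audio_url, segments, speaker, api_url)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

Separating request construction from sending makes it easy to inspect the exact URL and body before hitting a running server.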

Usage

1. Enter an audio URL in the input field.
2. Paste the JSON transcription data with speaker segments.
3. The waveform is displayed with a color-coded region for each speaker.
4. Click a segment to play that portion of the audio.
5. Use the download button to save each speaker's audio as a separate file.

JSON Format

The transcription data must be a JSON array of segment objects, each with `start`, `end`, `speaker`, and `text` fields:

```
[
  {
    "start": 0.0,
    "end": 2.5,
    "speaker": "A",
    "text": "Hello, how are you?"
  },
  {
    "start": 2.5,
    "end": 5.0,
    "speaker": "B",
    "text": "I'm doing well, thank you!"
  }
]
```
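Before pasting transcription data into the app, it can help to check it against this format. The following is a hypothetical stand-alone checker, not code from the repository; `validate_transcription` is an illustrative name, and the optional `duration` argument is assumed to be in the same units as `start` and `end` (presumably seconds).

```python
import json
from typing import List, Optional

# Expected type(s) for each required field of a segment object.
REQUIRED_FIELDS = {"start": (int, float), "end": (int, float), "speaker": str, "text": str}


def validate_transcription(raw: str, duration: Optional[float] = None) -> List[str]:
    """Return a list of problems found in `raw`; an empty list means it looks valid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, list):
        return ["top level must be a JSON array of segments"]

    problems: List[str] = []
    for i, seg in enumerate(data):
        if not isinstance(seg, dict):
            problems.append(f"segment {i}: expected an object")
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        bad = [
            name
            for name, kind in REQUIRED_FIELDS.items()
            if not isinstance(seg.get(name), kind) or isinstance(seg.get(name), bool)
        ]
        if bad:
            problems.append(f"segment {i}: missing or mistyped {', '.join(bad)}")
            continue
        if not 0 <= seg["start"] < seg["end"]:
            problems.append(f"segment {i}: need 0 <= start < end")
        if duration is not None and seg["end"] > duration:
            problems.append(f"segment {i}: ends after the audio ({duration}s)")
    return problems
```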

Contributing

1. Fork the repository.
2. Create your feature branch (`git checkout -b feature/amazing-feature`).
3. Commit your changes (`git commit -m 'Add some amazing feature'`).
4. Push to the branch (`git push origin feature/amazing-feature`).
5. Open a pull request.

Troubleshooting

Common Issues

Audio processing fails
- Ensure FFmpeg is installed and on your PATH
- Verify that the audio URL is accessible
- Check that the audio format is supported
CORS errors
- Verify that the frontend URL is listed in the backend's CORS configuration
- Check that credentials are handled correctly
JSON parsing errors
- Ensure the transcription JSON matches the expected format
- Check that all timestamps fall within the audio duration
Docker-related issues
- Ensure Docker and Docker Compose are installed and up to date
- Check that ports 3000 and 8000 are free on your system
- If volumes aren't updating, rebuild the containers:
```
docker-compose down
docker-compose up --build
```
- On Windows, ensure Docker Desktop is running with the WSL 2 backend
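Two of the checks above, FFmpeg on the PATH and ports 3000 and 8000 being free, can be scripted. A rough sketch, not part of the repository; the function names are hypothetical, and the port check simply tries to bind each port on `127.0.0.1`, so run it before starting the application (socket-binding rules differ slightly on Windows).

```python
import shutil
import socket
from typing import Dict


def ffmpeg_on_path() -> bool:
    """True if an `ffmpeg` executable can be found on the PATH (pydub needs it)."""
    return shutil.which("ffmpeg") is not None


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """True if host:port can be bound, i.e. no other process is using it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def check_environment() -> Dict[str, bool]:
    """Run the FFmpeg check and the checks for the default frontend/backend ports."""
    return {
        "ffmpeg on PATH": ffmpeg_on_path(),
        "port 3000 free": port_is_free(3000),
        "port 8000 free": port_is_free(8000),
    }


if __name__ == "__main__":
    for check, ok in check_environment().items():
        print(f"{'OK  ' if ok else 'FAIL'}  {check}")
```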

License

This project is licensed under the MIT License - see the LICENSE file for details.

Repository: ahk-d/wavesplit
