OrchestrAIte processes `.wav` audio input, extracts features, and identifies multiple instruments using a convolutional neural network (CNN).
- Accepts `.wav` audio files as input
- Uses log-mel spectrograms for feature extraction (see the sketch after this list)
- Multi-label CNN for instrument identification
- Web interface built with FastAPI and Streamlit
- Deployable via Docker and Google Cloud Run
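As a rough illustration of the feature-extraction step, a log-mel spectrogram can be computed with librosa. This is a minimal sketch; the sample rate and mel-band count are illustrative assumptions, not necessarily the project's actual settings.

```python
import numpy as np
import librosa

def wav_to_log_mel(path: str, sr: int = 44100, n_mels: int = 128) -> np.ndarray:
    """Load a .wav file and convert it to a log-scaled mel spectrogram."""
    y, _ = librosa.load(path, sr=sr, mono=True)   # resample and downmix to mono
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)   # convert power to decibels
```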
Fork this repository, clone your fork, and install the dependencies in a virtual environment:

`pip install -r requirements.txt`
- Open a terminal and start the FastAPI server using Uvicorn: `uvicorn api.fast_api:app --reload`
- In a separate terminal, start the Streamlit application: `streamlit run interface/app.py`
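With the FastAPI server running, you can also query the prediction endpoint directly. The route and form-field name below are hypothetical; the actual ones are defined in `api/fast_api.py`.

```python
import requests

# Hypothetical route and field name; check api/fast_api.py for the real ones.
with open("sample.wav", "rb") as f:
    response = requests.post(
        "http://localhost:8000/predict",
        files={"file": ("sample.wav", f, "audio/wav")},
    )
print(response.json())  # e.g. per-instrument presence probabilities
```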
- Ensure Docker is running, then build and start the containers: `docker compose up --build`
- Open the Streamlit application in your browser: http://localhost:8501
- To stop and remove the containers: `docker compose down`
- Set up a Google Cloud Project and enable Cloud Run.
- Authenticate with Google Cloud.
- Build and push the Docker image to Artifact Registry.
- Deploy to Cloud Run.
- Update `API_URL` in `interface/app.py` with the deployed URL (see the sketch below).
- Test the deployment.
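A convenient pattern for the `API_URL` step (an assumption about the code; `interface/app.py` may simply hard-code the value) is to read the URL from an environment variable, so switching between local and deployed backends needs no code edits:

```python
import os

# Assumed pattern: fall back to the local FastAPI server
# when no deployed Cloud Run URL is provided.
API_URL = os.environ.get("API_URL", "http://localhost:8000")
```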
Input `.wav` files must meet the following requirements (a validation sketch follows the list):
- 32-bit PCM
- Mono
- 44.1 kHz sample rate
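One quick way to verify a file before uploading it is a check like the sketch below, which assumes the `soundfile` library; the project itself may validate input differently.

```python
import soundfile as sf

def meets_requirements(path: str) -> bool:
    """Check that a .wav file is 32-bit PCM, mono, 44.1 kHz."""
    info = sf.info(path)                 # reads the header only
    return (info.subtype == "PCM_32"
            and info.channels == 1
            and info.samplerate == 44100)
```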
User interface screenshots:
The training data comes from the MusicNet dataset on Kaggle, which is pre-split into training and test folders. Although MusicNet contains labels for 11 instruments in the training set, only 7 instruments are labeled in the test set. As a result, the model was trained to identify the following instruments:
- Piano
- Violin
- Viola
- Cello
- Bassoon
- Clarinet
- Horn
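For illustration, a minimal multi-label CNN over log-mel inputs might look like the sketch below. This is not the project's actual architecture, and the input shape is an assumption; the point is the design choice of one independent sigmoid output per instrument with binary cross-entropy loss, so several instruments can be detected in the same clip.

```python
import tensorflow as tf

def build_model(input_shape=(128, 216, 1), n_instruments=7):
    """Minimal multi-label CNN: one independent sigmoid per instrument."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(n_instruments, activation="sigmoid"),
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["binary_accuracy"])
    return model
```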
The model was evaluated on the test set with the following results:
- Test Loss: 0.08
- Test Accuracy: 76.8%
- Precision: 95.8%
- Recall: 95.6%
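For context, multi-label precision and recall are computed by thresholding the per-instrument sigmoid outputs and comparing label by label. The 0.5 threshold and micro averaging in this sketch are assumptions, not necessarily what was used for the numbers above.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Toy data: 3 clips x 7 instruments; y_prob stands in for model.predict(X_test).
y_true = np.array([[1, 0, 1, 0, 0, 0, 0],
                   [0, 1, 0, 0, 0, 0, 1],
                   [1, 1, 0, 0, 0, 0, 0]])
y_prob = np.random.default_rng(0).random((3, 7))
y_pred = (y_prob > 0.5).astype(int)    # assumed decision threshold

print(precision_score(y_true, y_pred, average="micro", zero_division=0))
print(recall_score(y_true, y_pred, average="micro", zero_division=0))
```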
While the model performs well on the test set, real-world performance may vary depending on the quality and complexity of the input audio.
OrchestrAIte was developed by a four-person team as part of a project at Le Wagon Tokyo. The project was completed in two weeks and demoed on December 6, 2024.