Spoken language identification DNN implemented in MXNet

This is an MXNet solution for the Kaggle dataset "Spoken Language Identification". It uses a simple feed-forward deep neural network (FFDNN) to identify the language spoken in audio recordings of human speech.

The included program & model support classification of German, English, and Spanish speech.

Usage

Training and inference are both performed with the included Python program, splidnn.py.

usage: splidnn.py [-h] {train,infer} ...

Spoken language identification DNN

positional arguments:
  {train,infer}  sub-command help
    train        Train the DNN model
    infer        Infer using trained model

optional arguments:
  -h, --help     show this help message and exit
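For readers who want to script against this interface, the following is a minimal sketch of an argparse setup that would produce help text like the one shown in this README. It is not taken from splidnn.py: the function name build_parser is hypothetical, and parsing EPOCHS as an integer is an assumption. The sub-command options (-e, -m) are the ones documented in the Training and Inference sections.

```python
import argparse


def build_parser():
    """Build a parser mirroring the help text in this README (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="splidnn.py",
        description="Spoken language identification DNN",
    )
    subparsers = parser.add_subparsers(dest="command", help="sub-command help")
    subparsers.required = True  # a sub-command must always be given

    train = subparsers.add_parser("train", help="Train the DNN model")
    # Assumption: EPOCHS is an integer count of training epochs.
    train.add_argument("-e", "--epochs", type=int, required=True)
    train.add_argument("archive", help="Dataset archive")

    infer = subparsers.add_parser("infer", help="Infer using trained model")
    infer.add_argument("-m", "--model", required=True)
    infer.add_argument("audio", nargs="+", help="Audio file(s) to process")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["infer", "--model", "model_epoch5", "/tmp/de_example_file.flac"]
)
print(args.command, args.model, args.audio)
```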

Training

Training uses the dataset from https://www.kaggle.com/toponowicz/spoken-language-identification. Simply download the archive containing the training/test data and pass its path as the archive argument. Unpacking the zip archive is not necessary.
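To illustrate how audio can be read straight from a zip archive without unpacking it, here is a small sketch using Python's standard zipfile module. The helper name iter_audio_members and the filter on .flac/.wav suffixes are assumptions made for this example; the actual layout of the Kaggle archive and the way splidnn.py reads it may differ.

```python
import zipfile


def iter_audio_members(archive_path, suffixes=(".flac", ".wav")):
    """Yield (member_name, raw_bytes) for audio files inside a zip archive.

    Hypothetical helper: the suffix filter is an assumption, not the
    script's documented behavior.
    """
    with zipfile.ZipFile(archive_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries
            if info.filename.lower().endswith(suffixes):
                # Read the member into memory; nothing is written to disk.
                yield info.filename, archive.read(info)
```

The returned bytes can be wrapped in io.BytesIO and handed to an audio reader that accepts file-like objects, such as soundfile.read.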

The training procedure alternates between training and validation passes, saving a copy of the network (weights) and the optimizer state at the end of every epoch.
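The alternation described above can be sketched as a plain loop. This is an illustration only, not the code in splidnn.py: train_pass, validation_pass, and save_state are hypothetical callables supplied by the caller, and the 1-based file naming is an assumption inferred from the model_epoch5/trainer_epoch5 files shipped with the repository.

```python
def checkpoint_names(epoch):
    """Return (model_file, trainer_file) for a 1-based epoch number."""
    return "model_epoch{}".format(epoch), "trainer_epoch{}".format(epoch)


def run_epochs(epochs, train_pass, validation_pass, save_state):
    """Alternate training and validation passes, checkpointing every epoch.

    All three callables are hypothetical placeholders:
      train_pass()       -> training loss for one pass over the training data
      validation_pass()  -> validation metric for one pass over held-out data
      save_state(m, t)   -> writes network weights to m, optimizer state to t
    """
    history = []
    for epoch in range(1, epochs + 1):
        train_loss = train_pass()
        val_metric = validation_pass()
        model_file, trainer_file = checkpoint_names(epoch)
        save_state(model_file, trainer_file)  # runs at the end of each epoch
        history.append((epoch, train_loss, val_metric))
    return history
```

If the model is built with MXNet Gluon, save_state would plausibly wrap net.save_parameters(model_file) and trainer.save_states(trainer_file), but that is an assumption about the implementation.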

usage: splidnn.py train [-h] -e EPOCHS archive

positional arguments:
  archive               Dataset archive

optional arguments:
  -h, --help            show this help message and exit
  -e EPOCHS, --epochs EPOCHS

Inference

Specify the model state and one or more audio files to analyze. The inference procedure splits the audio input into 10-second chunks and outputs the predicted language label for each chunk. FLAC and WAV input formats are supported; audio I/O is performed using PySoundFile.

usage: splidnn.py infer [-h] -m MODEL audio [audio ...]

positional arguments:
  audio                 Audio file(s) to process

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
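The 10-second chunking used during inference can be sketched as a computation of frame ranges. The function chunk_bounds is hypothetical, and how splidnn.py treats a trailing chunk shorter than 10 seconds is not documented here; this sketch drops it by default and exposes a keep_partial flag as an assumption.

```python
def chunk_bounds(num_frames, sample_rate, chunk_seconds=10.0, keep_partial=False):
    """Return (start, stop) frame index pairs for consecutive fixed-length chunks.

    Assumption: a trailing chunk shorter than chunk_seconds is dropped
    unless keep_partial is True.
    """
    chunk_frames = int(round(sample_rate * chunk_seconds))
    if chunk_frames <= 0:
        raise ValueError("chunk length must be at least one frame")
    bounds = []
    for start in range(0, num_frames, chunk_frames):
        stop = min(start + chunk_frames, num_frames)
        if stop - start < chunk_frames and not keep_partial:
            break  # only the final chunk can be short
        bounds.append((start, stop))
    return bounds


# A 45-second recording at 16 kHz contains four full 10-second chunks.
print(len(chunk_bounds(45 * 16000, 16000)))
```

Each (start, stop) pair can be used to slice the sample array returned by the audio reader before passing the chunk to the model.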

The following example infers language labels for a single example audio file:

$ python3 splidnn.py infer --model model_epoch5 /tmp/de_example_file.flac 
/tmp/de_example_file.flac DE,DE,DE,DE
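When post-processing this output in a script, a line can be split into the file path and its per-chunk labels. The sketch below assumes the format is exactly what the example above shows (path, one space, comma-separated labels); the helper names are hypothetical, and the majority vote is an illustrative way to get one label per file rather than something the program does itself.

```python
from collections import Counter


def parse_result_line(line):
    """Split 'path LABEL,LABEL,...' into (path, [labels]).

    Assumption: labels contain no spaces, so the last space separates
    the path from the label list.
    """
    path, labels = line.strip().rsplit(" ", 1)
    return path, labels.split(",")


def majority_label(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


path, labels = parse_result_line("/tmp/de_example_file.flac DE,DE,DE,DE")
print(path, majority_label(labels))
```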

Results

The results of the sample training run are included in output.txt, along with the best model/trainer pair, model_epoch5 and trainer_epoch5. Training was stopped after 5 epochs to avoid overfitting to the input dataset.
