Identity Recognition using Images and Audio Clips

For the general concept, please see docs/presentation.pdf.

Training notebooks

Our networks were trained on Google Colab; the notebooks are in the training_notebooks directory. The resulting weights are stored in the weights directory.

Networks

The following files contain the code for the networks used in the image classification pipeline. We obtained the source code from the Keras GitHub repository, replaced all Conv2D layers with SeparableConv2D layers, and slightly modified the original architectures to reduce the number of parameters.

ResNet.py
InceptionNet.py
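
To illustrate why swapping Conv2D for SeparableConv2D shrinks a network, the sketch below counts the weights of a single layer of each kind. The layer sizes (3x3 kernel, 64 input channels, 128 output channels) are illustrative assumptions, not values taken from ResNet.py or InceptionNet.py. The formulas follow the standard Keras layer definitions with a bias term and a depth multiplier of 1.

```python
def conv2d_params(kernel, c_in, c_out, use_bias=True):
    """Weights of a standard 2D convolution with a square kernel."""
    return kernel * kernel * c_in * c_out + (c_out if use_bias else 0)


def separable_conv2d_params(kernel, c_in, c_out, depth_multiplier=1, use_bias=True):
    """Weights of a depthwise-separable 2D convolution.

    Depthwise step: one kernel x kernel filter per input channel
    (times depth_multiplier). Pointwise step: a 1x1 convolution that
    mixes the depthwise outputs into c_out channels.
    """
    depthwise = kernel * kernel * c_in * depth_multiplier
    pointwise = c_in * depth_multiplier * c_out
    return depthwise + pointwise + (c_out if use_bias else 0)


# Hypothetical layer sizes, chosen only for illustration.
standard = conv2d_params(3, 64, 128)
separable = separable_conv2d_params(3, 64, 128)
print(standard, separable, round(standard / separable, 1))
```

For these sizes the standard layer has 73,856 weights and the separable one has 8,896, roughly an 8x reduction for that layer. The saving across a whole network depends on its actual layer shapes.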

Preprocessing

preprocessing.ipynb runs the benchmark code to calculate the equivalent number of parameters involved in the audio and image preprocessing stages.

Pieces of the processing pipeline

All the embeddings are already included in this repository, so you can skip straight to 3_combining.ipynb if you want to.

The numbered scripts and notebooks at the root of the repository implement the general pipeline. After placing the file audVisIdn.npz in the directory datadir, run the files in order:

11_preprocessing_audio.py computes the spectral features we fit on, using librosa.
12_embedding_audio.ipynb runs the clustering network, whose weights are in weights/audio_clusterer_trimmed.h5, on the spectral features extracted in the previous step.
21_preprocessing_image.py detects faces and crops them out of the original picture.
22_embedding_image.ipynb runs InceptionResNetV2 with the weights weights/28epochscrosscat_30epochstriplet_inceptionresnet_smaller_sepconv.h5 to create embeddings of the pictures.
3_combining.ipynb trains a dense network on the audio and image embeddings created in the previous steps, and outputs the test accuracy.
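
The final combining step can be pictured with the minimal sketch below. It is not the code from this repository: it uses NumPy instead of Keras, a single softmax layer instead of the actual dense network, and synthetic data in place of the real embeddings. Every shape, name, and hyperparameter in it is an assumption made for illustration. The point it shows is the fusion itself: the audio and image embeddings of each sample are concatenated into one feature vector before classification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the real embeddings (all sizes are assumptions).
n_classes, n_per_class = 5, 40
audio_dim, image_dim = 8, 16
labels = np.repeat(np.arange(n_classes), n_per_class)
audio_centers = rng.normal(size=(n_classes, audio_dim))
image_centers = rng.normal(size=(n_classes, image_dim))
audio_emb = audio_centers[labels] + 0.3 * rng.normal(size=(labels.size, audio_dim))
image_emb = image_centers[labels] + 0.3 * rng.normal(size=(labels.size, image_dim))

# Fusion: concatenate the two embeddings of each sample into one vector.
features = np.concatenate([audio_emb, image_emb], axis=1)

# Random train/test split (150 training samples, 50 test samples).
order = rng.permutation(labels.size)
train, test = order[:150], order[150:]

# One dense softmax layer trained with full-batch gradient descent.
weights = np.zeros((features.shape[1], n_classes))
bias = np.zeros(n_classes)
one_hot = np.eye(n_classes)[labels]


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


for _ in range(300):
    probs = softmax(features[train] @ weights + bias)
    grad = probs - one_hot[train]  # gradient of cross-entropy w.r.t. the logits
    weights -= 0.1 * features[train].T @ grad / train.size
    bias -= 0.1 * grad.mean(axis=0)

predictions = (features[test] @ weights + bias).argmax(axis=1)
test_accuracy = (predictions == labels[test]).mean()
print(f"test accuracy on synthetic data: {test_accuracy:.3f}")
```

The accuracy printed here describes only the synthetic data and says nothing about the real task.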

The final test accuracy we obtain is 95.4%.

utsav-akhaury/Identity-Recognition
