Red characters are errors.

Just download this repo. There is no requirements list yet, so you have to install all of the dependencies manually.
This is not a library; it is a set of tools for performing an unsupervised acoustic keyboard attack. The main.ipynb file in the src dir contains a demo of an attack. Feel free to change any step in the process.

**Data collection.** Collect data for the attack. datasetMaker.py is designed to obtain a labeled dataset for testing purposes.
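
datasetMaker.py's exact interface is not documented here, so purely as an illustration of what collecting a labeled dataset involves, here is a minimal sketch that records microphone audio while logging keypress timestamps. The sounddevice, soundfile and pynput packages and every name in the snippet are my assumptions, not necessarily what datasetMaker.py does.

```python
# Hypothetical sketch of labeled data collection (not datasetMaker.py itself):
# record audio from the microphone and log which key was pressed at what time.
import time
import sounddevice as sd
import soundfile as sf
from pynput import keyboard

FS = 44100          # sample rate, Hz
DURATION = 30       # recording length, seconds

labels = []         # list of (timestamp, key) pairs

def on_press(key):
    # store the wall-clock time of every keypress together with its label
    labels.append((time.time(), str(key)))

listener = keyboard.Listener(on_press=on_press)
listener.start()

start = time.time()
audio = sd.rec(int(DURATION * FS), samplerate=FS, channels=1)
sd.wait()           # block until the recording is finished
listener.stop()

sf.write("recording.wav", audio, FS)
with open("labels.tsv", "w") as f:
    for ts, key in labels:
        # keypress times are stored relative to the start of the recording
        f.write(f"{ts - start:.4f}\t{key}\n")
```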

**Keystroke detection.** This step is built into the KeySoundDataset class. For now it only works with labeled data (purely for evaluation purposes), but you can easily adapt it yourself for unlabeled data (sound only). Keystroke detection is NOT perfect, so test the results of this step and tune the hyperparameters to reach an acceptable level of detection precision before moving on to the next step.
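
KeySoundDataset hides the detection details; as a hypothetical illustration of the general idea (energy-based onset detection with a tunable threshold, not the repo's exact algorithm):

```python
# Hypothetical energy-threshold keystroke detector (illustration only,
# not the KeySoundDataset implementation).
import numpy as np
import soundfile as sf

audio, fs = sf.read("recording.wav")
if audio.ndim > 1:
    audio = audio.mean(axis=1)               # mix down to mono

frame = int(0.01 * fs)                        # 10 ms analysis frames
energy = np.array([
    np.sum(audio[i:i + frame] ** 2)
    for i in range(0, len(audio) - frame, frame)
])

threshold = energy.mean() + 2 * energy.std()  # hyperparameter to tune
min_gap = int(0.1 * fs / frame)               # ignore hits closer than 100 ms

hits, last = [], -min_gap
for i, e in enumerate(energy):
    if e > threshold and i - last >= min_gap:
        hits.append(i * frame / fs)           # keystroke onset time, seconds
        last = i

print(f"detected {len(hits)} keystroke candidates")
```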

**Feature extraction.** This step is built into the KeySoundDataset class. You can choose between mel-frequency spectrograms and MFCCs ('mel_spec' and 'mfcc' for the 'mode' parameter).
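
Both feature types are standard audio features; a minimal sketch of computing them with librosa (librosa is an assumption on my part, KeySoundDataset may compute them differently):

```python
# Minimal sketch: mel-spectrogram and MFCC features for one keystroke segment.
import librosa

segment, fs = librosa.load("keystroke.wav", sr=None)   # one isolated keystroke

# 'mel_spec' mode: log-scaled mel-frequency spectrogram
mel = librosa.feature.melspectrogram(y=segment, sr=fs, n_mels=64)
log_mel = librosa.power_to_db(mel)

# 'mfcc' mode: Mel-Frequency Cepstral Coefficients
mfcc = librosa.feature.mfcc(y=segment, sr=fs, n_mfcc=20)

# either representation can be flattened into one feature vector per keystroke
feature_vector = mfcc.flatten()
print(log_mel.shape, mfcc.shape, feature_vector.shape)
```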

**Dimensionality reduction of extracted features.** This step is built into the KeySoundDataset class. Use the reduse_dims parameter (None for no dimensionality reduction, a positive integer N to reduce the dimensionality to N). We use UMAP for this purpose.
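
The reduction itself, via the umap-learn package, boils down to something like the sketch below; reduse_dims presumably corresponds to UMAP's n_components.

```python
# Sketch of UMAP dimensionality reduction (umap-learn package).
import numpy as np
import umap

# features: one flattened feature vector per detected keystroke
features = np.random.rand(500, 1280)         # placeholder data

reduse_dims = 16                             # analogous to the reduse_dims parameter
reducer = umap.UMAP(n_components=reduse_dims)
reduced = reducer.fit_transform(features)    # shape: (n_keystrokes, reduse_dims)
print(reduced.shape)
```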

**Training CBoS to improve clustering.** This is an optional step that can increase precision in the next step. Implemented in main.ipynb.

**HMM prediction.** Demo in main.ipynb. We cluster the data and predict the original text from these clusters using a Hidden Markov Model.
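
A minimal sketch of the idea behind this step, not the notebook's actual code: cluster the keystroke features, treat cluster IDs as HMM observations, and Viterbi-decode the most likely character sequence. The transition and emission matrices below are random placeholders; a real attack estimates the transition matrix from character statistics of the target language.

```python
# Sketch: cluster keystrokes, then Viterbi-decode characters from cluster IDs.
import numpy as np
from sklearn.cluster import KMeans

reduced = np.random.rand(500, 16)            # reduced keystroke features (placeholder)

n_clusters = 30                              # roughly the number of distinct keys
obs = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(reduced)

n_chars = 27                                 # hidden states: 'a'-'z' plus space
# Placeholder HMM parameters; in practice the transition matrix comes from
# language statistics and the emission matrix from the clustering itself.
pi = np.full(n_chars, 1.0 / n_chars)                        # initial distribution
A = np.random.dirichlet(np.ones(n_chars), size=n_chars)     # char -> char transitions
B = np.random.dirichlet(np.ones(n_clusters), size=n_chars)  # char -> cluster emissions

def viterbi(obs, pi, A, B):
    """Most likely hidden state sequence for a sequence of cluster IDs."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, T):
        scores = delta[t - 1][:, None] + np.log(A)          # (prev state, cur state)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + np.log(B[:, obs[t]])
    states = np.zeros(T, dtype=int)
    states[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        states[t] = psi[t + 1, states[t + 1]]
    return states

chars = "abcdefghijklmnopqrstuvwxyz "
predicted = "".join(chars[s] for s in viterbi(obs, pi, A, B))
print(predicted[:80])
```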

**Language correction.** It does not work great, but it can improve the precision of the next step.
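
One common approach, shown here purely as a hypothetical sketch and not the repo's implementation, is dictionary-based correction: replace each predicted word with the most frequent known word within edit distance one.

```python
# Hypothetical sketch of simple dictionary-based language correction.
from collections import Counter

# word frequencies from any large text corpus (tiny placeholder here)
vocab = Counter("the quick brown fox jumps over the lazy dog".split())

def edits1(word):
    """All strings at edit distance one from `word`."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + replaces + inserts)

def correct(word):
    if word in vocab:
        return word
    candidates = [w for w in edits1(word) if w in vocab]
    return max(candidates, key=vocab.get) if candidates else word

predicted = "thw quick brpwn fox"
print(" ".join(correct(w) for w in predicted.split()))
```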

**Classifier training.** We train a classifier on the filtered labeled data obtained in the previous steps, then predict the text with this classifier and repeat the process several times.
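
A minimal sketch of this self-training cycle (the real one lives in main.ipynb): train a classifier on the keystrokes whose labels look reliable, re-predict every keystroke, and repeat. The model, thresholds, and all data below are placeholders.

```python
# Sketch of the classifier cycle: train on filtered (reliable) labels,
# re-predict everything, and repeat.
import numpy as np
from sklearn.linear_model import LogisticRegression

reduced = np.random.rand(500, 16)              # keystroke features (placeholder)
labels = np.random.randint(0, 27, 500)         # labels from the HMM step (placeholder)
reliable = np.random.rand(500) > 0.3           # mask of trusted labels (placeholder)

for _ in range(5):                             # repeat the cycle several times
    clf = LogisticRegression(max_iter=1000)
    clf.fit(reduced[reliable], labels[reliable])   # train only on filtered data

    proba = clf.predict_proba(reduced)
    labels = clf.classes_[proba.argmax(axis=1)]    # re-predict every keystroke
    conf = proba.max(axis=1)
    reliable = conf >= np.median(conf)             # keep the more confident half

predicted_labels = labels                      # final per-keystroke predictions
```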

The various_datasets dir contains several demo labeled audio recordings.

Precision after the HMM step and the classifier cycle for one text on 3 different setups:

| | tea_1 | tea_2_cleared | tea_3_cleared |
|---|---|---|---|
| HMM Step | 90.2% | 80.9% | 70% |
| Classifier Cycle | 96.5% | 93.6% | 86.6% |

For more theoretical background and statistics, you can read my bachelor's thesis "Clustering Contextual Data For Acoustic Audio Attack" in the doc dir. It is in Russian, but you can easily translate it with Google Translate or Yandex Translate (the latter works better for Russian).