Largely based on the excellent work in hb-song-analysis
This project is a Python-based sound topic modeling tool that uses the rost-cli tool to analyze audio data.
There are many parameters that can be adjusted to tune the model to the data.
Default parameters are set to work well with the MBARI MARS hydrophone data, but can be adjusted.
All parameters are in the conf.py file.
To use, place your audio data in the data/ directory, and set the target_file variable in conf.py to the name of the audio file whose output you would like to visualize.
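Before a long run it can help to confirm that the configured file is actually present. The sketch below is illustrative only: resolve_target is a hypothetical helper that is not part of this project, and it assumes target_file holds a bare file name relative to data/.

```python
from pathlib import Path


def resolve_target(target_file, data_dir="data"):
    """Return the path of target_file inside data_dir, or raise if it is missing.

    Hypothetical helper: assumes target_file is a file name relative to data_dir.
    """
    path = Path(data_dir) / target_file
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; place the audio file in {data_dir}/"
        )
    return path
```

Assuming conf.py is importable from the project root, this could be called as resolve_target(conf.target_file).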
🐳 Option 1: ROST via Docker (Recommended)
Build the Docker image from the project root:
docker build -t rost-cli -f DockerfileROST .
Then run the container and test it with the topics.refine.t command:
docker run -it --rm rost-cli topics.refine.t --help
You should see the help message for the topics.refine.t command, which indicates that the ROST CLI is working correctly:
(venv) $ docker run -it --rm rost-cli topics.refine.t --help
Topic modeling of data with 1 dimensional structure.:
  --help                                help
  -i [ --in.words ] arg (=/dev/stdin)   Word frequency count file. Each line is
                                        a document/cell, with integer
                                        representation of words.
  --in.words.delim arg (=,)             delimiter used to seperate words.
  --out.topics arg (=topics.csv)        Output topics file
  --out.topics.ml arg (=topics.maxlikelihood.csv)
                                        Output maximum likelihood topics file
  --out.topicmodel arg (=topicmodel.csv)
                                        Output topic model file
  --in.topicmodel arg                   Input topic model file
  --in.topics arg                       Initial topic labels
Make sure the docker Python package is installed in your environment by running:
python -c "import docker; print(docker.__version__)"
You should see the version of the docker package installed in your Python environment, e.g.
7.1.0
If the import fails, install the package with:
pip3 install docker
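The one-line check above only confirms that the docker package imports; it does not show whether the Docker engine itself is reachable. A minimal sketch that distinguishes the two cases follows. The function name docker_status and its return strings are illustrative choices, not part of this project; docker.from_env() and ping() come from the Docker SDK for Python.

```python
import importlib.util


def docker_status():
    """Report whether the docker package is installed and the engine answers.

    Returns "package-missing", "daemon-unreachable", or "ok".
    """
    if importlib.util.find_spec("docker") is None:
        return "package-missing"
    import docker

    try:
        # from_env() reads the usual DOCKER_* environment settings;
        # ping() contacts the engine and fails if it is not reachable.
        docker.from_env().ping()
    except Exception:
        return "daemon-unreachable"
    return "ok"


if __name__ == "__main__":
    print(docker_status())
```

The broad except is deliberate here: any failure while connecting is treated as an unreachable engine, which is enough for a quick diagnostic.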
Alternatively, to install ROST locally instead of using Docker, clone the rost-cli repository
git clone https://gitlab.com/warplab/rost-cli
and follow the installation instructions found in its README.
Once ROST is installed, set the rost_path variable in conf.py to the path of the /rost-cli/bin/ directory on your machine.
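A wrong rost_path typically only shows up once a module tries to call ROST. As a rough sketch, the check below looks for an executable in the configured directory. It assumes that the topics.refine.t binary used in the Docker test above is also what a local build places in bin/; rost_binary_available is a hypothetical helper, not part of this project.

```python
import os
from pathlib import Path


def rost_binary_available(rost_path, name="topics.refine.t"):
    """Return True if an executable file called `name` exists in rost_path."""
    candidate = Path(rost_path) / name
    return candidate.is_file() and os.access(candidate, os.X_OK)
```

Assuming conf.py is importable, rost_binary_available(conf.rost_path) returning False suggests the path or the build needs another look.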
Install the Python dependencies with the package manager you prefer.
With conda:
conda env create
With virtualenv, first install it:
pip install virtualenv
then create and activate an environment and install the requirements:
virtualenv --python=python3.10 venv
source venv/bin/activate
(venv) $ pip install -r requirements.txt
Once all dependencies are installed, the project can be run with
(venv) $ ./run_model.sh
which will run all project modules using the parameters specified in conf.py.
Modules can also be run individually with
(venv) $ python module_name.py
For example, the stft module can be run on its own with
(venv) $ python stft.py
You may want to run the modules individually to test different parameters of one module without rerunning the previous modules.
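When rerunning only part of the pipeline, a small driver can keep the order intact and stop as soon as one step fails. The following is a sketch, not a replacement for run_model.sh: run_in_order is a hypothetical helper, and the list of module file names must be taken from run_model.sh, since this README names only some of them.

```python
import subprocess
import sys


def run_in_order(modules):
    """Run each script with the current interpreter, stopping at the first failure.

    Returns the list of scripts that completed successfully.
    """
    completed = []
    for module in modules:
        result = subprocess.run([sys.executable, module])
        if result.returncode != 0:
            # Later modules depend on this one's output, so do not continue.
            break
        completed.append(module)
    return completed
```

For example, to resume from the stft step, pass stft.py followed by the later module file names in the same order as run_model.sh.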
A successful run will produce a plot similar to the following:
On top is a spectrogram of a ~50-second segment of the target file specified in conf.py. On bottom
is a stacked bar plot of topic probabilities over documents, where documents represent small increments
of time.
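One way to read the lower panel: if each document is taken to be a list of per-word topic labels, the height of each bar segment is the fraction of that document's words assigned to a given topic. The sketch below illustrates that reading with made-up data. It is an assumption about how such a plot is usually derived, not a description of this project's code, and topic_proportions is a hypothetical name.

```python
from collections import Counter


def topic_proportions(doc_topics, num_topics):
    """Fraction of words in one document assigned to each topic id 0..num_topics-1."""
    counts = Counter(doc_topics)
    total = len(doc_topics)
    if total == 0:
        return [0.0] * num_topics
    return [counts[k] / total for k in range(num_topics)]


# Toy data (made up): three documents, each a list of per-word topic labels.
documents = [[0, 0, 1], [1, 1, 1, 2], [2, 0]]
bars = [topic_proportions(doc, num_topics=3) for doc in documents]
```

Each entry of bars sums to 1 for a non-empty document, which is why the stacked bars in the plot all reach the same height.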
- If the modules are run individually, they should be run in the same order as in run_model.sh. Each module depends on the output of the previous one.
- If visualize.py raises an error saying that Python was not installed as a framework, try the solution here