Python implementation of the Spark video stream analytics project. The original project was implemented in Java and can be found here.
First, make sure you have a Python environment set up. You can use the following command to create a new environment:
```sh
python3 -m venv ./venv
```

Then source the environment with the following command:

```sh
source venv/bin/activate
```

Finally, install the required packages with the following command:

```sh
python3 -m pip install -r requirements.txt
```

Make sure to deactivate the environment when you are done with the following command:

```sh
deactivate
```

Whenever you want to work on the project, make sure to source the environment before running the code.
The project currently uses the pyspark library at version 3.5.1.
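Once the dependencies are installed, you can quickly confirm the installed version (a simple sanity check, not part of the project itself):

```sh
python3 -c "import pyspark; print(pyspark.__version__)"  # should print 3.5.1
```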
Spark runs on Java 8/11/17, Scala 2.12/2.13, Python 3.8+, and R 3.5+. Support for Java 8 prior to version 8u371 is deprecated as of Spark 3.5.0.
So, to run the project without issues, make sure you have Java 8, 11, or 17 installed on your machine.
Even though the project uses the pyspark library, the Spark service must be installed and running on your machine when launching the project.
Before installing Spark, make sure you have Java 8/11/17 installed on your machine and that the `JAVA_HOME` variable is correctly set in your environment variables.
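For example, on a typical Linux setup the variable might be set like this (the exact path depends on your Java installation):

```sh
# Path is illustrative; point it at your actual Java installation
export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
export PATH="$JAVA_HOME/bin:$PATH"
java -version   # should report Java 8, 11, or 17
```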
Now, the following steps will guide you through the installation of Spark:
- Download the latest version of Spark from the official website.
- Extract the downloaded file to a directory of your choice.
- Set the `SPARK_HOME` variable in your environment variables to the directory where you extracted the Spark files.
- Test the installation by running the following command:
```sh
# launch Scala Based Spark
spark-shell
# launch PySpark
pyspark
```

If you see the Spark shell, then the installation was successful.
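For reference, the environment setup might look like this in your shell profile (the extraction path below is an assumption; use your own):

```sh
# Assumed extraction path; adjust to where you unpacked Spark
export SPARK_HOME=/opt/spark-3.5.1-bin-hadoop3
export PATH="$SPARK_HOME/bin:$PATH"
```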
Even though the project uses the kafka-python library to interact with Kafka, it is necessary to have the Kafka and Zookeeper services running on your machine. To achieve that, these services are configured to run inside a Docker container.
So, having Docker installed and working on your machine is a requirement to run the project.
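You can verify your Docker setup with a couple of quick checks (nothing project-specific here):

```sh
docker --version              # confirms the Docker CLI is installed
docker compose version        # confirms Compose (v2) is available
docker run --rm hello-world   # confirms the daemon can run containers
```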
Now, the following steps will guide you through the installation of the container with Kafka and Zookeeper:
In the project root, run the following command to build the containers:
```sh
make build
```

To stand up Kafka and Spark services, run:

```sh
make run
```

This command will start both Kafka and Spark. You can also build Spark services with 3 workers using:

```sh
make run-scaled
```

Before running the scripts, you must create and activate a virtual environment:

```sh
virtualenv venv
source venv/bin/activate
```

To run the video stream collector, run the following command:

```sh
python src/video-stream-collector.py --config {{ CONFIG_FILE }}
```

Where `CONFIG_FILE` is the path to the configuration file. Multiple example configuration files can be found in the `config/collector` directory.
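For orientation, a collector configuration file might look roughly like the sketch below. The field names here are purely illustrative; consult the files in `config/collector` for the actual schema:

```yaml
# Illustrative sketch only -- check config/collector/ for the real fields
camera:
  id: cam-01             # hypothetical camera identifier
  source: data/demo.mp4  # video file path or device index
kafka:
  bootstrap_servers: localhost:9092
  topic: video-stream    # hypothetical topic name
```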
Example:

```sh
python src/stream_collector.py --config config/collector/file_cam_local.yaml
```

To run `stream_processor.py`, run:
```sh
make submit app=src/stream_processor.py
```

There are several commands to build and manage the standalone Spark cluster. You can check the Makefile to see them all. The simplest command to build is:

```sh
make build
```

To run the motion detection demo, run the following command:
```sh
python src/motion-demo.py
```

To run the video stream collector, run the following command:
```sh
python src/video-stream-collector.py --config {{ CONFIG_FILE }}
```

Where `CONFIG_FILE` is the path to the configuration file. Multiple example configuration files can be found in the `config/collector` directory.
All of the configuration files can be used to test the video stream collector.
To run the video stream processor, the Spark service must be running. Once it is up, run the following command:
```sh
pyspark < src/video_stream_processor.py
```
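Alternatively, the processor script can typically be submitted to the cluster with `spark-submit`; a minimal sketch, assuming the default standalone master URL (adjust host and port to your setup):

```sh
# Master URL is an assumption; match it to your cluster
spark-submit --master spark://localhost:7077 src/video_stream_processor.py
```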