
Speech-to-text-data-collection

Introduction

In today’s data-driven world of cut-throat competition, creating, executing and monitoring different tasks over large volumes of data is no small feat. Hence, most companies need an automated solution that helps them manage their daily tasks.

Apache Kafka and Apache Airflow are open-source platforms that help companies create seamlessly functioning workflows to organise, execute and monitor their tasks. Although the two seem to perform related tasks, some crucial differences set them apart: Kafka is a distributed event-streaming platform, while Airflow is a workflow orchestrator. Spark's Structured Streaming API enables scalable, high-throughput, fault-tolerant stream processing of live data streams, and data can be ingested from many sources, including Kafka.
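
As a minimal sketch of how these pieces fit together, the snippet below uses PySpark's Structured Streaming to read events from a Kafka topic and write the text payloads to S3. The broker address, topic name, and bucket path are placeholders for illustration, not the project's actual configuration.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("speech-text-stream").getOrCreate()

# Read the raw event stream from Kafka; broker and topic names are assumptions.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "speech-text")  # assumed topic name
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers key/value as binary; cast the payload to a single string column.
text = raw.select(col("value").cast("string"))

# Write the stream to an S3 bucket as text files, with checkpointing for fault tolerance.
query = (
    text.writeStream
    .format("text")
    .option("path", "s3a://my-bucket/corpus/")  # assumed bucket path
    .option("checkpointLocation", "s3a://my-bucket/checkpoints/")
    .start()
)
query.awaitTermination()
```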

Our responsibility was to build a tool that can be deployed to post and receive text and audio files to and from a data lake, apply transformations in a distributed manner, and load the results into a warehouse in a format suitable for training a speech-to-text model.
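
To make the "posting" side concrete, here is a minimal sketch using the kafka-python client to publish an audio file to an ingestion topic; the broker address, topic name, and file name are assumptions for illustration.

```python
from kafka import KafkaProducer

# Hypothetical broker address; adjust to your cluster.
producer = KafkaProducer(bootstrap_servers="localhost:9092")

# Post a raw audio file to the ingestion topic as bytes; a text transcript
# could be sent to a companion topic the same way.
with open("sample.wav", "rb") as f:
    producer.send("audio-ingest", value=f.read())  # assumed topic name

producer.flush()  # block until the broker acknowledges the message
```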

Key Topics


Learning Objectives

Skills/Tasks:

  • Create and maintain an Apache Kafka cluster (see the topic-creation sketch after this list)
  • Work with Apache Airflow and Apache Spark
  • Apply Structured Streaming to process streaming data
  • Build data pipelines and orchestration workflows
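
For the first objective, a small example of cluster maintenance is creating a topic programmatically. The sketch below uses kafka-python's admin client; the broker address, topic name, and partition settings are assumptions suited to a single-broker development setup.

```python
from kafka.admin import KafkaAdminClient, NewTopic

# Hypothetical broker address and topic name.
admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

# Create a topic with 3 partitions and a replication factor of 1,
# which is appropriate only for a single-broker development cluster.
admin.create_topics([
    NewTopic(name="audio-ingest", num_partitions=3, replication_factor=1)
])
admin.close()
```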

Knowledge: enterprise-grade data engineering using Apache and Databricks tools

Helpful Links


Technologies used

  • Apache Kafka: to sequentially log streaming data into specific topics
  • Apache Airflow: to create, orchestrate and monitor data workflows; it will also be used to create and update the model and to schedule those tasks (a minimal DAG sketch follows this list)
  • S3 Buckets: to store transformed streaming data
  • Apache Spark: for data preprocessing, validating the data and transforming it into corpus text
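
As an illustration of the orchestration role Airflow plays here, the sketch below wires three placeholder pipeline steps into a daily DAG. The DAG id, task names, and callables are hypothetical stand-ins, not the repository's actual DAG.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical callables standing in for the project's real pipeline steps.
def validate_data():
    ...

def transform_to_corpus():
    ...

def load_to_warehouse():
    ...

with DAG(
    dag_id="speech_to_text_pipeline",  # assumed DAG id
    start_date=datetime(2021, 9, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    validate = PythonOperator(task_id="validate_data", python_callable=validate_data)
    transform = PythonOperator(task_id="transform_to_corpus", python_callable=transform_to_corpus)
    load = PythonOperator(task_id="load_to_warehouse", python_callable=load_to_warehouse)

    # Run validation, then transformation, then loading.
    validate >> transform >> load
```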

DAG

Contributors
