This repository contains my hands-on journey through the Big Data training by NTI. Includes labs, notebooks, and notes on tools like HDFS, Spark, Kafka, Flink, Hive, HBase and more.

keroloshany47/NTI_Big_Data


Big Data Training – NTI

This repository documents my hands-on journey through the Big Data Summer Training Program organized by NTI in collaboration with ITIDA.
The program dives deep into Big Data tools, platforms, and real-world applications.


What You’ll Find Here

This repo contains:

  • Jupyter Notebooks & Labs from each topic
  • Technical notes & key takeaways
  • Practice examples and use-case simulations
  • Commands and setups used in the virtual environment

Topics Covered

  1. Big Data Era & Kunpeng Architecture
  2. HDFS + ZooKeeper – Distributed storage and cluster coordination
  3. HBase + Hive – NoSQL + distributed data warehouse (SQL-like)
  4. ClickHouse – OLAP database for real-time analytics
  5. MapReduce + YARN – Distributed processing engine and resource manager
  6. Spark + Flink – Batch + Stream processing with in-memory computing
  7. Flume + Kafka – Data ingestion and real-time messaging pipelines
  8. Elasticsearch – Distributed search engine and analytics
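
As a rough illustration of the processing model behind topic 5, the sketch below imitates the map, shuffle, and reduce phases of a word count in plain Python. It is not taken from the training labs and does not use Hadoop or YARN; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are hypothetical, chosen only to show the shape of the computation on a single machine.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework would between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse the list of counts for one word into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, not data from the training.
    sample = ["big data big ideas", "data pipelines move data"]
    print(word_count(sample))
```

On a real cluster the map and reduce steps run in parallel on different nodes and the shuffle moves data across the network; here everything runs in a single process, so only the data flow carries over.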

Tools & Technologies

| Tool/Tech          | Use Case                             |
|--------------------|--------------------------------------|
| Linux, SQL, Python | Foundations for scripting & querying |
| HDFS               | Distributed data storage             |
| Hive               | SQL-style querying                   |
| HBase              | NoSQL for large-scale datasets       |
| Kafka              | Real-time messaging system           |
| Spark & Flink      | Data processing engines              |
| ClickHouse         | High-performance analytics           |
| Flume, Sqoop       | Data ingestion from logs & DBs       |
| Elasticsearch      | Search and analytics                 |
| ZooKeeper          | Cluster coordination                 |

Goal of this Repo

This repo serves as:

  • A personal reference and knowledge base
  • A full recap of my learning journey
  • A practical showcase for Big Data skills

Let's Connect

Feel free to explore the notebooks or reach out if you'd like to collaborate or discuss Big Data topics!

Reach out to me on LinkedIn
