High-Performance PySpark: Advanced Strategies for Optimal Data Processing

This is the repository for the LinkedIn Learning course High-Performance PySpark: Advanced Strategies for Optimal Data Processing. The full course is available from LinkedIn Learning.

Course Description

Master the art of efficient data processing with this advanced PySpark course designed for data engineers. Instructor Ameena Ansari shows you the essentials of optimizing the data cleaning process and defining schemas to streamline ingestion at scale. Explore various data formats and compression techniques to ensure seamless performance, even with massive datasets. By the end of this course, you'll have the tools and skills you need to transform and ingest high-quality data using PySpark pipelines that are both scalable and efficient.

This course is integrated with GitHub Codespaces, an instant cloud developer environment that offers all the functionality of your favorite IDE without the need for any local machine setup. With GitHub Codespaces, you can get hands-on practice from any machine, at any time, all while using a tool that you'll likely encounter in the workplace. Check out "Using GitHub Codespaces" with this course to learn how to get started.

Instructor

Ameena Ansari

Senior Data Engineer | Distributed Systems Enthusiast

Check out my other courses on LinkedIn Learning.

