jonesberg/DataAnalysisWithPythonAndPySpark

Data Analysis with Python and PySpark

This is the companion repository for the Data Analysis with Python and PySpark book (Manning, 2022). It contains the book's source code and, where pertinent, the scripts to download the data.

NEW (June 2025): Databricks Free

Databricks now offers free access to most of its important functionality, so you no longer need to install (and pay for) your own environment. I've created a notebook/file you can use to load all the data into tables and volumes. Five minutes and you're ready to work through the code examples, no fuss!

Just clone the repository in Databricks and open the data download notebook.

Get the data (old version, still works)

The complete data set for the book is around 1 GB. Because of this, I moved the data sources to another repository, so you don't have to clone a gigantic repository just to get the code. The book assumes the data is under ./data.
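If you want to confirm the data landed where the book expects it, a quick check like the one below may help. This is a minimal sketch, not part of the book's code: the helper name `list_data_files` is hypothetical, and the only thing taken from this README is the `./data` default location.

```python
from pathlib import Path


def list_data_files(data_dir: str = "./data") -> list[str]:
    """Return the sorted relative paths of every file under data_dir.

    Hypothetical helper: the book only states that data lives under ./data.
    """
    root = Path(data_dir)
    if not root.is_dir():
        # Fail early with the absolute path, so a wrong working directory is obvious.
        raise FileNotFoundError(f"Expected the book's data under {root.resolve()}")
    # Walk the tree recursively and keep files only (directories are skipped).
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    try:
        for name in list_data_files():
            print(name)
    except FileNotFoundError as err:
        print(err)
```

Run it from the root of your clone of this repository. If it reports a missing directory, the data has not been downloaded yet or you are in a different working directory.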

Mistakes or omissions

If you encounter mistakes in the book manuscript (including the printed source code), please use the Manning platform to provide feedback.

About

Code repository for the "PySpark in Action" book
