Chapter5/better_pandas.ipynb
+90 -2 (90 additions & 2 deletions)
@@ -2982,12 +2982,100 @@
  },
  {
  "cell_type": "markdown",
- "id": "711c5d5a",
+ "id": "9c8a143e",
+ "metadata": {},
+ "source": [
+ "### Pandas vs Polars: Harnessing Parallelism for Faster Data Processing"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "bccc50c3",
+ "metadata": {
+ "tags": [
+ "hide-cell"
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "!pip install polars"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "35b7b8c6",
+ "metadata": {},
+ "source": [
+ "Pandas is a single-threaded library, utilizing only a single CPU core. To achieve parallelism with Pandas, you would need to use additional libraries like Dask."
  <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#polars-speed-up-data-processing-12x-with-lazy-execution">6.12.14. Polars: Speed Up Data Processing 12x with Lazy Execution</a></li>
  <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#polars-vs-pandas-for-csv-loading-and-filtering">6.12.15. Polars vs. Pandas for CSV Loading and Filtering</a></li>
- <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#simple-and-expressive-data-transformation-with-polars">6.12.16. Simple and Expressive Data Transformation with Polars</a></li>
- <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#harness-polars-and-delta-lake-for-blazing-fast-performance">6.12.17. Harness Polars and Delta Lake for Blazing Fast Performance</a></li>
- <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#parallel-execution-of-multiple-files-with-polars">6.12.18. Parallel Execution of Multiple Files with Polars</a></li>
- <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#polars-streaming-mode-a-solution-for-large-data-sets">6.12.19. Polars’ Streaming Mode: A Solution for Large Data Sets</a></li>
+ <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#pandas-vs-polars-harnessing-parallelism-for-faster-data-processing">6.12.16. Pandas vs Polars: Harnessing Parallelism for Faster Data Processing</a></li>
+ <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#simple-and-expressive-data-transformation-with-polars">6.12.17. Simple and Expressive Data Transformation with Polars</a></li>
+ <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#harness-polars-and-delta-lake-for-blazing-fast-performance">6.12.18. Harness Polars and Delta Lake for Blazing Fast Performance</a></li>
+ <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#parallel-execution-of-multiple-files-with-polars">6.12.19. Parallel Execution of Multiple Files with Polars</a></li>
+ <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#polars-streaming-mode-a-solution-for-large-data-sets">6.12.20. Polars’ Streaming Mode: A Solution for Large Data Sets</a></li>
  </ul>
  </nav>
  </div>
@@ -2497,8 +2498,56 @@ <h2><span class="section-number">6.12.15. </span>Polars vs. Pandas for CSV Loadi
+ <h2><span class="section-number">6.12.16. </span>Pandas vs Polars: Harnessing Parallelism for Faster Data Processing<a class="headerlink" href="#pandas-vs-polars-harnessing-parallelism-for-faster-data-processing" title="Permalink to this heading">#</a></h2>
+ <p>Pandas is a single-threaded library, utilizing only a single CPU core. To achieve parallelism with Pandas, you would need to use additional libraries like Dask.</p>
- <h2><span class="section-number">6.12.16. </span>Simple and Expressive Data Transformation with Polars<a class="headerlink" href="#simple-and-expressive-data-transformation-with-polars" title="Permalink to this heading">#</a></h2>
+ <h2><span class="section-number">6.12.17. </span>Simple and Expressive Data Transformation with Polars<a class="headerlink" href="#simple-and-expressive-data-transformation-with-polars" title="Permalink to this heading">#</a></h2>
+ <p>Extract features and select only relevant features for each time series.</p>
- <h2><span class="section-number">6.12.17. </span>Harness Polars and Delta Lake for Blazing Fast Performance<a class="headerlink" href="#harness-polars-and-delta-lake-for-blazing-fast-performance" title="Permalink to this heading">#</a></h2>
+ <h2><span class="section-number">6.12.18. </span>Harness Polars and Delta Lake for Blazing Fast Performance<a class="headerlink" href="#harness-polars-and-delta-lake-for-blazing-fast-performance" title="Permalink to this heading">#</a></h2>
- <h2><span class="section-number">6.12.18. </span>Parallel Execution of Multiple Files with Polars<a class="headerlink" href="#parallel-execution-of-multiple-files-with-polars" title="Permalink to this heading">#</a></h2>
+ <h2><span class="section-number">6.12.19. </span>Parallel Execution of Multiple Files with Polars<a class="headerlink" href="#parallel-execution-of-multiple-files-with-polars" title="Permalink to this heading">#</a></h2>
- <h2><span class="section-number">6.12.19. </span>Polars’ Streaming Mode: A Solution for Large Data Sets<a class="headerlink" href="#polars-streaming-mode-a-solution-for-large-data-sets" title="Permalink to this heading">#</a></h2>
+ <h2><span class="section-number">6.12.20. </span>Polars’ Streaming Mode: A Solution for Large Data Sets<a class="headerlink" href="#polars-streaming-mode-a-solution-for-large-data-sets" title="Permalink to this heading">#</a></h2>
  <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#polars-speed-up-data-processing-12x-with-lazy-execution">6.12.14. Polars: Speed Up Data Processing 12x with Lazy Execution</a></li>
  <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#polars-vs-pandas-for-csv-loading-and-filtering">6.12.15. Polars vs. Pandas for CSV Loading and Filtering</a></li>
- <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#simple-and-expressive-data-transformation-with-polars">6.12.16. Simple and Expressive Data Transformation with Polars</a></li>
- <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#harness-polars-and-delta-lake-for-blazing-fast-performance">6.12.17. Harness Polars and Delta Lake for Blazing Fast Performance</a></li>
- <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#parallel-execution-of-multiple-files-with-polars">6.12.18. Parallel Execution of Multiple Files with Polars</a></li>
- <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#polars-streaming-mode-a-solution-for-large-data-sets">6.12.19. Polars’ Streaming Mode: A Solution for Large Data Sets</a></li>
+ <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#pandas-vs-polars-harnessing-parallelism-for-faster-data-processing">6.12.16. Pandas vs Polars: Harnessing Parallelism for Faster Data Processing</a></li>
+ <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#simple-and-expressive-data-transformation-with-polars">6.12.17. Simple and Expressive Data Transformation with Polars</a></li>
+ <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#harness-polars-and-delta-lake-for-blazing-fast-performance">6.12.18. Harness Polars and Delta Lake for Blazing Fast Performance</a></li>
+ <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#parallel-execution-of-multiple-files-with-polars">6.12.19. Parallel Execution of Multiple Files with Polars</a></li>
+ <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#polars-streaming-mode-a-solution-for-large-data-sets">6.12.20. Polars’ Streaming Mode: A Solution for Large Data Sets</a></li>
docs/_sources/Chapter5/better_pandas.ipynb
+90 -2 (90 additions & 2 deletions)
@@ -2982,12 +2982,100 @@
  },
  {
  "cell_type": "markdown",
- "id": "711c5d5a",
+ "id": "9c8a143e",
+ "metadata": {},
+ "source": [
+ "### Pandas vs Polars: Harnessing Parallelism for Faster Data Processing"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "bccc50c3",
+ "metadata": {
+ "tags": [
+ "hide-cell"
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "!pip install polars"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "35b7b8c6",
+ "metadata": {},
+ "source": [
+ "Pandas is a single-threaded library, utilizing only a single CPU core. To achieve parallelism with Pandas, you would need to use additional libraries like Dask."