docs/Chapter5/natural_language_processing.html
Lines changed: 42 additions & 0 deletions
@@ -535,6 +535,7 @@ <h2> Contents </h2>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#ekphrasis-text-processing-tool-for-social-media-text">6.6.20. ekphrasis: Text Processing Tool For Social Media Text</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#chroma-the-lightning-fast-solution-to-text-embeddings-and-querying">6.6.21. Chroma: The Lightning-Fast Solution to Text Embeddings and Querying</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#galatic-clean-and-analyze-massive-text-datasets">6.6.22. Galatic: Clean and Analyze Massive Text Datasets</a></li>
+<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#efficient-keyword-extraction-and-replacement-with-flashtext">6.6.23. Efficient Keyword Extraction and Replacement with FlashText</a></li>
 </ul>
 </nav>
 </div>
@@ -2368,6 +2369,46 @@ <h2><span class="section-number">6.6.22. </span>Galatic: Clean and Analyze Massi
 </div>
 <p><a class="reference external" href="https://github.com/taylorai/galactic">Link to Galatic</a>.</p>
+<h2><span class="section-number">6.6.23. </span>Efficient Keyword Extraction and Replacement with FlashText<a class="headerlink" href="#efficient-keyword-extraction-and-replacement-with-flashtext" title="Permalink to this heading">#</a></h2>
+<span class="c1"># Replacing keywords in text</span>
+<span class="n">new_sentence</span><span class="o">=</span><span class="n">keyword_processor</span><span class="o">.</span><span class="n">replace_keywords</span><span class="p">(</span><span class="s2">"PYTHON is essential for DS."</span><span class="p">)</span>
+<span class="n">new_sentence</span>
+</pre></div>
+</div>
+</div>
+<div class="cell_output docutils container">
+<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>'Python is essential for data science.'
+</pre></div>
+</div>
+</div>
+</div>
+<p><a class="reference external" href="https://bit.ly/4bQ1eqt">Link to FlashText</a>.</p>
+</section>
 </section>
 
 <script type="text/x-thebe-config">
@@ -2455,6 +2496,7 @@ <h2><span class="section-number">6.6.22. </span>Galatic: Clean and Analyze Massi
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#ekphrasis-text-processing-tool-for-social-media-text">6.6.20. ekphrasis: Text Processing Tool For Social Media Text</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#chroma-the-lightning-fast-solution-to-text-embeddings-and-querying">6.6.21. Chroma: The Lightning-Fast Solution to Text Embeddings and Querying</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#galatic-clean-and-analyze-massive-text-datasets">6.6.22. Galatic: Clean and Analyze Massive Text Datasets</a></li>
+<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#efficient-keyword-extraction-and-replacement-with-flashtext">6.6.23. Efficient Keyword Extraction and Replacement with FlashText</a></li>
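The FlashText section added in this file calls `keyword_processor.replace_keywords(...)` and shows the output `'Python is essential for data science.'`, but the setup of `keyword_processor` is not visible in the diff. As a rough illustration of the idea, here is a minimal pure-Python sketch of trie-based, case-insensitive keyword replacement. It is not FlashText's implementation: the `KeywordReplacer` class, its `add_keyword(keyword, replacement)` signature, and the two keyword mappings are hypothetical, chosen only to reproduce the output shown above.

```python
import string

# Characters treated as part of a word; anything else counts as a boundary.
WORD_CHARS = set(string.ascii_letters + string.digits + "_")
_END = "_end_"  # sentinel key marking a complete keyword in the trie


class KeywordReplacer:
    """Toy trie-based keyword replacer (hypothetical; not FlashText itself)."""

    def __init__(self):
        self.trie = {}

    def add_keyword(self, keyword, replacement):
        # Store the lowercased keyword character by character in a nested dict.
        node = self.trie
        for ch in keyword.lower():
            node = node.setdefault(ch, {})
        node[_END] = replacement

    def replace_keywords(self, text):
        out = []
        i, n = 0, len(text)
        while i < n:
            # Only try to match a keyword at the start of a word.
            at_word_start = i == 0 or text[i - 1] not in WORD_CHARS
            match_end, replacement = -1, None
            if at_word_start:
                node, j = self.trie, i
                while j < n and text[j].lower() in node:
                    node = node[text[j].lower()]
                    j += 1
                    # Accept only matches that end on a word boundary;
                    # later (longer) matches overwrite earlier ones.
                    if _END in node and (j == n or text[j] not in WORD_CHARS):
                        match_end, replacement = j, node[_END]
            if replacement is not None:
                out.append(replacement)
                i = match_end
            else:
                out.append(text[i])
                i += 1
        return "".join(out)


# Hypothetical keyword mappings, assumed for illustration only.
replacer = KeywordReplacer()
replacer.add_keyword("python", "Python")
replacer.add_keyword("ds", "data science")

new_sentence = replacer.replace_keywords("PYTHON is essential for DS.")
print(new_sentence)  # Python is essential for data science.
```

Because the text is scanned once against the trie, the work grows roughly with the length of the text rather than with the number of keywords, which is the usual motivation for this approach over looping through one regular expression per keyword.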
docs/Chapter5/spark.html
Lines changed: 1 addition & 4 deletions
@@ -1656,7 +1656,7 @@ <h2><span class="section-number">6.15.9. </span>Vectorized Operations in PySpark
 </div>
 </div>
 <p>Standard UDF functions process data row-by-row, resulting in Python function call overhead.</p>
-<p>In contrast, pandas_udf utilizes Pandas’ vectorized operations to process entire columns in a single operation, significantly improving performance.</p>
+<p>In contrast, pandas_udf uses Pandas’ vectorized operations to process entire columns in a single operation, significantly improving performance.</p>