2 files changed +73 -11 lines changed
Original file line number Diff line number Diff line change
24329
24329
"source": [
24330
24330
"[Link to Galactic](https://github.com/taylorai/galactic)."
24331
24331
]
24332
+ },
24333
+ {
24334
+ "cell_type": "markdown",
24335
+ "id": "b0da6a5a",
24336
+ "metadata": {},
24337
+ "source": [
24338
+ "### Efficient Keyword Extraction and Replacement with FlashText"
24339
+ ]
24340
+ },
24341
+ {
24342
+ "cell_type": "code",
24343
+ "execution_count": null,
24344
+ "id": "6ee867c1",
24345
+ "metadata": {
24346
+ "tags": [
24347
+ "hide-cell"
24348
+ ]
24349
+ },
24350
+ "outputs": [],
24351
+ "source": [
24352
+ "!pip install flashtext"
24353
+ ]
24354
+ },
24355
+ {
24356
+ "cell_type": "markdown",
24357
+ "id": "611bb3c5",
24358
+ "metadata": {},
24359
+ "source": [
24360
+ "If you want to perform fast keyword extraction and replacement in text, use FlashText. "
24361
+ ]
24362
+ },
24363
+ {
24364
+ "cell_type": "code",
24365
+ "execution_count": 6,
24366
+ "id": "a52f3e89",
24367
+ "metadata": {},
24368
+ "outputs": [
24369
+ {
24370
+ "data": {
24371
+ "text/plain": [
24372
+ "'Python is essential for data science.'"
24373
+ ]
24374
+ },
24375
+ "execution_count": 6,
24376
+ "metadata": {},
24377
+ "output_type": "execute_result"
24378
+ }
24379
+ ],
24380
+ "source": [
24381
+ "from flashtext import KeywordProcessor\n",
24382
+ "\n",
24383
+ "keyword_processor = KeywordProcessor()\n",
24384
+ "\n",
24385
+ "# Adding keywords with replacements\n",
24386
+ "keyword_processor.add_keyword(keyword=\"Python\")\n",
24387
+ "keyword_processor.add_keyword(keyword=\"DS\", clean_name=\"data science\")\n",
24388
+ "\n",
24389
+ "# Replacing keywords in text\n",
24390
+ "new_sentence = keyword_processor.replace_keywords(\"PYTHON is essential for DS.\")\n",
24391
+ "new_sentence"
24392
+ ]
24393
+ },
24394
+ {
24395
+ "cell_type": "markdown",
24396
+ "id": "0b85c2a7",
24397
+ "metadata": {},
24398
+ "source": [
24399
+ "[Link to FlashText](https://bit.ly/4bQ1eqt)."
24400
+ ]
24332
24401
}
24333
24402
],
24334
24403
"metadata": {
Original file line number Diff line number Diff line change 1655
1655
"source": [
1656
1656
"Standard UDF functions process data row-by-row, resulting in Python function call overhead.\n",
1657
1657
"\n",
1658
- "In contrast, pandas_udf utilizes Pandas' vectorized operations to process entire columns in a single operation, significantly improving performance."
1658
+ "In contrast, pandas_udf uses Pandas' vectorized operations to process entire columns in a single operation, significantly improving performance."
1659
1659
]
1660
1660
},
1661
1661
{
1662
1662
"cell_type": "code",
1663
- "execution_count": 3,
1663
+ "execution_count": 2,
1664
1664
"id": "a4633f44",
1665
1665
"metadata": {},
1666
1666
"outputs": [
1697
1697
},
1698
1698
{
1699
1699
"cell_type": "code",
1700
- "execution_count": 4,
1700
+ "execution_count": 3,
1701
1701
"id": "fcf0cdf9",
1702
1702
"metadata": {},
1703
1703
"outputs": [
1704
- {
1705
- "name": "stderr",
1706
- "output_type": "stream",
1707
- "text": [
1708
- "\r"
1709
- ]
1710
- },
1711
1704
{
1712
1705
"name": "stdout",
1713
1706
"output_type": "stream",
1738
1731
},
1739
1732
{
1740
1733
"cell_type": "code",
1741
- "execution_count": 8,
1734
+ "execution_count": 4,
1742
1735
"id": "e1ec8b2b",
1743
1736
"metadata": {},
1744
1737
"outputs": [