Commit 8d00db0

Adding pandarallel library (#1005)
* Adding pandarallel library
  Pandarallel runs the pandas apply method in parallel, which makes data preprocessing faster and easier. This is very helpful in kernel-only competitions. See: https://github.com/nalepae/pandarallel
* Adding test to pandarallel
  Adding a simple test
1 parent 3d7dd66 commit 8d00db0

2 files changed: +12 -0 lines changed

Dockerfile

Lines changed: 1 addition & 0 deletions
@@ -427,6 +427,7 @@ RUN pip install flashtext && \
     pip install jax==0.2.12 jaxlib==0.1.64 && \
     # ipympl adds interactive widget support for matplotlib
     pip install ipympl==0.7.0 && \
+    pip install pandarallel && \
     /tmp/clean-layer.sh
 
 # Download base easyocr models.

tests/test_pandarralel.py

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+import unittest
+
+import pandas as pd
+from pandarallel import pandarallel
+
+pandarallel.initialize()
+
+class TestPandarallel(unittest.TestCase):
+    def test_pandarallel(self):
+        data = pd.read_csv("/input/tests/data/train.csv")
+        data['label_converted'] = data['label'].parallel_apply(lambda x: x+1)
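The committed test only checks that parallel_apply runs without raising. A possible follow-up, sketched here and not part of this commit, would also compare the parallel result against plain apply. The helper names add_one and convert_labels, the in-memory DataFrame standing in for /input/tests/data/train.csv, and the fallback to serial apply when pandarallel is not installed are all assumptions made for illustration.

```python
import unittest

import pandas as pd

# Assumption: fall back to serial apply when pandarallel is not installed,
# so the sketch also runs outside the Docker image.
try:
    from pandarallel import pandarallel

    pandarallel.initialize(nb_workers=2, progress_bar=False)
    HAVE_PANDARALLEL = True
except ImportError:
    HAVE_PANDARALLEL = False


def add_one(x):
    """Same transformation as the lambda in the committed test."""
    return x + 1


def convert_labels(labels):
    """Apply add_one to a Series, in parallel when pandarallel is available."""
    if HAVE_PANDARALLEL:
        return labels.parallel_apply(add_one)
    return labels.apply(add_one)


class TestPandarallelMatchesApply(unittest.TestCase):
    def test_parallel_matches_serial(self):
        # Hypothetical stand-in for the train.csv fixture used in the commit.
        data = pd.DataFrame({"label": [0, 1, 2, 3, 4, 5, 6, 7]})
        data["label_converted"] = convert_labels(data["label"])
        expected = data["label"].apply(add_one)
        self.assertEqual(data["label_converted"].tolist(), expected.tolist())
```

Comparing against the serial result keeps the check independent of the fixture's contents, so it would still hold if train.csv changed.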
