Commit 1babbf2

Add Parallel Processing Input to Item Chip Generator (#413)
* Add Parallel Processing Input to Item Chip Generator
* use int
* update changelog
* update changelog
1 parent f2dbd31 commit 1babbf2

2 files changed: 27 additions & 16 deletions

CHANGELOG.md

Lines changed: 5 additions & 0 deletions
@@ -5,6 +5,11 @@ All notable changes to the [Nucleus Python Client](https://github.com/scaleapi/n
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.16.12](https://github.com/scaleapi/nucleus-python-client/releases/tag/v0.16.12) - 2023-11-27
+
+### Added
+
+- Added `num_processes` parameter to `dataset.items_and_annotation_chip_generator()` to specify parallel processing.
 
 ## [0.16.11](https://github.com/scaleapi/nucleus-python-client/releases/tag/v0.16.11) - 2023-11-22

nucleus/dataset.py

Lines changed: 22 additions & 16 deletions
@@ -1485,6 +1485,7 @@ def items_and_annotation_chip_generator(
         stride_size: int,
         cache_directory: str,
         query: Optional[str] = None,
+        num_processes: int = 0,
     ) -> Iterable[Dict[str, str]]:
         """Provides a generator of chips for all DatasetItems and BoxAnnotations in the dataset.
 
@@ -1498,6 +1499,7 @@ def items_and_annotation_chip_generator(
             cache_directory: The s3 or local directory to store the image and annotations of a chip.
                 s3 directories must be in the format s3://s3-bucket/s3-key
             query: Structured query compatible with the `Nucleus query language <https://nucleus.scale.com/docs/query-language-reference>`_.
+            num_processes: The number of worker processes to use to chip and upload images. If unset, no parallel processing will occur.
 
         Returns:
             Generator where each element is a dict containing the location of the image chip (jpeg) and its annotations (json).
@@ -1522,22 +1524,26 @@ def items_and_annotation_chip_generator(
             annotations = item[BOX_TYPE]
             item_ref_id = item[ITEM_KEY][REFERENCE_ID_KEY]
             offsets = generate_offsets(w, h, chip_size, stride_size)
-            with Pool() as pool:
-                chip_args = [
-                    (
-                        offset,
-                        chip_size,
-                        w,
-                        h,
-                        item_ref_id,
-                        cache_directory,
-                        image,
-                        annotations,
-                    )
-                    for offset in offsets
-                ]
-                for chip_result in pool.imap(process_chip, chip_args):
-                    yield chip_result
+            chip_args = [
+                (
+                    offset,
+                    chip_size,
+                    w,
+                    h,
+                    item_ref_id,
+                    cache_directory,
+                    image,
+                    annotations,
+                )
+                for offset in offsets
+            ]
+            if num_processes:
+                with Pool(num_processes) as pool:
+                    for chip_result in pool.imap(process_chip, chip_args):
+                        yield chip_result
+            else:
+                for chip_arg in chip_args:
+                    yield process_chip(chip_arg)
 
     def export_embeddings(
         self,
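The optional-pool branching added in the last hunk can be exercised outside of Nucleus. The following is a minimal, self-contained sketch of the same pattern, assuming a toy `process_chip` that takes an `(offset, chip_size)` tuple and returns a small dict; the real `process_chip` in `nucleus/dataset.py` takes the eight-element tuple shown above and writes chips to `cache_directory`, which this sketch does not do. The name `chip_results` is hypothetical and used only here.

```python
from multiprocessing import Pool
from typing import Dict, Iterator, List, Tuple


def process_chip(chip_arg: Tuple[int, int]) -> Dict[str, int]:
    """Hypothetical stand-in for nucleus's process_chip.

    It is defined at module level so multiprocessing can pickle it
    and send it to worker processes.
    """
    offset, chip_size = chip_arg
    return {"offset": offset, "area": chip_size * chip_size}


def chip_results(
    chip_args: List[Tuple[int, int]], num_processes: int = 0
) -> Iterator[Dict[str, int]]:
    """Yield one result per chip argument, in input order.

    num_processes == 0 (the default, falsy) runs every chip in the calling
    process; any positive value fans the work out to that many workers.
    """
    if num_processes:
        # Pool.imap returns results lazily and in the same order as chip_args.
        with Pool(num_processes) as pool:
            yield from pool.imap(process_chip, chip_args)
    else:
        # Sequential fallback: no worker processes are started.
        for chip_arg in chip_args:
            yield process_chip(chip_arg)


if __name__ == "__main__":
    # The guard keeps spawned workers from re-running this demo on import.
    args = [(offset, 256) for offset in range(4)]
    sequential = list(chip_results(args))
    parallel = list(chip_results(args, num_processes=2))
    assert sequential == parallel
    print(sequential[0])  # {'offset': 0, 'area': 65536}
```

Because `0` is falsy, the default keeps all work in the calling process, while any positive `num_processes` goes through `Pool.imap`, which preserves input order, so both paths yield the same sequence. Before this commit the generator always used `Pool()`, which starts `os.cpu_count()` workers. In the committed code the `Pool` appears to be opened inside the per-item loop, so a fresh set of workers is started for each dataset item. A negative value is also truthy and reaches `Pool()`, which raises `ValueError`.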
