Return paths of downloaded files? #151

@CPBridge

I have been working with idc-index a bit recently, and there's one thing I find a bit frustrating.

I often have a workflow where I want to download an instance or series and then immediately read it in using pydicom on the next line. However, because the files get downloaded into a tree-like structure (which, as I understand it, is controlled by dirTemplate), I often have to go globbing or iterating through directories to find the files I just downloaded.

I think it would be a nice feature if the "download" methods returned a list of file paths (as strs or as pathlib.Path objects; I prefer Paths personally) so that you could load the file(s) immediately.

That would allow me to replace code like this:

from pathlib import Path
import highdicom as hd

from idc_index import IDCClient

segmentation_series_uid = "1.2.276.0.7230010.3.1.3.313263360.31993.1706319455.429793"

# Temporary download directory for downloads and new files
download_dir = Path('downloads')
download_dir.mkdir(exist_ok=True)

# Download a segmentation and load it
client = IDCClient()
client.download_dicom_series(
    segmentation_series_uid,
    downloadDir=download_dir,
    dirTemplate='%SeriesInstanceUID'
)

# This is a bit ugly in my opinion
seg_file = list((download_dir / segmentation_series_uid).glob('*.dcm'))[0]
seg = hd.seg.segread(seg_file)

with this:

from pathlib import Path
import highdicom as hd

from idc_index import IDCClient

segmentation_series_uid = "1.2.276.0.7230010.3.1.3.313263360.31993.1706319455.429793"

# Temporary download directory for downloads and new files
download_dir = Path('downloads')
download_dir.mkdir(exist_ok=True)

# Download a segmentation and load it
client = IDCClient()
seg_files = client.download_dicom_series(
    segmentation_series_uid,
    downloadDir=download_dir,
    dirTemplate='%SeriesInstanceUID'
)

seg = hd.seg.segread(seg_files[0])
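
In the meantime, a workaround along these lines might do the job. This is only a minimal sketch: `download_and_list` is a hypothetical helper, not part of idc-index, and it assumes that any file that newly appears under the download directory came from the download call. Files that already existed before the call (for example from an earlier download of the same series) would not be reported.

```python
from pathlib import Path
from typing import Callable, List


def download_and_list(download: Callable[[], object], download_dir: Path) -> List[Path]:
    """Run `download` and return the files that newly appeared under `download_dir`.

    Hypothetical helper (not an idc-index API): it snapshots the directory
    tree before and after the call and returns the difference.
    """
    download_dir = Path(download_dir)
    # Files present before the download starts
    before = {p for p in download_dir.rglob("*") if p.is_file()}
    download()
    # Files present once the download has finished
    after = {p for p in download_dir.rglob("*") if p.is_file()}
    # Only the new files, in a stable (sorted) order
    return sorted(after - before)
```

With the example above, this could be called as `download_and_list(lambda: client.download_dicom_series(segmentation_series_uid, downloadDir=download_dir, dirTemplate='%SeriesInstanceUID'), download_dir)`, and the first element of the result passed to `hd.seg.segread`.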

Alternatively/additionally we could even add methods that download files into a temporary directory and load them into memory immediately, though this would require adding pydicom as a dependency.

What do you think, @fedorov?
