Speed up process_map function by using data.table #68
The `process_map` function felt a bit slow, so I rewrote parts of it to try and speed it up. Based on the `profvis` profiling output, I replaced the `dplyr`-style slicing with `data.table` methods, but kept the output exactly the same (column order, sorting).
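To illustrate the kind of rewrite involved, here is a minimal sketch of translating a grouped `dplyr` aggregation into `data.table`. The toy event log and its columns (`case_id`, `activity`) are placeholders for illustration, not the actual internals of `process_map`:

```r
library(data.table)

# Toy event log standing in for the real input.
eventlog <- data.frame(
  case_id  = c(1, 1, 2, 2, 3),
  activity = c("A", "B", "A", "C", "B")
)

# dplyr-style version (the shape of code that was replaced):
# eventlog |>
#   dplyr::group_by(activity) |>
#   dplyr::summarize(n = dplyr::n_distinct(case_id)) |>
#   dplyr::arrange(activity)

# data.table equivalent: aggregate per group, then sort,
# keeping the same columns and ordering as before.
dt <- as.data.table(eventlog)
dt[, .(n = uniqueN(case_id)), by = activity][order(activity)]
```

The point of the translation is that `data.table` does the grouping and aggregation in optimized C code, which is where the profiling showed most of the time being spent.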
I used `microbenchmark` to run `process_map(data)` 50 times and report how much less time (median time) the new version took to run compared to the old. The resulting speed-up depends on the data set:

- `sepsis` takes ~33% less time
- `patients` takes ~34% less time
- `hospital_billing` takes ~70% less time (from 1525 ms to 459 ms)
- `traffic_fines` takes ~72% less time (from 1457 ms to 395 ms)

On my real-life dataset containing almost 200,000 rows, `process_map(data, frequency('absolute-case'))` takes ~30% less time to run (from 32 s to 21 s). Note that the benchmark on the real-life dataset was run on a different device than the example datasets.
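For reference, the benchmark harness looked roughly like the sketch below. `process_map_old()` and `process_map_new()` are hypothetical names standing in for the pre- and post-rewrite versions of the function (in practice the two package versions would be loaded separately); the example logs come from `eventdataR`:

```r
library(microbenchmark)
library(eventdataR)  # provides the sepsis, patients, ... example logs

bm <- microbenchmark(
  old = process_map_old(sepsis),  # hypothetical: pre-rewrite version
  new = process_map_new(sepsis),  # hypothetical: data.table version
  times = 50
)

# Fraction of median time saved by the new version (~0.33 for sepsis).
s <- summary(bm)
1 - s$median[s$expr == "new"] / s$median[s$expr == "old"]
```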