@huisman
Contributor

@huisman huisman commented May 30, 2025

The process_map function felt a bit slow, so I rewrote parts of it to speed it up. Based on the profvis profiling output, I replaced the dplyr-style slicing with data.table methods, while keeping the output exactly the same (column order, sorting).
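
To illustrate the kind of change, here is a toy sketch and not the actual diff. The event log, its column names, and the "first event per case" operation are all assumptions made up for the example; the real slicing inside process_map may look different. It only shows that a grouped dplyr slice and a data.table row-index lookup can return identical output.

```r
library(dplyr)
library(data.table)

# Hypothetical toy event log; column names are illustrative only.
events <- data.frame(
  case_id   = c("a", "a", "b", "b", "b", "c"),
  activity  = c("start", "end", "start", "check", "end", "start"),
  timestamp = c(1, 2, 1, 2, 3, 1)
)

# dplyr style: sort, group, and keep the first row of each case.
first_dplyr <- events %>%
  arrange(case_id, timestamp) %>%
  group_by(case_id) %>%
  slice(1) %>%
  ungroup() %>%
  as.data.frame()

# data.table style: sort by reference, collect the row index (.I) of the
# first row per case, then subset once using those indices.
dt <- as.data.table(events)
setorder(dt, case_id, timestamp)
first_rows <- dt[, .I[1], by = case_id]$V1
first_dt <- as.data.frame(dt[first_rows])

# Same columns, same column order, same row order.
stopifnot(isTRUE(all.equal(first_dplyr, first_dt)))
```

Comparing both results with all.equal is one way to check that column order and sorting are unchanged after such a rewrite.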

I used microbenchmark to run process_map(data) 50 times and compared the median run time of the new version against the old one. The resulting speedup depends on the dataset:

  • sepsis takes ~33% less time
  • patients takes ~34% less time
  • hospital_billings takes ~70% less time (from 1525 ms to 459 ms)
  • traffic_fines takes ~72% less time (from 1457 ms to 395 ms)
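
For reference, the comparison can be sketched roughly as follows. The functions old_version and new_version are hypothetical stand-ins (they are not the process_map code), and pct_less is a helper named here only for illustration; the point is the measurement pattern: 50 runs each, then the relative difference in median time.

```r
library(microbenchmark)

# Hypothetical stand-ins for the old and new implementations.
old_version <- function(x) vapply(seq_along(x), function(i) x[i]^2, numeric(1))
new_version <- function(x) x^2

x <- runif(1e4)

# Run each expression 50 times; bench$time holds nanoseconds per run.
bench <- microbenchmark(
  old = old_version(x),
  new = new_version(x),
  times = 50
)

# Median time per expression.
med <- tapply(bench$time, bench$expr, median)

# Percentage of time saved by the new version relative to the old one.
pct_less <- function(old, new) 100 * (1 - new / old)
saved <- pct_less(med[["old"]], med[["new"]])

# The same formula applied to the hospital_billings medians quoted above.
round(pct_less(1525, 459))  # 70
```

Using the median rather than the mean keeps a few slow outlier runs from skewing the comparison.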

On my real-life dataset of almost 200,000 rows, process_map(data, frequency('absolute-case')) takes ~30% less time to run (from 32 s to 21 s). This benchmark was run on a different device than the example datasets.

Rewrote part of the process_map function to slice using data.table methods,
whilst keeping the output the same.
@gertjanssenswillen gertjanssenswillen merged commit 9398ab9 into bupaverse:dev Jul 7, 2025
6 checks passed
@gertjanssenswillen
Member

Thanks for the updates. I'm doing some checks on dev, then I'll merge with master and publish on CRAN (edeaR will follow later).
As a token of appreciation, I can add you as a contributor with your GitHub username, or with your actual name if you let me know.

@huisman
Contributor Author

huisman commented Jul 8, 2025

> Thanks for the updates. I'm doing some checks on dev, then I'll merge with master and publish on CRAN (edeaR will follow later). As a token of appreciation, I can add you as a contributor with your GitHub username, or with your actual name if you let me know.

Thanks, no need to add me as a contributor. Getting these changes merged is appreciation enough :)
