---
title: KDE DPC++ example
description: KDE (kernel density estimation) example using SYCL and DPC++.
date: 2024-07-02
weight: 2
---

Given a sample of \\(n\\) observations \\(x_i\\) drawn from an unknown underlying continuous distribution with density \\(f(x)\\),
the kernel density estimate of that density function is computed as follows, for some kernel
smoothing parameter \\(h \in \mathbb{R}\\):

$$
  \hat{f}(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h} K\left(\frac{x - x_i}{h}\right)
$$

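Here the kernel \\(K\\) is taken to be the standard \\(d\\)-dimensional Gaussian density,

$$
  K(u) = \frac{1}{\left(2 \pi\right)^{d/2}} \exp\left(-\frac{\left\lVert u \right\rVert^{2}}{2}\right),
$$

which is why the normalization factor \\(\left(\sqrt{2 \pi}\, h\right)^{d}\\) appears in the NumPy code below.
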
An example of NumPy code performing the estimation with this Gaussian kernel:

<!-- See https://stackoverflow.com/questions/5319754/cross-reference-named-anchor-in-markdown //-->
<a id="kde_numpy" href=""></a>
```python
import numpy as np


def kde(poi: np.ndarray, sample: np.ndarray, h: float) -> np.ndarray:
    """Given a sample from an underlying continuous distribution and
    a smoothing parameter `h`, evaluate the density estimate at each point of
    interest `poi`.
    """
    assert sample.ndim == 2
    assert poi.ndim == 2
    m, d1 = poi.shape
    n, d2 = sample.shape
    assert d1 == d2
    assert h > 0
    # matrix of squared distances between points of interest and sample points
    dm = np.sum(np.square(poi[:, np.newaxis, ...] - sample[np.newaxis, ...]), axis=-1)
    return np.mean(np.exp(dm / (-2 * h * h)), axis=-1) / np.power(np.sqrt(2 * np.pi) * h, d1)
```

The code above evaluates \\(\hat{f}\\) at \\(m\\) points of interest \\(y_t\\):

$$
  \hat{f}(y_t) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h^{d}} K\left(\frac{y_t - x_i}{h}\right), \;\;\; \forall \, 0 \leq t < m
$$

Evaluating such an expression can be done in parallel: the estimate is computed independently for each \\(t\\).
Furthermore, the summation over \\(i\\) can be partitioned among work-items, each summing \\(n_{wi}\\) distinct terms.
Such work partitioning generates \\(m \cdot \left\lceil {n}/{n_{wi}}\right\rceil\\) independent tasks.
Each work-item can write its partial sum into a dedicated temporary memory location to avoid a race condition;
the partial sums are then combined by another kernel operating in a similar fashion.

```cpp
    parallel_for(
        range<2>(m, ((n + n_wi - 1) / n_wi)),
        [=](sycl::item<2> it) {
            auto t = it.get_id(0);       // index of the point of interest
            auto i_block = it.get_id(1); // index of the block of n_wi sample terms

            T local_partial_sum = ...;   // sum of the n_wi kernel terms of this block

            partial_sums[t * ((n + n_wi - 1) / n_wi) + i_block] = local_partial_sum;
        }
    );
```

Such an approach, known as tree reduction, is implemented in the ``kernel_density_esimation_temps`` function found in
``"steps/kernel_density_estimation_cpp/kde.hpp"``.

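The second, combining pass can be another small kernel. A minimal sketch, assuming the same ``m``, ``n``, ``n_wi``, ``partial_sums``, and output array ``f`` as above (an illustration, not the exact code from ``kde.hpp``):

```cpp
    // Illustrative second pass: one work-item per point of interest sums the
    // partial sums produced by the first kernel into the final estimate.
    auto n_blocks = (n + n_wi - 1) / n_wi;

    parallel_for(
        range<1>(m),
        [=](sycl::item<1> it) {
            auto t = it.get_id(0);

            T total(0);
            for (size_t i_block = 0; i_block < n_blocks; ++i_block) {
                total += partial_sums[t * n_blocks + i_block];
            }

            // normalization by n and by the Gaussian kernel constant can be
            // folded in here or applied in a separate step
            f[t] = total;
        }
    );
```

For very large \\(n\\) this combining pass can itself be partitioned and repeated, which is what makes the scheme a tree reduction.
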
Use of a temporary allocation can be avoided if each work-item atomically adds its local sum to the
appropriate zero-initialized location in the output array, as in the implementation ``kernel_density_estimation_atomic_ref``
in the same header file:

```cpp
    parallel_for(
        range<2>(m, ((n + n_wi - 1) / n_wi)),
        [=](sycl::item<2> it) {
            auto t = it.get_id(0);
            auto i_block = it.get_id(1);

            T local_partial_sum = ...;

            sycl::atomic_ref<...> f_aref(f[t]);
            f_aref += local_partial_sum;
        }
    );
```

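The template arguments of ``sycl::atomic_ref`` are elided above. One plausible way to spell them out (the memory order and scope here are an illustrative choice, not something prescribed by the example):

```cpp
            // Illustrative: relaxed ordering suffices for a commutative sum;
            // device scope makes the update visible across the whole device.
            sycl::atomic_ref<T,
                             sycl::memory_order::relaxed,
                             sycl::memory_scope::device,
                             sycl::access::address_space::global_space>
                f_aref(f[t]);

            f_aref += local_partial_sum;
```
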
Multiple work-items concurrently updating the same location in global memory still produce the correct result thanks to
``sycl::atomic_ref``, but at the expense of an increased number of update attempts, a phenomenon known as atomic pressure.
Atomic pressure leads to thread divergence and degrades performance.

To reduce the atomic pressure, work-items can be organized into work-groups. Every work-item in a work-group has access
to local shared memory, a dedicated on-chip memory, which can be used to cooperatively combine values held by work-items
in the work-group without accessing global memory. This can be done efficiently by calling the group function
``sycl::reduce_over_group``. To be able to call it, we must specify the iteration range using ``sycl::nd_range`` rather than
``sycl::range`` as we did earlier.

```cpp
    auto wg = 256; // work-group size
    auto n_data_per_wg = n_wi * wg;
    auto n_groups = ((n + n_data_per_wg - 1) / n_data_per_wg);

    range<2> gRange(m, n_groups * wg);
    range<2> lRange(1, wg);

    parallel_for(
        nd_range<2>(gRange, lRange),
        [=](sycl::nd_item<2> it) {
            auto t = it.get_global_id(0);

            T local_partial_sum = ...;

            auto work_group = it.get_group();
            T sum_over_wg = sycl::reduce_over_group(work_group, local_partial_sum, sycl::plus<>());

            // only the work-group leader touches global memory
            if (work_group.leader()) {
                sycl::atomic_ref<...> f_aref(f[t]);
                f_aref += sum_over_wg;
            }
        }
    );
```

The complete implementation can be found in the ``kernel_density_estimation_work_group_reduce_and_atomic_ref`` function
in ``"steps/kernel_density_estimation_cpp/kde.hpp"``.

These implementations are called from the C++ application ``"steps/kernel_density_estimation_cpp/app.cpp"``, which
samples data uniformly distributed over the unit cuboid and estimates the density using kernel density estimation
with a spherically symmetric multivariate Gaussian probability density function as the kernel.

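A minimal sketch of such uniform sampling on the host (illustrative only; the function name and data layout are assumptions, not the actual ``app.cpp`` code):

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Fill a flat (n x d) buffer with points drawn uniformly from the unit cuboid.
std::vector<double> sample_unit_cuboid(std::size_t n, std::size_t d, unsigned seed = 7777) {
    std::mt19937_64 gen(seed);
    std::uniform_real_distribution<double> u01(0.0, 1.0);

    std::vector<double> sample(n * d);
    for (auto &x : sample) {
        x = u01(gen);
    }
    return sample;
}
```

The generated buffer can then be placed in device-accessible memory (for example, a USM allocation) before invoking any of the estimation kernels above.
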
The application can be built using `CMake` or `Meson`; please refer to the [README](steps/kernel_density_estimation_cpp/README.md) document in that folder.