-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
ref(releases): Remove DISTINCT query modifier and use EXISTS subqueries to join #95229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
❌ 2 Tests Failed:
View the top 2 failed test(s) by shortest run time
To view more test analytics, go to the Test Analytics Dashboard |
Re Cursor:
This is wrong. If environments can not be fetched the resource aborts 404. Even if it didn't abort environment names are only appended to the filter_params if all the environments are returned.
This is wrong. Previous query pattern checked if the environments key existed prior to applying filters.
This is wrong. Project ids will always be populated in this context. If not an exception is raised: sentry/src/sentry/api/bases/organization.py Lines 566 to 567 in 156f296
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: API Breaks on Empty Environment Filters
The filter_releases_by_environments
function incorrectly skips filtering when environment_ids
is empty. Previously, an empty environment filter would return no results (filtering out all releases). Now, it returns unfiltered results, which is a breaking change in API behavior for the OrganizationReleasesEndpoint
and OrganizationReleasesStatsEndpoint
. This can lead to unexpected data exposure when users apply empty or invalid environment filters.
src/sentry/models/release.py#L839-L857
sentry/src/sentry/models/release.py
Lines 839 to 857 in 3f33d1f
def filter_releases_by_environments( | |
queryset: QuerySetAny, | |
project_ids: list[int], | |
environment_ids: list[int], | |
): | |
"""Return a release queryset filtered by environments.""" | |
if not environment_ids: | |
return queryset | |
return queryset.filter( | |
Exists( | |
ReleaseProjectEnvironment.objects.filter( | |
release=OuterRef("pk"), | |
environment_id__in=environment_ids, | |
project_id__in=project_ids, | |
) | |
) | |
) |
src/sentry/api/endpoints/organization_releases.py#L315-L320
sentry/src/sentry/api/endpoints/organization_releases.py
Lines 315 to 320 in 3f33d1f
queryset = Release.objects.filter(organization_id=organization.id) | |
queryset = filter_releases_by_environments( | |
queryset, | |
filter_params["project_id"], | |
[e.id for e in filter_params.get("environment_objects", [])], | |
) |
src/sentry/api/endpoints/organization_releases.py#L678-L683
sentry/src/sentry/api/endpoints/organization_releases.py
Lines 678 to 683 in 3f33d1f
queryset = filter_releases_by_projects(queryset, filter_params["project_id"]) | |
queryset = filter_releases_by_environments( | |
queryset, | |
filter_params["project_id"], | |
[e.id for e in filter_params.get("environment_objects", [])], | |
) |
Was this report helpful? Give feedback by reacting with 👍 or 👎
Problem statement:
Long running queries are being canceled due to timeouts. Somewhat amazingly DataDog does not report a cost associated with the sort and distinct nodes. I find this hard to believe so I've removed the DISTINCT modifier by using EXISTS subqueries to keep the source dataset de-duplicated. The EXISTS subquery should be a lateral move in terms of performance because both query patterns use a 'Nested Loop Join'. Removing the DISTINCT modifier should be a net benefit all things considered.
Additional Notes:
There are multiple query patterns present in this endpoint. I've document two samples below. This change always lowers the minimum cost to execute the query while mostly not changing the maximum cost. However in some cases the maximum cost is higher. This is not as clear of a win as we saw in #95135. We'll let it ride for a while and monitor its performance. If we see noticeable outliers we may revert but it would have to be in excess of the outliers we see with the current query.
Previous Query Plan Sample (test database):
New Query Plan Sample (test database):