|
1 | 1 | ---
|
2 | 2 | title: Manage Your Error Quota
|
3 | 3 | keywords: ["best practices"]
|
4 |
| -sidebar_order: 10 |
| 4 | +sidebar_order: 30 |
5 | 5 | redirect_from:
|
6 | 6 | - /guides/manage-event-stream/
|
7 | 7 | description: "Learn how to use the tools Sentry provides to control the type and amount of errors that you pay for."
|
@@ -57,127 +57,7 @@ While this grace period isn't dependent on the [Spike Protection feature](#7-spi
|
57 | 57 |
|
58 | 58 | ## Spike Protection {#7-spike-protection}
|
59 | 59 |
|
60 |
| -A spike is both a **large** and **temporary** increase in error event volume. Sentry applies a dynamic rate limit to your account designed to protect you from short-term spikes. |
61 |
| - |
62 |
| -Spike protection is enabled for your organization by default, and when it's enabled, Sentry continually monitors for spikes. You can confirm that it's enabled in **Settings > Subscription**. This page can only be accessed by a Billing or Owner member of your Sentry organization. |
63 |
| - |
64 |
| -### How Does Spike Protection Work? |
65 |
| - |
66 |
| -Because Sentry bills based on monthly event volume, spikes can easily consume your quota for the rest of the month. Sentry's spike protection prevents these types of overages from consuming your error event quota by dropping error events during the spike. |
67 |
| - |
68 |
| -We use your historical error event volume to implement a dynamic rate limit, and then discard error events when you hit its threshold. When spike protection is activated, we limit the number of error events accepted in any minute to: |
69 |
| - |
70 |
| -``` |
71 |
| -maximum(20, 6 x average error events per minute over the last 24 hours) |
72 |
| -``` |
73 |
| - |
74 |
| -<Note> |
75 |
| - |
76 |
| -The 24-hour window ends at the beginning of the current hour, not at the current minute. |
77 |
| - |
78 |
| -</Note> |
79 |
| - |
80 |
| -This means if you experience a spike, we'll temporarily protect you, but if the increase in volume is sustained, the spike protection limit will gradually **increase until Sentry accepts all events**. |
81 |
| - |
82 |
| -For example, in the last 24 hours, your organization has been receiving, on average, 10 events per minute (after any inbound filters have been applied). That means your current per-minute limit is 6 \* 10 or 60 events. There have been no spikes in that time, so spike control is "inactive". Something breaks, and in the next minute, your organization sends Sentry 100 events. When we see the 61st event, three things happen: |
83 |
| - |
84 |
| -1. Spike protection becomes "active". |
85 |
| - |
86 |
| -1. If your organization has used more than 25% of your monthly quota, Sentry sends the organization owner a [notification](/product/alerts/notifications/#quota-notifications) including which project the 61st event came from. This is likely, but not guaranteed to be, the project in which something broke. |
87 |
| - |
88 |
| -1. Events 61-100 are dropped with a 429 error code. |
89 |
| - |
90 |
| -When the next minute begins, we again record up to 60 events, dropping the rest until the minute is up. It continues like this for the remainder of the hour. After an hour, we re-calculate your per-minute limit, and for the next hour use that limit to decide whether events get dropped each minute. We repeat this process every hour until Sentry eventually accepts all events. |
91 |
| - |
92 |
| -If, instead of a single big spike (or an overall, permanent, increase in traffic), you experience many small spikes, there may be many days in a row when at least a few events are dropped. In that scenario, spike protection remains active the entire time. |
93 |
| - |
94 |
| -Spike protection is an organization-level setting, so once it's triggered, it affects all the projects in the organization, regardless of which project triggered the spike. |
95 |
| - |
96 |
| -### Managing a Spike {#-spike-protection-was-activated----what-should-i-do} |
97 |
| - |
98 |
| -If spike protection has been triggered for your account, you'll receive an email notifying you: |
99 |
| - |
100 |
| - |
101 |
| - |
102 |
| -To manage the current spike and avoid future spikes, we recommend taking the following steps: |
103 |
| - |
104 |
| -- Set up an [on-demand budget](#on-demand-budget) to ensure you have time to adjust your volume in the event of a future spike in errors. |
105 |
| -- Set better [rate limits](#6-rate-limiting) on the DSN keys for the projects related to the spike. |
106 |
| -- If it's a specific release version that has caused the spike, add the version identifier to the project's [inbound filters](#3-inbound-data-filters) to avoid accepting events from that release. |
107 |
| - |
108 |
| -To review the error events dropped because of spike protection, go to the "Usage Stats" tab of **Stats** for your Sentry org and select "Errors" in the "Category" dropdown. |
109 |
| - |
110 |
| -### When Does Spike Protection Become "Inactive?" |
111 |
| - |
112 |
| -Events will not be dropped during any minute in which you don't send more than the hourly limit that Sentry has calculated for you. After 24 hours without any dropped events, spike protection becomes "inactive" again. This means that it is no longer dropping events, but _it does not mean the system has stopped paying attention._ The next time events are dropped, spike protection will be "reactivated". |
113 |
| - |
114 |
| -### New Spike Protection Calculations |
115 |
| - |
116 |
| -<Include name="limited-avail-note.mdx" /> |
117 |
| - |
118 |
| -Limited availability spike protection is a project-level tool that helps prevent quota overconsumption. It's enabled for every project by default, and when it's enabled, Sentry continually monitors for spikes. You can confirm that it's enabled in **[Project] > Settings > General Settings**. |
119 |
| - |
120 |
| -Our spike protection algorithm does the following: |
121 |
| - |
122 |
| -- Uses a weighted average of your events over the past 168 hours (seven days) |
123 |
| -- Applies a multiplier to that number |
124 |
| -- Compares this final number against a minimum number of events, determined using your quota, to trigger a spike |
125 |
| -- Sets this as your spike limit |
126 |
| - |
127 |
| -#### Setting the Spike Limit |
128 |
| - |
129 |
| -There are two ways that we can set your spike limit, or the number of events that trigger a spike: |
130 |
| - |
131 |
| -- [Minimum Event Calculation](#minimum-event-calculation) - A calculation that determines a minimum number of events |
132 |
| -- [Usage-Based Calculation](#usage-based-calculation) - A projection based on your past usage |
133 |
| - |
134 |
| -The spike limit for each hour is set using either the minimum event or usage-based calculation — whichever is higher. This is done for a number of reasons. Firstly, using a minimum event calculation protects smaller or new projects. New projects that don't have a week’s worth of data to use to calibrate spike limits can use this minimum number of events, an adaptation of the organization’s quota, to approximate appropriate limits. Additionally, this calculation can be used to minimize false positives in smaller or new projects so that spikes aren’t flagged incorrectly. |
135 |
| - |
136 |
| -Spike limits are recalculated in real time throughout the duration of the spike to adjust for the increasing volume of incoming events. This allows the limit to grow at a steady rate such that quota is protected from being quickly consumed. [An example](#example) of how this works during a spike is shown below. |
137 |
| - |
138 |
| -##### Minimum Event Calculation |
139 |
| - |
140 |
| -This calculation, which is the first step of our algorithm, identifies a minimum number of events, using your quota as a guide. This number takes the maximum of either 500 events or the result of the following formula `(3 \* your quota)/(720 \* number of projects)`. The equation represents your project using up three times your overall quota in 30 days if events are continually ingested at this hourly rate, thus flagging the project for a potential spike. |
141 |
| - |
142 |
| -##### Usage-Based Calculation |
143 |
| - |
144 |
| -This calculation, which is the second step of our algorithm, calculates hourly data from the past seven days to determine spike limit projections for the next seven days. This data is used to calculate weighted averages, which takes into account weekly and hourly seasonality. For example, the weighted average calculated for Monday at 3 pm is more heavily influenced by data points on Monday or the hours around 3 pm. This weighted average is then multiplied by a multiplier that is `5` times the overall standard deviation of the past week — this multiplier is bounded between `3` and `6`. |
145 |
| - |
146 |
| -#### Example |
147 |
| - |
148 |
| -In this example, the project usually ingests 100-200 events per hour. There's been a spike that’s reached 50,000 events, as shown in the graph below: |
149 |
| - |
150 |
| - |
151 |
| -In the following graph, we can see a zoomed in perspective of the 12-hour period of the spike, along with a line indicating the spike limit as it’s being recalculated over the course of the spike: |
152 |
| - |
153 |
| - |
154 |
| -Throughout the spike, the recalculating limit has the following effect: |
155 |
| - |
156 |
| -- 1st hour: 6k events ingested, limit is recalculated to 2083, 3917 events dropped |
157 |
| -- 2nd hour: 34k events ingested, limit is recalculated to 2873, 31217 events dropped |
158 |
| -- 3rd hour: 55k events ingested, limit is recalculated to 5452, ~49k events dropped |
159 |
| -- 4th hour: 49k events ingested, limit is recalculated to 7628, ~41k events dropped |
160 |
| -- 5th hour: 41k events ingested, limit is recalculated to 9371, ~31k events dropped |
161 |
| - |
162 |
| -For this particular example: |
163 |
| - |
164 |
| -- Org quota: 500k |
165 |
| -- Events ingested during the spike: ~478k |
166 |
| -- Events accepted overall: ~157k |
167 |
| - |
168 |
| -Here's an example of spike limit projections for a week, taking into account seasonality: |
169 |
| - |
170 |
| - |
171 |
| - |
172 |
| -These regular differences in event ingestion don't cause a spike to occur. |
173 |
| - |
174 |
| -#### Bursty Projects |
175 |
| - |
176 |
| -There may be instances where a project routinely accepts a high volume of events in a very short period of time by design — for example projects that orchestrate cron/Airflow jobs or task runners. The screenshot below shows an example of this kind of behavior: |
177 |
| - |
178 |
| - |
179 |
| - |
180 |
| -If this is expected behavior for a given project in your organization, you may want to consider turning off spike protection in the project settings to ensure necessary events aren't dropped. |
| 60 | +Sentry's Spike Protection checks each project for significant overages in error events relative to an established spike threshold. If a spike is detected, Spike Protection kicks in and drops events once volume reaches the spike threshold. Any team member with Billing or Owner-level permissions can enable Spike Protection for individual projects in your organization. To choose which projects to enable it for, go to **Settings > Spike Protection**, where you can toggle it on for individual projects or click “Enable All” to set it up for all your projects at once. Learn more about how Spike Protection works and how to manage spikes in [Spike Protection](/product/accounts/quotas/spike-protection/). |
181 | 61 |
|
182 | 62 | ## Adjusting Quotas
|
183 | 63 |
|
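Reviewer note: the text removed in this hunk states two limit formulas, the organization-level per-minute limit and the project-level minimum event calculation. The sketch below restates them in Python so they are easy to check against the linked Spike Protection page. The function names are hypothetical, and the single-project count in the usage lines is an assumption. This is not Sentry's implementation.

```python
def legacy_per_minute_limit(avg_events_per_minute: float) -> float:
    """Removed text: maximum(20, 6 x average error events per minute
    over the last 24 hours)."""
    return max(20, 6 * avg_events_per_minute)


def minimum_event_limit(quota: int, num_projects: int) -> float:
    """Removed text: the maximum of 500 events or
    (3 * your quota) / (720 * number of projects), per hour."""
    return max(500, (3 * quota) / (720 * num_projects))


# Worked example from the removed text: a 10 events/minute average gives a
# limit of 60, so in a minute with 100 events, events 61-100 are dropped.
limit = legacy_per_minute_limit(10)
dropped = max(0, 100 - limit)

# Assumption: a single project on the 500k quota from the removed example.
hourly_floor = minimum_event_limit(500_000, 1)
```

With a 500k quota and one project, the hourly floor is about 2083 events, which is consistent with the first-hour limit quoted in the removed example.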