
Increase SQL Query Histogram Range Beyond 10ms for Latency Tracking #2467

@aryanmehrotra

Description


The current 10 ms upper bound on GoFr’s SQL query histogram buckets is great for ultra-fast query visibility (for apps serving millions of requests per minute), but it compresses all slower queries into the same top bucket, which makes real-world latency analysis difficult, especially in production workloads where 100 ms–5 s queries are normal for analytical or transactional DBs.


⚙️ Why 10 ms was used initially

  • It was designed to detect micro-latency regressions in high-throughput services (like internal GoFr benchmarks).
  • Early use cases focused on microservices hitting in-memory or well-indexed SQL queries where anything >10 ms was an anomaly.
  • It optimized for Prometheus cardinality — fewer histogram buckets = smaller metrics footprint.

Why that’s limiting in real workloads

  • Queries taking 20 ms, 200 ms, or 2 s all fall into the same top bucket, so all resolution above 10 ms is lost (see the sketch after this list).
  • It becomes impossible to differentiate between “slightly slow” and “critical” queries.
  • Query latency can’t be correlated effectively with API request latency.
  • Alerting thresholds based on P95 or P99 query latency become misleadingly low, because quantiles estimated from the histogram are capped at the top bucket.
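
To make the resolution loss concrete, here is a minimal sketch (the exact bucket boundaries below are illustrative; only the 0.01 s cap matters) showing that 20 ms, 200 ms, and 2 s observations all land in the same implicit +Inf bucket:

package main

import (
    "fmt"
    "sort"
)

// Illustrative 10 ms-capped bucket boundaries, in seconds (GoFr’s exact current
// boundaries may differ; only the 0.01 s upper bound matters for this point).
var cappedBuckets = []float64{0.001, 0.0025, 0.005, 0.0075, 0.01}

// bucketIndex returns the index of the first bucket whose upper bound (le)
// covers the observation; len(buckets) means the implicit +Inf bucket.
func bucketIndex(buckets []float64, seconds float64) int {
    return sort.SearchFloat64s(buckets, seconds)
}

func main() {
    for _, d := range []float64{0.02, 0.2, 2.0} {
        fmt.Printf("%5.2fs -> bucket index %d of %d (+Inf)\n",
            d, bucketIndex(cappedBuckets, d), len(cappedBuckets))
    }
    // All three observations fall in the +Inf bucket, so quantiles computed
    // from the histogram cannot tell a 20 ms query from a 2 s one.
}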

Recommended Fix

Adopt a wider histogram range with roughly logarithmic bucket spacing, so each order of magnitude of latency keeps useful resolution.

Example ideal bucket set for SQL histograms:

[]float64{
    0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 20, 30, 60,
}

This covers:

  • Millisecond-level detail for fast queries (down to 1 ms)
  • Smooth spread for mid-range queries
  • Full visibility up to 60 seconds for extreme cases
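
For reference, a hedged sketch of how a bucket set like this could be registered as a histogram using prometheus/client_golang directly; the metric and label names here are illustrative, and GoFr’s internal metric registration may differ:

package metrics

import (
    "time"

    "github.com/prometheus/client_golang/prometheus"
)

// sqlQueryDuration uses the wider bucket set; metric and label names are illustrative only.
var sqlQueryDuration = prometheus.NewHistogramVec(
    prometheus.HistogramOpts{
        Name: "sql_query_duration_seconds",
        Help: "SQL query duration in seconds.",
        Buckets: []float64{
            0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 20, 30, 60,
        },
    },
    []string{"database", "operation"},
)

func init() {
    prometheus.MustRegister(sqlQueryDuration)
}

// ObserveQuery records a query duration, e.g. from a wrapper around db.QueryContext.
func ObserveQuery(database, operation string, start time.Time) {
    sqlQueryDuration.WithLabelValues(database, operation).Observe(time.Since(start).Seconds())
}

A call site would capture start := time.Now() before running the query and call ObserveQuery("orders_db", "SELECT", start) once it returns.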

🧩 Dynamic Adjustment (optional)

Instead of hardcoding buckets, expose them via configuration — for example:

app.Metrics.SetHistogramBuckets("sql_query_duration_seconds", []float64{
    0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 2, 5, 10, 30, 60,
})

You could tie this to:

  • Datasource type (MySQL, Postgres, BigQuery, etc.)
  • Request timeout (e.g., if REQUEST_TIMEOUT=30s, the upper bound could be 30 s; see the sketch after this list)
  • Environment (dev → short, prod → wide)
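
A minimal sketch of the timeout-driven option, assuming REQUEST_TIMEOUT is read from the environment as a Go duration string (GoFr’s own config parsing may differ); bucketsForTimeout is a hypothetical helper, not an existing GoFr API:

package main

import (
    "fmt"
    "os"
    "time"
)

// Wide default bucket set from the configuration example above, in seconds.
var defaultSQLBuckets = []float64{0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 2, 5, 10, 30, 60}

// bucketsForTimeout drops buckets above the request timeout so the histogram's
// upper bound never exceeds what the application is willing to wait for.
func bucketsForTimeout(timeout time.Duration) []float64 {
    limit := timeout.Seconds()
    out := make([]float64, 0, len(defaultSQLBuckets))
    for _, b := range defaultSQLBuckets {
        if b <= limit {
            out = append(out, b)
        }
    }
    return out
}

func main() {
    // REQUEST_TIMEOUT is parsed here as a Go duration string (assumption only).
    timeout := 30 * time.Second // fallback when REQUEST_TIMEOUT is unset or invalid
    if v := os.Getenv("REQUEST_TIMEOUT"); v != "" {
        if d, err := time.ParseDuration(v); err == nil {
            timeout = d
        }
    }
    fmt.Println(bucketsForTimeout(timeout)) // with REQUEST_TIMEOUT=30s: caps at 30
}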

📊 Impact

  • Slightly higher Prometheus storage footprint (more buckets)

  • Far more actionable insights:

    • 95th percentile query time per DB
    • Breakdown of fast vs slow queries
    • Easier detection of query regressions after deployments
