Commit 89ae114

chore(clustered): Component scaling recommendations:
Add suggestions from @reidkaufmann in influxdata/DAR#472
1 parent 8491969 commit 89ae114

2 files changed

+67
-42
lines changed

content/influxdb3/clustered/admin/scale-cluster.md

Lines changed: 50 additions & 28 deletions
@@ -466,19 +466,29 @@ helm upgrade \
466466

467467
### Router
468468

469-
The Router can be scaled both [vertically](#vertical-scaling) and
469+
The [Router](/influxdb3/clustered/reference/internals/storage-engine/#router) can be scaled both [vertically](#vertical-scaling) and
470470
[horizontally](#horizontal-scaling).
471-
Horizontal scaling increases write throughput and is typically the most
471+
472+
- **Recommended**: Horizontal scaling increases write throughput and is typically the most
472473
effective scaling strategy for the Router.
473-
Vertical scaling (specifically increased CPU) improves the Router's ability to
474+
- Vertical scaling (specifically increased CPU) improves the Router's ability to
474475
parse incoming line protocol with lower latency.
475476

477+
#### Router latency
478+
479+
Latency of the Router’s write endpoint is directly impacted by:
480+
481+
- Ingester latency--the Router calls the Ingester during a client write request
482+
- Catalog latency during schema validation
483+
476484
### Ingester
477485

478-
The Ingester can be scaled both [vertically](#vertical-scaling) and
486+
The [Ingester](/influxdb3/clustered/reference/internals/storage-engine/#ingester) can be scaled both [vertically](#vertical-scaling) and
479487
[horizontally](#horizontal-scaling).
480-
Vertical scaling increases write throughput and is typically the most effective
481-
scaling strategy for the Ingester.
488+
489+
- **Recommended**: Vertical scaling is typically the most effective scaling strategy for the Ingester.
490+
Compared to horizontal scaling, vertical scaling not only increases write throughput but also lessens query, catalog, and compaction overheads as well as Object store costs.
491+
- Horizontal scaling can help distribute write load but comes with additional coordination overhead.
482492

483493
#### Ingester storage volume
484494

@@ -541,50 +551,62 @@ ingesterStorage:
541551

542552
### Querier
543553

544-
The Querier can be scaled both [vertically](#vertical-scaling) and
554+
The [Querier](/influxdb3/clustered/reference/internals/storage-engine/#querier) can be scaled both [vertically](#vertical-scaling) and
545555
[horizontally](#horizontal-scaling).
546-
Horizontal scaling increases query throughput to handle more concurrent queries.
547-
Vertical scaling improves the Querier’s ability to process computationally
548-
intensive queries.
556+
557+
- **Recommended**: [Vertical scaling](#vertical-scaling) improves the Querier's ability to process concurrent or computationally
558+
intensive queries, and increases the effective cache capacity.
559+
- Horizontal scaling increases query throughput to handle more concurrent queries.
560+
Consider horizontal scaling if vertical scaling doesn't adequately address
561+
concurrency demands or reaches the hardware limits of your underlying nodes.
549562

550563
### Compactor
551564

552-
The Compactor can be scaled both [vertically](#vertical-scaling) and
553-
[horizontally](#horizontal-scaling).
554-
Because compaction is a compute-heavy process, vertical scaling (especially
555-
increasing the available CPU) is the most effective scaling strategy for the
556-
Compactor. Horizontal scaling increases compaction throughput, but not as
565+
- **Recommended**: Maintain **1 Compactor pod** and use [vertical scaling](#vertical-scaling) (especially
566+
increasing the available CPU) for the Compactor.
567+
- Horizontal scaling increases compaction throughput but, because compaction is a compute-heavy process, not as
557568
efficiently as vertical scaling.
558569

559570
### Garbage collector
560571

561-
The Garbage collector is not designed for distributed load and should _not_ be
562-
scaled horizontally. It is a lightweight process that typically doesn't require
563-
significant system resources. [Vertical scaling](#vertical-scaling) should only
564-
be considered if you observe consistently high CPU usage or if the container
572+
The [Garbage collector](/influxdb3/clustered/reference/internals/storage-engine/#garbage-collector) is a lightweight process that typically doesn't require
573+
significant system resources.
574+
575+
- Don't horizontally scale the Garbage collector; it isn't designed for distributed load.
576+
- Consider [vertical scaling](#vertical-scaling) only if you observe consistently high CPU usage or if the container
565577
regularly runs out of memory.
566578

567579
### Catalog store
568580

569-
The Catalog store is a PostgreSQL-compatible database that persistently stores metadata.
570-
Scaling strategies depend on your chosen PostgreSQL implementation.
571-
All support [vertical scaling](#vertical-scaling), and most support
572-
[horizontal scaling](#horizontal-scaling) for redundancy and failover.
581+
The [Catalog store](/influxdb3/clustered/reference/internals/storage-engine/#catalog-store) is a PostgreSQL-compatible database that stores critical metadata for your InfluxDB cluster.
582+
An underprovisioned Catalog store can cause write outages and system-wide performance issues.
583+
584+
- Scaling strategies depend on your specific PostgreSQL implementation
585+
- All PostgreSQL implementations support [vertical scaling](#vertical-scaling)
586+
- Most implementations support [horizontal scaling](#horizontal-scaling) for improved redundancy and failover
587+
573588

574589
### Catalog service
575590

576-
The Catalog service should maintain exactly
577-
3 replicas for optimal redundancy.
578-
Additional replicas are discouraged; favor vertical scaling instead if performance improvements are needed.
591+
The [Catalog service](/influxdb3/clustered/reference/internals/storage-engine/#catalog-service) (iox-shared-catalog statefulset) caches
592+
and manages access to the Catalog store.
593+
594+
- **Recommended**: Maintain **exactly 3 replicas** of the Catalog service for optimal redundancy. Additional replicas are discouraged.
595+
- If performance improvements are needed, use [vertical scaling](#vertical-scaling).
579596

580597
> [!Note]
598+
> #### Managing Catalog components
599+
>
581600
> The [Catalog service](/influxdb3/clustered/reference/internals/storage-engine/#catalog-service) is managed through the
582601
> `AppInstance` resource, while the [Catalog store](/influxdb3/clustered/reference/internals/storage-engine/#catalog-store)
583602
> is managed separately according to your PostgreSQL implementation.
584603

585604
### Object store
586605

587-
Scaling strategies available for the Object store depend on the underlying
588-
object storage services used to run the object store. Most support
606+
The [Object store](/influxdb3/clustered/reference/internals/storage-engine/#object-store)
607+
contains time series data in Parquet format.
608+
609+
Scaling strategies depend on the underlying object storage services used.
610+
Most services support
589611
[horizontal scaling](#horizontal-scaling) for redundancy, failover, and
590612
increased capacity.

content/influxdb3/clustered/reference/internals/storage-engine.md

Lines changed: 17 additions & 14 deletions
@@ -50,20 +50,20 @@ queries, and is optimized to reduce storage cost.
5050

5151
The Router (also known as the Ingest Router) parses incoming line
5252
protocol and then routes it to [Ingesters](#ingester).
53-
To ensure write durability, the Router replicates data to two or more of the
54-
available Ingesters.
53+
The Router processes incoming write requests through the following steps:
54+
55+
- Queries the [Catalog](#catalog) to determine persistence locations and verify schema compatibility
56+
- Validates syntax and schema compatibility for each data point in the request,
57+
and either accepts or [rejects points](/influxdb3/clustered/write-data/troubleshoot/#troubleshoot-rejected-points)
58+
- Returns a [response](/influxdb3/clustered/write-data/troubleshoot/) to the client
59+
- Replicates data to two or more available Ingesters for write durability
5560

5661
### Ingester
5762

5863
The Ingester processes line protocol submitted in write requests and persists
5964
time series data to the [Object store](#object-store).
6065
In this process, the Ingester does the following:
6166

62-
- Queries the [Catalog](#catalog) to identify where data should be persisted and
63-
to ensure the schema of the line protocol is compatible with the
64-
[schema](/influxdb3/clustered/reference/glossary/#schema) of persisted data.
65-
- Accepts or [rejects](/influxdb3/clustered/write-data/troubleshoot/#troubleshoot-rejected-points)
66-
points in the write request and generates a [response](/influxdb3/clustered/write-data/troubleshoot/).
6767
- Processes line protocol and persists time series data to the
6868
[Object store](#object-store) in Apache Parquet format. Each Parquet file
6969
represents a _partition_--a logical grouping of data.
@@ -93,18 +93,21 @@ At query time, the querier:
9393
3. Queries the [Catalog service](#catalog-service) to retrieve [Catalog store](#catalog-store)
9494
information about partitions in the [Object store](#object-store)
9595
that contain the queried data.
96-
4. Reads partition Parquet files that contain the queried data and scans each
96+
4. Retrieves any needed Parquet files (not already cached) from the Object store.
97+
5. Reads partition Parquet files that contain the queried data and scans each
9798
row to filter data that matches predicates in the query plan.
98-
5. Performs any additional operations (for example: deduplicating, merging, and sorting)
99-
specified in the query plan.
100-
6. Returns the query result to the client.
99+
6. Performs any additional operations (for example: deduplicating, merging, and sorting)
100+
specified in the query plan.
101+
7. Returns the query result to the client.
101102

102103
### Catalog
103104

104105
InfluxDB's catalog system consists of two distinct components: the [Catalog store](#catalog-store)
105106
and the [Catalog service](#catalog-service).
106107

107108
> [!Note]
109+
> #### Managing Catalog components
110+
>
108111
> The Catalog service is managed through the `AppInstance` resource, while the Catalog store
109112
> is managed separately according to your PostgreSQL implementation.
110113
@@ -127,10 +130,10 @@ and manages access to the Catalog store.
127130
### Object store
128131

129132
The Object store contains time series data in [Apache Parquet](https://parquet.apache.org/) format.
130-
Each Parquet file represents a partition.
131-
By default, InfluxDB partitions tables by day, but you can
132-
[customize the partitioning strategy](/influxdb3/clustered/admin/custom-partitions/).
133133
Data in each Parquet file is sorted, encoded, and compressed.
134+
A partition may contain multiple Parquet files, which are subject to compaction.
135+
By default, InfluxDB partitions tables by day, but you can
136+
[customize the partitioning strategy](/influxdb3/clustered/admin/custom-partitions/).
134137

135138
### Compactor
136139
