Commit 271b0a2

Merge pull request #5885 from influxdata/jts/dar-472-catalog-terminology
fix(clustered): Closes influxdata/DAR#472. …
2 parents 8cada04 + c37e50e commit 271b0a2

9 files changed: +143 -154 lines changed

content/influxdb3/clustered/admin/backup-restore.md

Lines changed: 20 additions & 22 deletions
@@ -12,29 +12,29 @@ weight: 105
 influxdb3/clustered/tags: [backup, restore]
 ---
 
-InfluxDB Clustered automatically stores snapshots of the InfluxDB Catalog that
+InfluxDB Clustered automatically stores snapshots of the InfluxDB Catalog store that
 you can use to restore your cluster to a previous state. The snapshotting
 functionality is optional and is disabled by default.
 Enable snapshots to ensure you can recover
 in case of emergency.
 
 With InfluxDB Clustered snapshots enabled, each hour, InfluxDB uses the `pg_dump`
-utility included with the InfluxDB Garbage Collector to export an SQL blob or
-“snapshot” from the InfluxDB Catalog and store it in the object store.
-The Catalog is a PostgreSQL-compatible relational database that stores metadata
+utility included with the InfluxDB Garbage collector to export an SQL blob or
+“snapshot” from the InfluxDB Catalog store to the Object store.
+The Catalog store is a PostgreSQL-compatible relational database that stores metadata
 for your time series data, such as schema data types, Parquet file locations, and more.
 
-The Catalog snapshots act as recovery points for your InfluxDB cluster that
-reference all Parquet files that existed in the object store at the time of the
-snapshot. When a snapshot is restored to the Catalog, the Compactor
+The Catalog store snapshots act as recovery points for your InfluxDB cluster that
+reference all Parquet files that existed in the Object store at the time of the
+snapshot. When a snapshot is restored to the Catalog store, the Compactor
 “[soft deletes](#soft-delete)” any Parquet files not listed in the snapshot.
 
 > [!Note]
 > InfluxDB won't [hard delete](#hard-delete) Parquet files listed in _any_ hourly or daily snapshot.
 >
 > For example, if you have Parquet files A, B, C, and D, and you restore to a
 > snapshot that includes B and C, but not A and D, then A and D are soft-deleted, but remain in object
-> storage until they are no longer referenced in any Catalog snapshot.
+> storage until they are no longer referenced in any Catalog store snapshot.
 - [Soft delete](#soft-delete)
 - [Hard delete](#hard-delete)
 - [Recovery Point Objective (RPO)](#recovery-point-objective-rpo)
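
The soft-delete and hard-delete behavior described in the note in this hunk can be sketched with set operations. This is a minimal illustration in Python, not InfluxDB's implementation: the function name `classify_parquet_files` and its parameters are hypothetical, file names stand in for Parquet object paths, and retention windows are not modeled.

```python
from typing import Dict, Iterable, Set


def classify_parquet_files(
    object_store_files: Iterable[str],
    restored_snapshot: Iterable[str],
    retained_snapshots: Iterable[Iterable[str]],
) -> Dict[str, Set[str]]:
    """Classify files after restoring to `restored_snapshot` (hypothetical helper)."""
    files = set(object_store_files)

    # Files not listed in the restored snapshot are soft-deleted.
    soft_deleted = files - set(restored_snapshot)

    # A file listed in any retained hourly or daily snapshot is not hard-deleted.
    referenced: Set[str] = set()
    for snapshot in retained_snapshots:
        referenced.update(snapshot)

    hard_delete_eligible = soft_deleted - referenced
    return {
        "soft_deleted": soft_deleted,
        "hard_delete_eligible": hard_delete_eligible,
    }


if __name__ == "__main__":
    result = classify_parquet_files(
        object_store_files={"A", "B", "C", "D"},
        restored_snapshot={"B", "C"},
        # Assumption for illustration: an older retained snapshot still lists A.
        retained_snapshots=[{"B", "C"}, {"A", "B", "C"}],
    )
    print(sorted(result["soft_deleted"]))          # ['A', 'D']
    print(sorted(result["hard_delete_eligible"]))  # ['D']
```

In this example, A stays in object storage because another snapshot still references it, while D is referenced by no snapshot.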
@@ -75,8 +75,8 @@ The InfluxDB Clustered snapshot strategy RPO allows for the following maximum da
 ## Recovery Time Objective (RTO)
 
 RTO is the maximum amount of downtime allowed for an InfluxDB cluster after a failure.
-RTO varies depending on the size of your Catalog database, network speeds
-between the client machine and the Catalog database, cluster load, the status
+RTO varies depending on the size of your Catalog store, network speeds
+between the client machine and the Catalog store, cluster load, the status
 of your underlying hosting provider, and other factors.
 
 ## Data written just before a snapshot may not be present after restoring
@@ -94,14 +94,14 @@ present after restoring to that snapshot.
 ### Automate object synchronization to an external S3-compatible bucket
 
 Syncing objects to an external S3-compatible bucket ensures an up-to-date backup
-in case your object store becomes unavailable. Recovery point snapshots only
-back up the InfluxDB Catalog. If data referenced in a Catalog snapshot does not
-exist in the object store, the recovery process does not restore the missing data.
+in case your Object store becomes unavailable. Recovery point snapshots only
+back up the InfluxDB Catalog store. If data referenced in a Catalog store snapshot does not
+exist in the Object store, the recovery process does not restore the missing data.
 
 ### Enable short-term object versioning
 
 If your object storage provider supports it, consider enabling short-term
-object versioning on your object store--for example, 1-2 days to protect against errant writes or deleted objects.
+object versioning on your Object store--for example, 1-2 days to protect against errant writes or deleted objects.
 With object versioning enabled, as objects are updated, the object store
 retains distinct versions of each update that can be used to “rollback” newly
 written or updated Parquet files to previous versions.
@@ -140,7 +140,7 @@ spec:
 
 #### INFLUXDB_IOX_CREATE_CATALOG_BACKUP_DATA_SNAPSHOT_FILES
 
-Enable hourly Catalog snapshotting. The default is `'false'`. Set to `'true'`:
+Enable hourly Catalog store snapshotting. The default is `'false'`. Set to `'true'`:
 
 ```yaml
 INFLUXDB_IOX_CREATE_CATALOG_BACKUP_DATA_SNAPSHOT_FILES: 'true'
@@ -217,22 +217,20 @@ written on or around the beginning of the next hour.
 ## Restore to a recovery point
 
 Use the following process to restore your InfluxDB cluster to a recovery point
-using Catalog snapshots:
+using Catalog store snapshots:
 
 1. **Install prerequisites:**
 
 - `kubectl` CLI for managing your Kubernetes deployment.
-- `psql` CLI to interact with the PostgreSQL-compatible Catalog database with
-the appropriate Data Source Name (DSN) and connection credentials.
-- A client to interact with your InfluxDB cluster’s object store.
-Supported clients depend on your object storage provider.
+- `psql` CLI configured with your Data Source Name and credentials for interacting with the PostgreSQL-compatible Catalog store database.
+- A client from your object storage provider for interacting with your InfluxDB cluster's Object store.
 
 2. **Retrieve the recovery point snapshot from your object store.**
 
 InfluxDB Clustered stores hourly and daily snapshots in the
 `/catalog_backup_file_lists` path in object storage. Download the snapshot
-that you would like to use as the recovery point. If your primary object
-store is unavailable, download the snapshot from your replicated object store.
+that you would like to use as the recovery point. If your primary Object
+store is unavailable, download the snapshot from your replicated Object store.
 
 > [!Important]
 > When creating and storing a snapshot, the last artifact created is the

content/influxdb3/clustered/admin/scale-cluster.md

Lines changed: 72 additions & 42 deletions
@@ -22,19 +22,12 @@ resources available to each component.
 - [Scaling strategies](#scaling-strategies)
 - [Vertical scaling](#vertical-scaling)
 - [Horizontal scaling](#horizontal-scaling)
+- [Scale your cluster as a whole](#scale-your-cluster-as-a-whole)
 - [Scale components in your cluster](#scale-components-in-your-cluster)
 - [Horizontally scale a component](#horizontally-scale-a-component)
 - [Vertically scale a component](#vertically-scale-a-component)
 - [Apply your changes](#apply-your-changes)
-- [Scale your cluster as a whole](#scale-your-cluster-as-a-whole)
 - [Recommended scaling strategies per component](#recommended-scaling-strategies-per-component)
-- [Ingester](#ingester)
-- [Querier](#querier)
-- [Router](#router)
-- [Compactor](#compactor)
-- [Garbage collector](#garbage-collector)
-- [Catalog](#catalog)
-- [Object store](#object-store)
 
 ## Scaling strategies
 
@@ -59,6 +52,14 @@ throughput a system can manage, but also provides additional redundancy and fail
 
 {{< html-diagram/scaling-strategy "horizontal" >}}
 
+## Scale your cluster as a whole
+
+Scaling your entire InfluxDB Cluster is done by scaling your Kubernetes cluster
+and is managed outside of InfluxDB. The process of scaling your entire Kubernetes
+cluster depends on your underlying Kubernetes provider. You can also use
+[Kubernetes autoscaling](https://kubernetes.io/docs/concepts/cluster-administration/cluster-autoscaling/)
+to automatically scale your cluster as needed.
+
 ## Scale components in your cluster
 
 The following components of your InfluxDB cluster are scaled by modifying
@@ -69,6 +70,7 @@ properties in your `AppInstance` resource:
 - Compactor
 - Router
 - Garbage collector
+- Catalog service
 
 > [!Note]
 > #### Scale your Catalog and Object store
@@ -448,39 +450,42 @@ helm upgrade \
 {{% /code-tab-content %}}
 {{< /code-tabs-wrapper >}}
 
-## Scale your cluster as a whole
-
-Scaling your entire InfluxDB Cluster is done by scaling your Kubernetes cluster
-and is managed outside of InfluxDB. The process of scaling your entire Kubernetes
-cluster depends on your underlying Kubernetes provider. You can also use
-[Kubernetes autoscaling](https://kubernetes.io/docs/concepts/cluster-administration/cluster-autoscaling/)
-to automatically scale your cluster as needed.
-
 ## Recommended scaling strategies per component
 
 - [Router](#router)
 - [Ingester](#ingester)
 - [Querier](#querier)
 - [Compactor](#compactor)
 - [Garbage collector](#garbage-collector)
-- [Catalog](#catalog)
+- [Catalog store](#catalog-store)
+- [Catalog service](#catalog-service)
 - [Object store](#object-store)
 
 ### Router
 
-The Router can be scaled both [vertically](#vertical-scaling) and
+The [Router](/influxdb3/clustered/reference/internals/storage-engine/#router) can be scaled both [vertically](#vertical-scaling) and
 [horizontally](#horizontal-scaling).
-Horizontal scaling increases write throughput and is typically the most
+
+- **Recommended**: Horizontal scaling increases write throughput and is typically the most
 effective scaling strategy for the Router.
-Vertical scaling (specifically increased CPU) improves the Router's ability to
+- Vertical scaling (specifically increased CPU) improves the Router's ability to
 parse incoming line protocol with lower latency.
 
+#### Router latency
+
+Latency of the Router’s write endpoint is directly impacted by:
+
+- Ingester latency--the router calls the Ingester during a client write request
+- Catalog latency during schema validation
+
 ### Ingester
 
-The Ingester can be scaled both [vertically](#vertical-scaling) and
+The [Ingester](/influxdb3/clustered/reference/internals/storage-engine/#ingester) can be scaled both [vertically](#vertical-scaling) and
 [horizontally](#horizontal-scaling).
-Vertical scaling increases write throughput and is typically the most effective
-scaling strategy for the Ingester.
+
+- **Recommended**: Vertical scaling is typically the most effective scaling strategy for the Ingester.
+Compared to horizontal scaling, vertical scaling not only increases write throughput but also lessens query, catalog, and compaction overheads as well as Object store costs.
+- Horizontal scaling can help distribute write load but comes with additional coordination overhead.
 
 #### Ingester storage volume
 
@@ -543,37 +548,62 @@ ingesterStorage:
 
 ### Querier
 
-The Querier can be scaled both [vertically](#vertical-scaling) and
+The [Querier](/influxdb3/clustered/reference/internals/storage-engine/#querier) can be scaled both [vertically](#vertical-scaling) and
 [horizontally](#horizontal-scaling).
-Horizontal scaling increases query throughput to handle more concurrent queries.
-Vertical scaling improves the Querier’s ability to process computationally
-intensive queries.
+
+- **Recommended**: [Vertical scaling](#vertical-scaling) improves the Querier's ability to process concurrent or computationally
+intensive queries, and increases the effective cache capacity.
+- Horizontal scaling increases query throughput to handle more concurrent queries.
+Consider horizontal scaling if vertical scaling doesn't adequately address
+concurrency demands or reaches the hardware limits of your underlying nodes.
 
 ### Compactor
 
-The Compactor can be scaled both [vertically](#vertical-scaling) and
-[horizontally](#horizontal-scaling).
-Because compaction is a compute-heavy process, vertical scaling (especially
-increasing the available CPU) is the most effective scaling strategy for the
-Compactor. Horizontal scaling increases compaction throughput, but not as
+- **Recommended**: Maintain **1 Compactor pod** and use [vertical scaling](#vertical-scaling) (especially
+increasing the available CPU) for the Compactor.
+- Because compaction is a compute-heavy process, horizontal scaling increases compaction throughput, but not as
 efficiently as vertical scaling.
 
 ### Garbage collector
 
-The Garbage collector can be scaled [vertically](#vertical-scaling). It is a
-light-weight process that typically doesn't require many system resources, but
-if you begin to see high resource consumption on the garbage collector, you can
-scale it vertically to address the added workload.
+The [Garbage collector](/influxdb3/clustered/reference/internals/storage-engine/#garbage-collector) is a lightweight process that typically doesn't require
+significant system resources.
+
+- Don't horizontally scale the Garbage collector; it isn't designed for distributed load.
+- Consider [vertical scaling](#vertical-scaling) only if you observe consistently high CPU usage or if the container
+regularly runs out of memory.
+
+### Catalog store
 
-### Catalog
+The [Catalog store](/influxdb3/clustered/reference/internals/storage-engine/#catalog-store) is a PostgreSQL-compatible database that stores critical metadata for your InfluxDB cluster.
+An underprovisioned Catalog store can cause write outages and system-wide performance issues.
 
-Scaling strategies available for the Catalog depend on the PostgreSQL-compatible
-database used to run the catalog. All support [vertical scaling](#vertical-scaling).
-Most support [horizontal scaling](#horizontal-scaling) for redundancy and failover.
+- Scaling strategies depend on your specific PostgreSQL implementation
+- All PostgreSQL implementations support [vertical scaling](#vertical-scaling)
+- Most implementations support [horizontal scaling](#horizontal-scaling) for improved redundancy and failover
+
+
+### Catalog service
+
+The [Catalog service](/influxdb3/clustered/reference/internals/storage-engine/#catalog-service) (iox-shared-catalog statefulset) caches
+and manages access to the Catalog store.
+
+- **Recommended**: Maintain **exactly 3 replicas** of the Catalog service for optimal redundancy. Additional replicas are discouraged.
+- If performance improvements are needed, use [vertical scaling](#vertical-scaling).
+
+> [!Note]
+> #### Managing Catalog components
+>
+> The [Catalog service](/influxdb3/clustered/reference/internals/storage-engine/#catalog-service) is managed through the
+> `AppInstance` resource, while the [Catalog store](/influxdb3/clustered/reference/internals/storage-engine/#catalog-store)
+> is managed separately according to your PostgreSQL implementation.
 
 ### Object store
 
-Scaling strategies available for the Object store depend on the underlying
-object storage services used to run the object store. Most support
+The [Object store](/influxdb3/clustered/reference/internals/storage-engine/#object-store)
+contains time series data in Parquet format.
+
+Scaling strategies depend on the underlying object storage services used.
+Most services support
 [horizontal scaling](#horizontal-scaling) for redundancy, failover, and
 increased capacity.
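
The replica-count recommendations added in this file (1 Compactor pod, no horizontal scaling for the Garbage collector, exactly 3 Catalog service replicas) can be expressed as a small pre-flight check. This is a minimal sketch in Python under stated assumptions: the function `review_replica_plan` and the component keys (`compactor`, `garbage-collector`, `catalog-service`) are hypothetical and are not `AppInstance` field names.

```python
from typing import Dict, List


def review_replica_plan(replicas: Dict[str, int]) -> List[str]:
    """Return warnings for replica counts that differ from the recommendations.

    `replicas` maps hypothetical component names to planned replica counts.
    Components missing from the mapping are assumed to use the recommended count.
    """
    warnings: List[str] = []

    # Recommended: maintain 1 Compactor pod and scale it vertically.
    if replicas.get("compactor", 1) != 1:
        warnings.append("compactor: maintain 1 pod and use vertical scaling")

    # The Garbage collector isn't designed for distributed load.
    if replicas.get("garbage-collector", 1) > 1:
        warnings.append("garbage-collector: don't scale horizontally")

    # Recommended: exactly 3 Catalog service replicas.
    if replicas.get("catalog-service", 3) != 3:
        warnings.append("catalog-service: maintain exactly 3 replicas")

    return warnings


if __name__ == "__main__":
    plan = {"compactor": 2, "garbage-collector": 1, "catalog-service": 3}
    for warning in review_replica_plan(plan):
        print(warning)
```

Router, Ingester, and Querier are left out of the check because the text recommends a preferred strategy for them rather than a fixed replica count.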

content/influxdb3/clustered/install/_index.md

Lines changed: 3 additions & 3 deletions
@@ -62,13 +62,13 @@ Updating your InfluxDB cluster is as simple as re-applying your app-instance wit
 
 The word safely here means being able to redeploy your cluster while still being able to use the tokens you’ve created, and being able to write/query to the database you’ve previously created.
 
-All of the important state in InfluxDB 3 lives in the Catalog (the Postgres equivalent database) and the Object Store (the S3 compatible store). These should be treated with the utmost care.
+All of the important state in InfluxDB 3 lives in the Catalog store (the Postgres equivalent database) and the Object Store (the S3 compatible store). These should be treated with the utmost care.
 
-If a full redeploy of your cluster needs to happen, the namespace containing the Influxdb instance can be deleted **_as long as your Catalog and Object Store are not in this namespace_**. Then, the influxdb AppInstance can be redeployed. It is possible the operator may need to be removed and reinstalled. In that case, deleting the namespace that the operator is deployed into and redeploying is acceptable.
+If a full redeploy of your cluster needs to happen, the namespace containing the Influxdb instance can be deleted **_as long as your Catalog store and Object Store are not in this namespace_**. Then, the influxdb AppInstance can be redeployed. It is possible the operator may need to be removed and reinstalled. In that case, deleting the namespace that the operator is deployed into and redeploying is acceptable.
 
 ### Backing up your data
 
-The Catalog and Object store contain all of the important state for InfluxDB 3. They should be the primary focus of backups. Following the industry standard best practices for your chosen Catalog implementation and Object Store implementation should provide sufficient backups. In our Cloud products, we do daily backups of our Catalog, in addition to automatic snapshots, and we preserve our Object Store files for 100 days after they have been soft-deleted.
+The Catalog store and Object store contain all of the important state for InfluxDB 3. They should be the primary focus of backups. Following the industry standard best practices for your chosen Catalog store implementation and Object Store implementation should provide sufficient backups. In our Cloud products, we do daily backups of our Catalog, in addition to automatic snapshots, and we preserve our Object Store files for 100 days after they have been soft-deleted.
 
 ### Recovering your data
 

content/influxdb3/clustered/install/secure-cluster/tls.md

Lines changed: 3 additions & 3 deletions
@@ -17,7 +17,7 @@ following:
 
 - Ingress to your cluster
 - Connection to your Object store
-- Connection to your Catalog (PostgreSQL-compatible) database
+- Connection to your Catalog store (PostgreSQL-compatible) database
 
 > [!Note]
 > If using self-signed certs,
@@ -176,8 +176,8 @@ objectStore:
 Refer to your PostreSQL-compatible database provider's documentation for
 installing TLS certificates and ensuring secure connections.
 
-If currently using an unsecure connection to your Catalog database, update your
-Catalog data source name (DSN) to **remove the `sslmode=disable` query parameter**:
+If currently using an unsecure connection to your Catalog store database, update your
+Catalog store data source name (DSN) to **remove the `sslmode=disable` query parameter**:
 
 {{% code-callout "\?sslmode=disable" "magenta delete" %}}
 ```txt

content/influxdb3/clustered/install/set-up-cluster/prerequisites.md

Lines changed: 4 additions & 4 deletions
@@ -99,7 +99,7 @@ following sizing for {{% product-name %}} components:
 {{% tab-content %}}
 <!--------------------------------- BEGIN AWS --------------------------------->
 
-- **Catalog (PostgreSQL-compatible database) (x1):**
+- **Catalog store (PostgreSQL-compatible database) (x1):**
 - _[See below](#postgresql-compatible-database-requirements)_
 - **Ingesters and Routers (x3):**
 - EC2 m6i.2xlarge (8 CPU, 32 GB RAM)
@@ -116,7 +116,7 @@ following sizing for {{% product-name %}} components:
 {{% tab-content %}}
 <!--------------------------------- BEGIN GCP --------------------------------->
 
-- **Catalog (PostgreSQL-compatible database) (x1):**
+- **Catalog store (PostgreSQL-compatible database) (x1):**
 - _[See below](#postgresql-compatible-database-requirements)_
 - **Ingesters and Routers (x3):**
 - GCE c2-standard-8 (8 CPU, 32 GB RAM)
@@ -133,7 +133,7 @@ following sizing for {{% product-name %}} components:
 {{% tab-content %}}
 <!-------------------------------- BEGIN Azure -------------------------------->
 
-- **Catalog (PostgreSQL-compatible database) (x1):**
+- **Catalog store (PostgreSQL-compatible database) (x1):**
 - _[See below](#postgresql-compatible-database-requirements)_
 - **Ingesters and Routers (x3):**
 - Standard_D8s_v3 (8 CPU, 32 GB RAM)
@@ -150,7 +150,7 @@ following sizing for {{% product-name %}} components:
 {{% tab-content %}}
 <!------------------------------- BEGIN ON-PREM ------------------------------->
 
-- **Catalog (PostgreSQL-compatible database) (x1):**
+- **Catalog store (PostgreSQL-compatible database) (x1):**
 - CPU: 4-8 cores
 - RAM: 16-32 GB
 - **Ingesters and Routers (x3):**
