
OBSDOCS-1701: Upgrading to Logging 6 steps Final #93577


Draft: wants to merge 1 commit into base enterprise-4.18

Conversation

theashiot
Contributor

Version(s): 4.18

Issue: OBSDOCS-1701

Link to docs preview:

QE review:

  • QE has approved this change.

Additional information:

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 20, 2025
@openshift-ci-robot

openshift-ci-robot commented May 20, 2025

@theashiot: This pull request references OBSDOCS-1701 which is a valid jira issue.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 20, 2025

openshift-ci bot commented May 20, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@theashiot
Contributor Author

/test all

@openshift-ci openshift-ci bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 20, 2025
@ocpdocs-previewbot

ocpdocs-previewbot commented May 20, 2025


. In the *Change Subscription Update Channel* window, select the latest major version update channel, *stable-6.x*, and click *Save*. Note the `cluster-logging.v6.y.z` version.

. Wait for a few seconds, and then go to *Operators* -> *Installed Operators* to verify that the {clo} version matches the latest `cluster-logging.v5.9.<z>` version.

If it's being upgraded to Logging 6, then the version being verified should be v6.y.z, not 5.9.z.

Contributor Author

fixed
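
The same verification can also be done from the CLI; a minimal sketch, assuming the Operator is installed in the openshift-logging namespace:

[source,terminal]
----
$ oc get csv -n openshift-logging
----

The NAME and PHASE columns show the installed cluster-logging.v6.y.z CSV and whether it has reached Succeeded.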


. On the *Operators* -> *Installed Operators* page, wait for the *Status* field to report *Succeeded*.

. Check if the `LokiStack` custom resource contains the `v13` schema version and add it if it is missing. For correctly adding the `v13` schema version, see "Upgrading the LokiStack storage schema".
@r2d2rnd r2d2rnd May 21, 2025

LokiStack has no place here. The ClusterLogging Operator is not the Red Hat LokiStack Operator. They are two different Operators, and upgrading the CLO doesn't upgrade the LokiStack Operator.

This step applies only when upgrading the Red Hat Loki Operator, and it should be removed from here.

Contributor Author

done

* You have administrator permissions.
* You have access to the {product-title} web console and are viewing the *Administrator* perspective.

.Procedure

Currently, upgrading to Logging 6, even if already using Vector in Logging 5, means that the checkpoints used are in a different path, and this has a big impact. The same was reported for the change between Fluentd and Vector in https://issues.redhat.com/browse/OBSDA-540, where it's said:

Also, indicate for the migration that all logs that are not compressed will be reprocessed by Vector, which can lead to:
  - duplicated logs at the moment of the migration
  - 429 Too Many Requests in the log storage receiving the logs, or reaching the rate limit
  - problems with the log store's disk usage and performance as a consequence of the collector re-reading and processing all old logs
  - impact on the Kube API
  - a peak of memory and CPU in Vector until all the old logs are processed (these logs can be several GB per node), which could also have a big impact

Either the steps for moving the Vector checkpoints to the new path before upgrading should be provided, or this impact should be highlighted here, since issues with this upgrade have been reported.


Hello @theashiot,
In the article https://access.redhat.com/articles/7089860, the checkpoints migration was incorporated into "Step 4: Delete the ClusterLogging instance and deploy the ClusterLogForwarder observability Custom Resource". If those steps are confirmed, then we could probably incorporate them into this PR and close everything.
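
As a hedged aid for that discussion, the current checkpoint location on a node can be inspected before upgrading; the /var/lib/vector path is an assumption based on the collector's default data directory, not a value confirmed in this PR:

[source,terminal]
----
# /var/lib/vector is assumed as the Vector data directory; adjust if your deployment differs
$ oc debug node/<node_name> -- chroot /host ls -R /var/lib/vector
----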

[id="logging-upgrading-clo_{context}"]
= Updating the {clo}

To update the {clo} to a new major release version, you must modify the update channel for the Operator subscription.
@r2d2rnd r2d2rnd May 21, 2025

It's not indicated how to create the new CR with the observability API. Upgrading to Logging 6 is not only upgrading the Logging Operator. This manual "migration" is what's explained in https://docs.redhat.com/en/documentation/openshift_container_platform/4.16/html/logging/logging-6-0, points 2.4.5, 2.4.6, 2.4.7...2.4.11, and the following ones.

Basically, what you need to know before doing any upgrade step is how to transform the current configuration to the new API in Logging 6. The changes in configuration between Logging 5 and Logging 6 are detailed in https://github.com/openshift/cluster-logging-operator/blob/master/docs/administration/upgrade/v6.0_changes.adoc
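
For readers following along from the CLI, the channel change this section describes would look roughly like the following; the subscription name cluster-logging and the channel value are assumptions to verify against the installed subscription:

[source,terminal]
----
$ oc get subscription -n openshift-logging
# "cluster-logging" is the typical subscription name; confirm it with the previous command
$ oc patch subscription cluster-logging -n openshift-logging \
  --type merge -p '{"spec":{"channel":"stable-6.<y>"}}'
----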


. On the *Operators* -> *Installed Operators* page, wait for the *Status* field to report *Succeeded*.

. Check if the `LokiStack` custom resource contains the `v13` schema version and add it if it is missing. For correctly adding the `v13` schema version, see "Upgrading the LokiStack storage schema".

The "Upgrading the LokiStack storage schema" misses the link to the section

[id="uninstall-es-operator_{context}"]
= Uninstalling Elasticsearch

You can unintall Elasticsearch by using the {product-title} web console.

"unintall" has a typo, it's -> uninstall

It should be mentioned that you can uninstall Elasticsearch is it's not used for other component as "Jaeger, Service Mesh or Kiali"

Contributor Author

done
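
A quick way to act on that comment is to list Elasticsearch custom resources across all namespaces before uninstalling, since Jaeger, Service Mesh, and Kiali can create their own; a hedged sketch:

[source,terminal]
----
$ oc get elasticsearches.logging.openshift.io -A
----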

[id="creating-and-configuring-a-service-account-for-the-log-collector_{context}"]
= Creating and configuring a service account for the log collector

Create a service account for the log collector and assign it the necessary roles and permissions to collect logs.

This should only be needed if you don't want to use the legacy "logcollector" account. It's explained in https://github.com/openshift/cluster-logging-operator/blob/master/docs/administration/upgrade/v6.0_changes.adoc in the section "Legacy openshift-logging".
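
To check whether that legacy account is present before deciding, a minimal sketch, assuming the openshift-logging namespace:

[source,terminal]
----
$ oc get serviceaccount logcollector -n openshift-logging
----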

$ oc create sa logging-collector -n openshift-logging
----

. Bind the `Cluster Role` role to the service account to be able to write the logs to the Red{nbsp}Hat LokiStack

"Cluster Role" should probably be written all together as "ClusterRole", and likewise "service account" as serviceAccount.

Then it should probably read something like "Bind the ClusterRole to the serviceAccount to be able to write the logs".

Contributor Author

done

Comment on lines 48 to 49
$ oc adm policy add-cluster-role-to-user collect-infrastructure-log
s -z logging-collector -n openshift-logging

This should be a single line and without the "s":

 $ oc adm policy add-cluster-role-to-user collect-infrastructure-log -z logging-collector -n openshift-logging

Contributor Author

fixed
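
For completeness, the equivalent bindings for all three log types would look like the following; a sketch assuming the service account created earlier is logging-collector in openshift-logging:

[source,terminal]
----
$ oc adm policy add-cluster-role-to-user collect-application-logs -z logging-collector -n openshift-logging
$ oc adm policy add-cluster-role-to-user collect-infrastructure-logs -z logging-collector -n openshift-logging
$ oc adm policy add-cluster-role-to-user collect-audit-logs -z logging-collector -n openshift-logging
----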

* You installed the {oc-first}.

.Procedure
. Make each step an instruction.
@r2d2rnd r2d2rnd May 21, 2025

In the article https://access.redhat.com/articles/7089860, step 5 describes CLI steps. These need to be verified by engineering.
It also needs to be highlighted that once the "old" visualization is removed, and until the COO UI Plugin is configured, the console log visualization will be lost.
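
For reference on the UI Plugin point, this is roughly the kind of UIPlugin resource the Cluster Observability Operator uses to restore the console log view; the API version and the LokiStack name logging-loki are assumptions to confirm against the COO documentation:

[source,yaml]
----
# a sketch; verify the apiVersion and LokiStack name against the COO docs
apiVersion: observability.openshift.io/v1alpha1
kind: UIPlugin
metadata:
  name: logging
spec:
  type: Logging
  logging:
    lokiStack:
      name: logging-loki
----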


. Go to the *Administration* -> *Custom Resource Definitions* page.

. Click the Options menu {kebab} next to *Elasticsearch*, and select *Delete Custom Resource Definition*.
@r2d2rnd r2d2rnd May 21, 2025

This is missing the steps to delete the Kibana CRD and also to delete the Elasticsearch PVCs.
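
To cover that gap from the CLI, something like the following could work; the CRD names are the standard elasticsearch-operator ones, and the PVC name is a placeholder to look up first:

[source,terminal]
----
$ oc delete crd elasticsearches.logging.openshift.io kibanas.logging.openshift.io
$ oc get pvc -n openshift-logging
$ oc delete pvc <elasticsearch_pvc_name> -n openshift-logging
----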


. Click the Options menu {kebab} next to *Elasticsearch*, and select *Delete Custom Resource Definition*.

. Delete the object storage secret.

Elasticsearch doesn't have an "object storage secret".

Contributor Author

removed

+
[source,terminal]
----
$ oc delete clusterlogging <CR name> -n <namespace>

It would probably be good to provide the command to list them all first. It's:

$ oc get clusterloggings.logging.openshift.io -A

+
[source,terminal]
----
$ oc get pods -l component=collector -n <namespace>
@r2d2rnd r2d2rnd May 21, 2025

We can do it this way, or we can list across all namespaces, which sounds better because nobody will verify namespace by namespace:

$ oc get pods -l component=collector -A

Contributor Author

done

+
[source,terminal]
----
$ oc get pods -l component=collector -n <namespace>

This command checks the previous step again (whether the pods are running), but not whether a ClusterLogForwarder CR exists. I'd suggest using:

$ oc get clusterlogforwarders.logging.openshift.io -A

Contributor Author

done

namespace: openshift-logging
spec:
serviceAccount:
name: collector

In the steps it's documented to create the serviceAccount "logging-collector", but here it's set to "collector", so this won't work.

If the legacy serviceAccount is used, it should be "logcollector"; if a new serviceAccount is created, the name set in the steps here is "logging-collector".

Contributor Author

changed to <Name_of_service_account>

[id="deploying-a-clusterlogforwarder-observability-custom-resource_{context}"]
= Deploying a ClusterLogForwarder observability custom resource

Deploy a ClusterLogForwarder observability custom resource (CR) by using the {oc-first}.

This example applies only if LokiStack is installed and you want to store the logs on it. If someone is not storing the logs in the Red Hat log store, this example is not valid, so this needs to be highlighted. Similar to how it's documented in the article https://access.redhat.com/articles/7089860, where it says:

"Assuming the current stack looks like the below that represents a fully managed OpenShift Logging stack (Vector, Loki) including collection, forwarding and storage."

@theashiot theashiot force-pushed the OBSDOCS-1701-final branch from 256d9ab to c1bc50f Compare May 26, 2025 18:38
@openshift-ci openshift-ci bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 26, 2025
@theashiot
Contributor Author

/test all


openshift-ci bot commented May 26, 2025

@theashiot: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/validate-portal
Commit: c1bc50f
Required: true
Rerun command: /test validate-portal


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
