Skip to content

Add support for the Strimzi Metrics Reporter to Kafka brokers/controllers components #11051

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

OwenCorrigan76
Copy link
Contributor

@OwenCorrigan76 OwenCorrigan76 commented Jan 16, 2025

Type of change

  • Enhancement / new feature

Description

This patch adds support for the Strimzi Metrics Reporter to Kafka brokers/controllers components as described by the following proposal:

https://github.com/strimzi/proposals/blob/main/064-prometheus-metrics-reporter.md

Related to #10753

Support for Kafka Connect and MirrorMaker2 and Strimzi Kafka Bridge will be added in subsequent PRs.

CEL Validation has been added to the MetricsConfig and CruiseControl API in this PR.
CEL Validation was introduced to Strimzi in this PR:
#11068

We won’t initially support the CruiseControl component. To make it work, CC should be changed to expose metrics through its HTTP endpoint.

Checklist

  • Write tests
  • Make sure all tests pass
  • Update documentation
  • Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally
  • Reference relevant issue(s) and close them after merging
  • Update CHANGELOG.md

@OwenCorrigan76 OwenCorrigan76 requested review from a team and mimaison January 16, 2025 17:15
@OwenCorrigan76 OwenCorrigan76 force-pushed the integrate_metrics_reporter branch 2 times, most recently from e569be2 to 1180ee2 Compare January 16, 2025 17:23
Copy link
Member

@scholzj scholzj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments after an initial pass. You should also update the names / description / CHANGELOG to make it clear what this really does. Kafka compoenents are brokers/controllers, Connect and MM2. You add this only for brokers/controllers. It should be clear from the CHANGELOG and PR name / desc.

@OwenCorrigan76 OwenCorrigan76 force-pushed the integrate_metrics_reporter branch from 1180ee2 to f943007 Compare January 17, 2025 12:00
@OwenCorrigan76
Copy link
Contributor Author

@scholzj I am currently working on the changes I did not comment on yet.

@fvaleri fvaleri added this to the 0.46.0 milestone Jan 17, 2025
@OwenCorrigan76 OwenCorrigan76 changed the title Add support for the Strimzi Metrics Reporter to Kafka component Add support for the Strimzi Metrics Reporter to Kafka brokers/controllers components Jan 17, 2025
@OwenCorrigan76 OwenCorrigan76 force-pushed the integrate_metrics_reporter branch 2 times, most recently from 84fb016 to 8f5b28f Compare February 13, 2025 11:16
Copy link
Member

@scholzj scholzj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some more comments. There also seem to be many unaddressed older comments. Not sure if that was intentional or not at this point.

@fvaleri
Copy link
Contributor

fvaleri commented Feb 17, 2025

There also seem to be many unaddressed older comments. Not sure if that was intentional or not at this point.

@scholzj I'm supporting Owen with this feature. He is working on and will eventually address all comments.

@OwenCorrigan76
Copy link
Contributor Author

@scholzj I know progress is going quite slowly. I will address the previous comments shortly. I'm currently working on the CEL validation. When that is committed, I'll go back and tackle the older comments and work through them.

@scholzj
Copy link
Member

scholzj commented Feb 17, 2025

@scholzj I know progress is going quite slowly. I will address the previous comments shortly. I'm currently working on the CEL validation. When that is committed, I'll go back and tackle the older comments and work through them.

No worries Owen, I was just unsure what the expected state is. I did not intend to put any pressure on your. Sorry.

@OwenCorrigan76 OwenCorrigan76 force-pushed the integrate_metrics_reporter branch 3 times, most recently from ece3b1f to 75a829f Compare February 24, 2025 13:23
@OwenCorrigan76
Copy link
Contributor Author

The build failed with the following tests:
io.strimzi.api.kafka.model.kafka.KafkaCrdIT.testKafkaWithNullMaintenance
Currently investigating why.

Copy link
Contributor

@fvaleri fvaleri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @OwenCorrigan76, we still need to do some changes and address the remaining comments, but it works fine.

For the KafkaCrdIT.testKafkaWithNullMaintenance failure, I think we simply need to update the expected error messages removing the invalid: prefix. Now, you get invalid: [, rather than invalid: . With CEL validation rules, we get the additional error message "validation rules were not checked".

Copy link
Contributor

@fvaleri fvaleri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OwenCorrigan76 I was wrong on the "Unsupported metrics type", please correct.

@OwenCorrigan76 OwenCorrigan76 force-pushed the integrate_metrics_reporter branch from 4304607 to f88ec2c Compare February 26, 2025 10:51
Copy link
Contributor

@fvaleri fvaleri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes looks good to me for now, but we still have some comments to address.

@OwenCorrigan76 OwenCorrigan76 force-pushed the integrate_metrics_reporter branch 2 times, most recently from 3c94f43 to 3969049 Compare March 4, 2025 12:21
@scholzj
Copy link
Member

scholzj commented Apr 23, 2025

I guess this will need a rebase after #11379.

@ppatierno
Copy link
Member

@OwenCorrigan76 before having another review pass, this PR should be rebased, let us know when it's ready again. Thanks!

@OwenCorrigan76 OwenCorrigan76 force-pushed the integrate_metrics_reporter branch from 6af487c to d4a8609 Compare May 2, 2025 08:23
@OwenCorrigan76
Copy link
Contributor Author

Squashed commits for rebasing. I will deal with conflicts now.

@OwenCorrigan76 OwenCorrigan76 force-pushed the integrate_metrics_reporter branch from d4a8609 to b89fa8d Compare May 6, 2025 14:07
@fvaleri
Copy link
Contributor

fvaleri commented May 6, 2025

@strimzi/maintainers can one of you trigger the ST pipeline? Thanks.

@ppatierno
Copy link
Member

/azp run regression

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@fvaleri
Copy link
Contributor

fvaleri commented May 7, 2025

@ppatierno looks like failing tests hit timeout and seems to be unrelated.

@Frawless
Copy link
Member

Frawless commented May 7, 2025

/packit test --labels regression,upgrade

@Frawless
Copy link
Member

Frawless commented May 7, 2025

TF failures are caused by TF outage. I will restart the jobs once the outage will be resolved.

@Frawless
Copy link
Member

Frawless commented May 7, 2025

/packit test --labels regression,upgrade

@OwenCorrigan76
Copy link
Contributor Author

TF failed again.

@Frawless
Copy link
Member

Frawless commented May 7, 2025

TF failed again.

I am afraid they are still experiencing the outage. In case it is blocking you, feel free to ignore it. I will try to restart it tomorrow and hope everything will be fine that time.

@OwenCorrigan76
Copy link
Contributor Author

This is ready for another review, when you get a chance. Thanks a mil.
@ppatierno @scholzj

@OwenCorrigan76 OwenCorrigan76 force-pushed the integrate_metrics_reporter branch from b89fa8d to b28fde7 Compare May 9, 2025 15:28
@fvaleri fvaleri requested a review from scholzj May 12, 2025 18:02
@@ -471,9 +502,9 @@ public void testNullUserConfiguration() {
}

@ParallelTest
public void testNullUserConfigurationAndCCReporter() {
public void testNullUserConfigurationWithCruiseControlMetricsReporter() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if you add the new tests at the end rather than to the middle and mess up the diff and GitHub history?

spec:
kafka:
version: 4.0.0
metadataVersion: 4.0-IV0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* @param values List configuration values.
*/
public static void createOrAddListConfig(AbstractConfiguration kafkaConfig, String key, String values) {
if (kafkaConfig != null && key != null && !key.isBlank() && values != null && !values.isBlank()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% sure, but is this missing some else branch?

The Javadc says Append list configuration values or create a new list configuration if missing. -> So I would expect that you need to handle the situation when the things are not set?

Or maybe you know they are always set and do not need this check?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OwenCorrigan76 this method should have private visibility.

@scholzj null/blank checks are there because this will become a shared utility and move to ModelUtils a soon as the KC/MM2 PR will be ready. Currently, we simply ignore invalid input, but we could raise and exception or log a warning. Wdyt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what PR you're referring to. But I think the comment stands. If you really need to do the check, there really needs to be error handling for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I agree on adding error handling.

Comment on lines 1033 to 1040
Arrays.stream(existingConfig.split(","))
.map(String::trim)
.filter(s -> !s.isEmpty())
.collect(Collectors.toCollection(LinkedHashSet::new));
Set<String> newValues = Arrays.stream(values.split(","))
.map(String::trim)
.filter(s -> !s.isEmpty())
.collect(Collectors.toCollection(LinkedHashSet::new));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use something like asList(namespacesList.trim().split("\\s*,+\\s*"))? We already use that in ConfigParameterParser to parse namespace list. But to be honest, I have no idea which way would be faster.

Also, why do you use LinkedHashSet? Does the order really matter here? If it does, a comment explaining that might be useful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scholzj list would be probably faster, but it hardly makes a difference with few configuration values. Once this method becomes a shared utility, values could potentially be user-provided values, so we thought it's nice to preserve ordering.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the speed, I was more referring to the splitting than list versus set.

As for the ordering, if it matters, fine. But it should be explained in the comment as I suggest.

@OwenCorrigan76 OwenCorrigan76 force-pushed the integrate_metrics_reporter branch from b28fde7 to 22dd7a2 Compare May 13, 2025 14:32
Copy link
Member

@scholzj scholzj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few more nits. But hopefully that will be it.

Comment on lines 1029 to 1031
if (kafkaConfig == null) throw new IllegalArgumentException("Configuration is required");
if (key == null || key.isBlank()) throw new IllegalArgumentException("Configuration key is required");
if (values == null || values.isBlank()) throw new IllegalArgumentException("Configuration values are required");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you properly format it with { / }, it would be more readable and clearer what exactly are you doing there. I do not think we use this form in many parts of the code base, if somewhere at all.

.map(key -> assertThrows(IllegalArgumentException.class, () -> KafkaBrokerConfigurationBuilder.createOrAddListConfig(config, key, "test-value-3")))
.forEach(e -> assertThat(e.getMessage(), is("Configuration key is required")
));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty line not needed.

@Frawless
Copy link
Member

/packit test --labels regression,upgrade

This patch adds support for the Strimzi Metrics Reporter to brokers and controllers as described by the following proposal:

https://github.com/strimzi/proposals/blob/main/064-prometheus-metrics-reporter.md

Signed-off-by: ocorriga <ocorriga@redhat.com>
Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>

Squash commits

Address initial feedback

Signed-off-by: ocorriga <ocorriga@redhat.com>

Reverse dashboards and fix build issue

Signed-off-by: ocorriga <ocorriga@redhat.com>

Update derived resources

Signed-off-by: ocorriga <ocorriga@redhat.com>

Trigger build

Signed-off-by: Federico Valeri <fedevaleri@gmail.com>

Add MetricsModel interface and refactor affected classes

Signed-off-by: ocorriga <ocorriga@redhat.com>
Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>

Add CEL validation and address comments

Signed-off-by: ocorriga <ocorriga@redhat.com>
Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>

Addressing latest comments

Signed-off-by: ocorriga <ocorriga@redhat.com>
Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>

add TODO in withStrimziMetricsReporter

Signed-off-by: ocorriga <ocorriga@redhat.com>
Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>

address latest comments

Signed-off-by: ocorriga <ocorriga@redhat.com>
Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>

Add StrimziMetricsReporterConfig class

Signed-off-by: ocorriga <ocorriga@redhat.com>
Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>

Add defaults for allowlist

Signed-off-by: ocorriga <ocorriga@redhat.com>
Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>

Refactor withUserConfiguration and add tests

Signed-off-by: ocorriga <ocorriga@redhat.com>
Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>

Address Comments

Signed-off-by: ocorriga <ocorriga@redhat.com>
Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>

Revert Connect script ENV VAR

Signed-off-by: ocorriga <ocorriga@redhat.com>
Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>

Address Comments on common-configuration doc and fix typos

Signed-off-by: ocorriga <ocorriga@redhat.com>
Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>

Fix ConfigMap naming for tests

Signed-off-by: ocorriga <ocorriga@redhat.com>
Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>

Address additional Comments

Signed-off-by: ocorriga <ocorriga@redhat.com>
Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>

Address Jakub's and Tina's Comments

Signed-off-by: ocorriga <ocorriga@redhat.com>
Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>

Add comments to distinguish between kafka and yammer metrics

Signed-off-by: ocorriga <ocorriga@redhat.com>
Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>

Address Paolo's comments

Signed-off-by: ocorriga <ocorriga@redhat.com>
Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>

Remove added comment writer.println

Signed-off-by: ocorriga <ocorriga@redhat.com>
Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>

Refactor behaviour of user config in KafkaBrokerConfigurationBuilder

Signed-off-by: ocorriga <ocorriga@redhat.com>
Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>

Address Fede's comments

Signed-off-by: ocorriga <ocorriga@redhat.com>
Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>

Address Mickael's comments

Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>

Fix Kafka dashboard

Signed-off-by: Federico Valeri <fedevaleri@gmail.com>
Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>

Update API doc

Signed-off-by: Federico Valeri <fedevaleri@gmail.com>
Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>

Add system test

Signed-off-by: Federico Valeri <fedevaleri@gmail.com>
Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>

Minor ST improvements

Signed-off-by: Federico Valeri <fedevaleri@gmail.com>
Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>

Address Kate's and Paolo's comments

Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>

Remove jayway-json-path dependency from kafka-thirdparty-libs and add listener.enabed=true

Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>

Address nit and move prometheus-jmx-exporter examples into it's own directory

Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>

Address ST comments

Signed-off-by: Federico Valeri <fedevaleri@gmail.com>

Address Jakub's latest comments

Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>

Address comments on KafkaBrokerConfigurationBuilder

Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>

Address comments from Jakub

Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>
Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>
Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>
Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>
Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>
Signed-off-by: OwenCorrigan76 <owencorrigan76@gmail.com>
@OwenCorrigan76 OwenCorrigan76 force-pushed the integrate_metrics_reporter branch from ec69460 to 2215a9b Compare May 14, 2025 10:07
@OwenCorrigan76
Copy link
Contributor Author

Thanks @scholzj Nits addressed

Copy link
Member

@ppatierno ppatierno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of nits but overall LGTM! Thanks for the effort @OwenCorrigan76 (and @fvaleri) ;-)

createOrAddListConfig(userConfig, "metric.reporters", CruiseControlMetricsReporter.CRUISE_CONTROL_METRIC_REPORTER);
}
if (injectKafkaJmxReporter) {
createOrAddListConfig(userConfig, "metric.reporters", "org.apache.kafka.common.metrics.JmxReporter");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about having a constant for "org.apache.kafka.common.metrics.JmxReporter" within the JmxPrometheusExporterModel class?

}
if (injectStrimziMetricsReporter) {
createOrAddListConfig(userConfig, "metric.reporters", "io.strimzi.kafka.metrics.KafkaPrometheusMetricsReporter");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto as above, maybe a constant for "io.strimzi.kafka.metrics.KafkaPrometheusMetricsReporter" within the StrimziMetricsReporterModel class?

*/
private void maybeAddYammerMetricsReporters(KafkaConfiguration userConfig, boolean injectStrimziMetricsReporter) {
if (injectStrimziMetricsReporter) {
createOrAddListConfig(userConfig, "kafka.metrics.reporters", "io.strimzi.kafka.metrics.YammerPrometheusMetricsReporter");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can have a constant for "io.strimzi.kafka.metrics.YammerPrometheusMetricsReporter" within the StrimziMetricsReporterModel class because it's anyway set in related to the Strimzi Metrics Reporter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.