Skip to content

Refactors the MetricsManager to improve reliability and monitoring control for metric delivery. #63

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

saipreetham16
Copy link
Contributor

@saipreetham16 saipreetham16 commented Jun 3, 2025

Issue Number: 63

Description:

Refactors the MetricsManager to improve reliability and monitoring control for metric delivery.

Before:

  • Metrics were sent once every 10 seconds, and on failure, retried only once using a boolean shouldRetry.
  • The retry mechanism was simplistic and could drop metrics easily after a single failure.
  • Timer management logic was embedded inside monitorAndSendMetrics() with limited lifecycle control.
  • metricList was reassigned (metricList = mutableListOf()) after a successful send, which could lead to potential race conditions.

After:

  • Introduced a more robust retry mechanism using retryCount and maxRetries (set to 3).
  • Failed metrics are re-added to the front of the metricList queue, ensuring they are retried in the next monitoring cycle.
  • Monitoring lifecycle is now cleanly handled using startMonitoring() and stopMonitoring().
  • Uses synchronized(metricList) blocks instead of reassigning the list to ensure thread-safe updates.
  • Introduces constants for initialDelay (1s) and monitoringPeriod (10s) for better configurability and consistency.

These changes enhance the delivery guarantees of metrics, reduce potential for data loss, and improve maintainability and concurrency handling.


Functional backward compatibility:

Does this change introduce backwards incompatible changes? [NO]
Does this change introduce any new dependency? [NO]


Testing:

Is the code unit tested?
NO

List manual testing steps:

  • Add Steps below:
    Added new test metric and validated on CloudWatch metrics.

@saipreetham16 saipreetham16 requested a review from a team as a code owner June 3, 2025 05:29
@saipreetham16 saipreetham16 requested review from swasri and xiajon June 3, 2025 05:29
response.id?.let { id ->
updatePlaceholderMessage(oldId = recentlySentMessage.id, newId = id)
}
true
}.onFailure { exception ->
metricsManager.addCountMetric(MetricName.SendMessage)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this fix the message id issue? This code just moves the metric down and replaces runCatching with try

@saipreetham16 saipreetham16 changed the title Fix metrics tracking in sendMessage method Fix metrics functions in MetricsManager Jun 5, 2025
@saipreetham16 saipreetham16 changed the title Fix metrics functions in MetricsManager Refactors the MetricsManager to improve reliability and monitoring control for metric delivery. Jun 10, 2025
}

private fun addMetric(metric: Metric) {
if (_isCsmDisabled) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why removed this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm i see it is added above

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants