Refactors the MetricsManager to improve reliability and monitoring control for metric delivery. #63

saipreetham16 · 2025-06-03T05:29:11Z

Issue Number: 63

Description:

Refactors the MetricsManager to improve reliability and monitoring control for metric delivery.

Before:

Metrics were sent once every 10 seconds; on failure, the send was retried only once, controlled by a boolean shouldRetry.
This retry mechanism was simplistic and could easily drop metrics after a single failure.
Timer management logic was embedded inside monitorAndSendMetrics(), which gave limited control over the monitoring lifecycle.
metricList was reassigned (metricList = mutableListOf()) after a successful send, which could lead to race conditions.
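
To illustrate the last point, here is a minimal sketch of how reassigning the list after a send can lose a metric. It is a hypothetical reconstruction, not the code that was removed: the function name and the metric names are made up, and the interleaving is replayed on a single thread so that it is deterministic.

```kotlin
// Hypothetical reconstruction of the "before" pattern. The steps that would
// normally run on two threads are replayed in order on one thread.
fun replayReassignInterleaving(): Pair<List<String>, List<String>> {
    var metricList = mutableListOf("SendMessage")

    // 1. Timer thread: copies the pending metrics and sends them.
    val sentBatch = metricList.toList()

    // 2. Caller thread: records a new metric while the send is in flight.
    metricList.add("LateMetric")

    // 3. Timer thread: the send succeeded, so the list is replaced wholesale.
    metricList = mutableListOf()

    // "LateMetric" is now neither in the sent batch nor in the pending list.
    return sentBatch to metricList
}
```

With this ordering, the metric added in step 2 is discarded in step 3 without ever being sent.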

After:

Introduces a more robust retry mechanism using retryCount and maxRetries (set to 3).
Failed metrics are re-added to the front of the metricList queue so that they are retried in the next monitoring cycle.
The monitoring lifecycle is now handled explicitly by startMonitoring() and stopMonitoring().
The list is updated inside synchronized(metricList) blocks instead of being reassigned, which keeps updates thread-safe.
Introduces constants for initialDelay (1s) and monitoringPeriod (10s) for better configurability and consistency.
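
The points above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual MetricsManager code: the class name MetricsManagerSketch, the Metric data class, the send callback, flush(), and pendingCount() are all hypothetical, and dropping a batch once maxRetries is exhausted is an assumption about behavior this description does not spell out.

```kotlin
import java.util.Timer
import kotlin.concurrent.timer

// Hypothetical stand-in for the SDK's metric type.
data class Metric(val name: String, val value: Double = 1.0)

class MetricsManagerSketch(
    // Hypothetical sender: returns true when the batch was delivered.
    private val send: (List<Metric>) -> Boolean,
    private val maxRetries: Int = 3,
) {
    private val metricList = mutableListOf<Metric>()
    private var retryCount = 0
    private var monitoringTimer: Timer? = null

    fun addMetric(metric: Metric) {
        synchronized(metricList) { metricList.add(metric) }
    }

    fun pendingCount(): Int = synchronized(metricList) { metricList.size }

    @Synchronized
    fun startMonitoring(initialDelayMs: Long = 1_000L, periodMs: Long = 10_000L) {
        if (monitoringTimer != null) return // already running
        monitoringTimer = timer(daemon = true, initialDelay = initialDelayMs, period = periodMs) {
            flush()
        }
    }

    @Synchronized
    fun stopMonitoring() {
        monitoringTimer?.cancel()
        monitoringTimer = null
    }

    // One monitoring cycle: drain the queue, try to send, requeue on failure.
    fun flush() {
        val batch = synchronized(metricList) {
            if (metricList.isEmpty()) return
            // Clear the same list instance in place rather than reassigning it.
            metricList.toList().also { metricList.clear() }
        }
        val delivered = try { send(batch) } catch (e: Exception) { false }
        synchronized(metricList) {
            when {
                delivered -> retryCount = 0
                retryCount < maxRetries -> {
                    retryCount++
                    // Failed metrics go back to the front of the queue.
                    metricList.addAll(0, batch)
                }
                // Assumption: give up on the batch once retries are exhausted.
                else -> retryCount = 0
            }
        }
    }
}
```

In this sketch a batch that keeps failing is attempted four times in total (one initial send plus three retries) before it is dropped, and metrics added between cycles queue up behind the requeued batch.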

These changes enhance the delivery guarantees of metrics, reduce potential for data loss, and improve maintainability and concurrency handling.

Functional backward compatibility:

Does this change introduce backwards incompatible changes? [NO]
Does this change introduce any new dependency? [NO]

Testing:

Is the code unit tested? [NO]

List manual testing steps:

Add Steps below:
Added a new test metric and validated that it appears in CloudWatch metrics.

xiajon · 2025-06-04T21:44:10Z

chat-sdk/src/main/java/com/amazon/connect/chat/sdk/repository/ChatService.kt

            response.id?.let { id ->
                updatePlaceholderMessage(oldId = recentlySentMessage.id, newId = id)
            }
-            true
-        }.onFailure { exception ->
+            metricsManager.addCountMetric(MetricName.SendMessage)


How does this fix the message id issue? This code just moves the metric down and replaces runCatching with try

xiajon · 2025-06-11T16:24:43Z

chat-sdk/src/main/java/com/amazon/connect/chat/sdk/repository/MetricsManager.kt

-    }
-
-    private fun addMetric(metric: Metric) {
-        if (_isCsmDisabled) {


why removed this?

nvm i see it is added above

saipreetham16 requested a review from a team as a code owner June 3, 2025 05:29

saipreetham16 requested review from swasri and xiajon June 3, 2025 05:29

xiajon reviewed Jun 4, 2025

saipreetham16 force-pushed the sleburu/fix_CSM branch from b44f9bb to fc6f16c Compare June 5, 2025 04:50

saipreetham16 changed the title ~~Fix metrics tracking in sendMessage method~~ Fix metrics functions in MetricsManager Jun 5, 2025

saipreetham16 force-pushed the sleburu/fix_CSM branch from fc6f16c to b7ff131 Compare June 10, 2025 05:21

saipreetham16 changed the title ~~Fix metrics functions in MetricsManager~~ Refactors the MetricsManager to improve reliability and monitoring control for metric delivery. Jun 10, 2025

Refactors the MetricsManager to improve reliability and monitoring control for metric delivery. (23abef9)

saipreetham16 force-pushed the sleburu/fix_CSM branch from b7ff131 to 23abef9 Compare June 10, 2025 15:35

xiajon reviewed Jun 11, 2025

xiajon approved these changes Jun 11, 2025