Skip to content

feat(metrics): add metrics collection mechanism #1071

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 60 commits into from
Jul 17, 2025

Conversation

leo-marble
Copy link
Contributor

@leo-marble leo-marble commented Jul 1, 2025

This PR introduces the concept of a metric collector, a new use case that runs periodically to launch all collectors. It introduces the notion of a Collector which has a list of collectors (both global and organization-specific), a collector is a module which implement how to compute/extract a metric.

A versioning system is implemented within the Collector to determine the list of collectors we are likely to receive and to know how to parse the results accordingly.

Now that the mechanism is implemented, the next steps will be to define the list of metrics and implement the collectors in usecases/metrics_collection, following the example of the stub.

This PR also introduces the concept of periodicity to determine if the metric is monthly (calendar-based), daily, or instantaneous. If there is an overlap between two periods, the collector will handle the separation and emit two items. A utility is provided to manage these date notions.

There are still some TODOs remaining, and there are PRs branching from this one to implement these TODOs. Some configuration still needs to be set up.

Copy link

linear bot commented Jul 1, 2025

@leo-marble leo-marble marked this pull request as draft July 1, 2025 16:00
@leo-marble leo-marble force-pushed the mar-1129-define-the-collectdata-logic branch 4 times, most recently from c2b81c4 to 7d678a3 Compare July 2, 2025 16:17
@leo-marble leo-marble force-pushed the mar-1129-define-the-collectdata-logic branch from 555b13b to 0650834 Compare July 3, 2025 11:55
@leo-marble leo-marble self-assigned this Jul 3, 2025
@leo-marble leo-marble requested a review from apognu July 3, 2025 16:28
@leo-marble
Copy link
Contributor Author

cf: #1074 -> Introduce the watermark to bound the time range

@apognu
Copy link
Contributor

apognu commented Jul 7, 2025

Notes from our discussion :

  • Maybe, for collection reports with a license key, check that the license key is valid.
  • Outside the scope of this PR, but think about adding some safeguards, since those are unauthenticated requests (rate limiting on the Load Balancer?).

@leo-marble leo-marble force-pushed the mar-1129-define-the-collectdata-logic branch 2 times, most recently from 2c1f4f2 to f2a0d4a Compare July 8, 2025 12:54
Copy link
Contributor

@Pascal-Delange Pascal-Delange left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read some of the PR, though I'm not finished. I'll try to finish tomorrow.

@Pascal-Delange
Copy link
Contributor

Side question btw, you did the dataset and table creation on BQ manually ?

@apognu
Copy link
Contributor

apognu commented Jul 9, 2025

Side question btw, you did the dataset and table creation on BQ manually ?

For now, yes, manually in the sandbox account. Once we end up on a finalized schema, I will write the Terraform for it.

This is what the table might look like after the few changes we mentioned:

create table `marble-dev-sandbox.metrics.events` (
    start_time timestamp not null,
    end_time timestamp not null,
    deployment_id string(36) not null,
    license_key string(36),
    org_id string(36),
    event_type string not null,
    value float64,
    text string,
    collected_at timestamp default current_timestamp()
)
partition by timestamp_trunc(start_time, month)
cluster by deployment_id, license_key, org_id, event_type;

Partition and clustering keys to be discussed, of course.

@leo-marble leo-marble added the enhancement New feature or request label Jul 9, 2025
leo-marble and others added 24 commits July 17, 2025 15:25
…o ensure all org IDs are represented with zero counts if absent
…nd table, and update ingestion repository to use dynamic dataset/table names
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…torsV1 and update related tests and implementations
…eamline watermark management in repositories
…EMETRY and streamline BigQuery initialization for Marble SaaS projects
@leo-marble leo-marble force-pushed the mar-1129-define-the-collectdata-logic branch from 509aa6c to 9658147 Compare July 17, 2025 13:25
…necessary project ID checks and using IsMarbleSaasProject function
Copy link
Contributor

@apognu apognu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge and improve afterwards.

Copy link
Contributor

@Pascal-Delange Pascal-Delange left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@leo-marble leo-marble merged commit 842e60a into main Jul 17, 2025
4 checks passed
@leo-marble leo-marble deleted the mar-1129-define-the-collectdata-logic branch July 17, 2025 13:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request go Pull requests that update Go code WIP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants