Stream names are not friendly with logging systems (mlflow, wandb, etc.) #364

@tjhunter

Is your feature request related to a problem? Please describe.

Currently, we allow descriptive stream names such as NPP, ATMS, following a known convention in the satellite world. See this example. However, logging systems only accept simple metric names such as a.b.c, where each component a is made of the characters a-zA-Z0-9-_. The allowed set could be larger, but I would not push for more because it would break pandas column selection.

As a result, we cannot really have per-stream metrics, which would be useful, for example, to track how many samples are being ingested per stream. In this case the metric name would be streams.NPP, ATMS.count_samples, which would cause issues with WandB or MLFlow.

I see 3 ways forward:

  • magic: conversion of NPP, ATMS to camelCase NppAtms for example
  • convention: only allow names that follow the convention above and return error instead
  • extra info: when defining a stream name, require the user to set a stream_id as a unique machine-ready identifier (for example NPP_ATMS)

In any case, I believe this name should be restricted to a-zA-Z0-9_ (no - to prevent weird issues with databases or pandas).

https://stackoverflow.com/questions/47964380/pandas-dataframe-column-naming-conventions
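
To make the three options more concrete, here is a minimal Python sketch. It is only an illustration, not existing project code: the names STREAM_ID_PATTERN, is_valid_stream_id, to_camel_case and metric_name are hypothetical, and the pattern assumes the allowed characters are letters, digits and underscore.

```python
import re

# Assumption: a machine-ready identifier uses only letters, digits and "_".
STREAM_ID_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_stream_id(name: str) -> bool:
    """'convention' option: accept only names that match the pattern."""
    return STREAM_ID_PATTERN.fullmatch(name) is not None


def to_camel_case(name: str) -> str:
    """'magic' option: turn a descriptive name into a CamelCase identifier."""
    # Split on any run of characters that is not a letter or a digit.
    parts = re.split(r"[^A-Za-z0-9]+", name)
    return "".join(p[:1].upper() + p[1:].lower() for p in parts if p)


def metric_name(stream_id: str, metric: str) -> str:
    """Build a dotted metric name, rejecting invalid stream identifiers."""
    if not is_valid_stream_id(stream_id):
        raise ValueError(f"invalid stream_id: {stream_id!r}")
    return f"streams.{stream_id}.{metric}"


if __name__ == "__main__":
    print(is_valid_stream_id("NPP, ATMS"))          # descriptive name is rejected
    print(to_camel_case("NPP, ATMS"))               # 'magic' conversion
    print(metric_name("NPP_ATMS", "count_samples")) # 'extra info' option
```

With the 'extra info' option, the descriptive name NPP, ATMS stays available for display, while only the explicit stream_id (here NPP_ATMS) is ever sent to the logging system.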

Do you have any thoughts on that?


Labels: datasets, enhancement, good first issue, infra
