
Add prototype of new client #7

Closed · wants to merge 6 commits

Conversation

@alxhill (Contributor) commented Aug 22, 2024

Todo:

  • Change ingest flow to disable direct-write
  • Stop using polars for reading schema
  • Don't assume the first column is always the timeseries one
  • Re-use existing S3 upload
  • Integrate into existing classes
  • ...and much more 🙂


```python
from _api.scout_catalog import CatalogService

catalog = create_service(CatalogService, get_base_url())
```
Contributor:

we can probably wrap this all in a client or session class so we have fewer globals floating around
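A minimal sketch of that idea, assuming `create_service` and `get_base_url` keep their current signatures; `NominalClient` is a hypothetical name:

```python
from _api.scout_catalog import CatalogService


class NominalClient:
    """Owns the service handles so callers don't rely on module-level globals."""

    def __init__(self, base_url: str):
        self.base_url = base_url
        # create_service / get_base_url assumed in scope, as in the snippet above.
        self.catalog = create_service(CatalogService, base_url)


# One client object per session instead of a module-level `catalog` global.
client = NominalClient(get_base_url())
```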


```python
# wrappers for conjure service apis

from _api.scout_catalog import CatalogService
```
Contributor:

all these paths need to change to ._api.combined.<whatever>

(though also can just create a separately installable package for this eventually)

@alxhill (Author):

down to start publishing the conjure APIs immediately - the conjure Python generator already produces Python packages, it just needs some wiring to push to PyPI


```python
# todo - merge this with the existing _upload_s3 function, it's mostly copy-pasted wholesale
def _upload_file_s3(self):
    if hasattr(self, "s3_path"):
```
Contributor:

These attrs can be optional or something, rather than needing to check for presence.
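A sketch of that pattern, assuming `s3_path` marks a completed upload (the body of the `hasattr` branch isn't shown above):

```python
from typing import Optional


class Dataset:
    def __init__(self, csv_path: str):
        self.csv_path = csv_path
        # Declared up front as Optional instead of being created on the fly,
        # so callers can test `is None` rather than hasattr().
        self.s3_path: Optional[str] = None

    def _upload_file_s3(self) -> None:
        if self.s3_path is not None:
            return  # already uploaded (assumption: attr marks a finished upload)
        ...
```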

```python
class Dataset:
    def __init__(self, csv_path: str):
        self.csv_path = csv_path
        print("Reading dataset schema...", flush=True)
```
Contributor:

we can migrate over to logging for these messages

@alxhill (Author):

agree - do you know if it auto-flushes? it was very annoying not seeing logs during long-running blocking operations
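For reference, the stdlib handlers do flush per record (`StreamHandler.emit` ends with a `self.flush()` call), so the `flush=True` workaround wouldn't be needed. A minimal setup sketch, with a hypothetical logger name:

```python
import logging

# basicConfig installs a StreamHandler on stderr; StreamHandler.emit flushes
# after every record, so messages appear immediately even during
# long-running blocking operations.
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")

logger = logging.getLogger("nominal")  # hypothetical logger name
logger.info("Reading dataset schema...")
```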

Comment on lines +341 to +342
```python
if hasattr(self, "run_rid"):
    return
```
Contributor:

should definitely log when we skip steps

@alxhill (Author):

the reason I didn't do this is that sync is called recursively - if you had 10 series in a workbook, it'd call sync on all of them, then each of those would call sync on its respective run (and eventually dataset). So the dataset/run would be uploaded the first time, then skipped for the other 9 series, producing a lot of confusing noise.

if/when we switch to a proper logger, I'd definitely add this at debug level or something
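A sketch of what that could look like once a proper logger exists (method excerpt; the logger setup and message are hypothetical, the `run_rid` check follows the snippet above):

```python
import logging

logger = logging.getLogger(__name__)


def sync(self) -> None:
    if hasattr(self, "run_rid"):
        # Hit once per series during a recursive sync, so keep it at debug
        # level to avoid the repeated noise described above.
        logger.debug("run already synced (run_rid=%s), skipping", self.run_rid)
        return
    ...
```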

Comment on lines +369 to +372
```python
@staticmethod
def from_ch(run: Run, sess, name: str, series_id: str, series_type: str) -> "Series":
    base_query = f"SELECT timestamp, value FROM nominal.dataset_{series_type} WHERE series='{series_id}'"
    return Series(run, sess, name, base_query, RawNumericSeriesNode(name=name))
```
Contributor:

Suggested change:

```diff
-@staticmethod
-def from_ch(run: Run, sess, name: str, series_id: str, series_type: str) -> "Series":
-    base_query = f"SELECT timestamp, value FROM nominal.dataset_{series_type} WHERE series='{series_id}'"
-    return Series(run, sess, name, base_query, RawNumericSeriesNode(name=name))
+@classmethod
+def from_ch(cls, run: Run, sess, name: str, series_id: str, series_type: str) -> "Series":
+    base_query = f"SELECT timestamp, value FROM nominal.dataset_{series_type} WHERE series='{series_id}'"
+    return cls(run, sess, name, base_query, RawNumericSeriesNode(name=name))
```

Comment on lines +84 to +85
```python
data_sources = [
    (ds.filename, CreateRunDataSource(data_source=DataSource(dataset=ds.dataset_rid), series_tags={}))
```
@alkasm (Contributor) commented Aug 23, 2024:

will want the ability to add a custom ref name here, with the filename probably as a fallback
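A sketch of that fallback, following the snippet above; `ref_names` (a per-dataset mapping of optional overrides) and `datasets` are hypothetical:

```python
data_sources = [
    (
        ref_names.get(ds.filename) or ds.filename,  # custom ref name, else filename
        CreateRunDataSource(data_source=DataSource(dataset=ds.dataset_rid), series_tags={}),
    )
    for ds in datasets
]
```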



```python
class Dataset:
    def __init__(self, csv_path: str):
```
Contributor:

let's create a bunch of classmethods for alternative constructors, e.g. from_csv(), so that we can later have from_pandas(), from_parquet(), etc.
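A sketch of that pattern; the from_pandas/from_parquet bodies are placeholders, and everything beyond from_csv is an assumption about future work:

```python
class Dataset:
    def __init__(self, csv_path: str):
        self.csv_path = csv_path

    @classmethod
    def from_csv(cls, csv_path: str) -> "Dataset":
        return cls(csv_path)

    @classmethod
    def from_pandas(cls, df) -> "Dataset":
        # e.g. write df out to a temporary CSV, then delegate to from_csv
        ...

    @classmethod
    def from_parquet(cls, parquet_path: str) -> "Dataset":
        # e.g. convert to CSV, or teach __init__ about multiple formats
        ...
```

Using classmethods (rather than staticmethods) also means any future subclass constructed through these helpers gets the subclass type.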

@alkasm (Contributor) commented Sep 11, 2024:

Closing this PR (but not deleting the branch)

@alkasm closed this Sep 11, 2024