Questions about operationalising autokeras on big datasets #1502
robinvanschaik asked this question in Q&A
Hi all,
For a current project I am training a TensorFlow/Keras model for churn prediction on Google Analytics data stored as a large dataset in BigQuery. I would also like to test AutoKeras for this use case.
I believe the current best practice is to export the dataset to Google Cloud Storage as sharded CSVs and to stream the shards in with the tf.data interleave transformation.
Since AutoKeras accepts tf.data datasets, I believe this approach should work.
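For context, this is roughly how I stream the shards today; the bucket path, column defaults and batch size below are placeholders rather than the real project values:

```python
import tensorflow as tf

CSV_PATTERN = "gs://my-bucket/churn-export/train-*.csv"  # placeholder path
# One record default per CSV column; the real schema is wider than this.
COLUMN_DEFAULTS = [tf.float32, tf.float32, tf.string, tf.int32]
BATCH_SIZE = 512

def make_dataset(pattern):
    # List the shards, then interleave their rows so several files are read in parallel.
    files = tf.data.Dataset.list_files(pattern, shuffle=True)
    dataset = files.interleave(
        lambda path: tf.data.experimental.CsvDataset(
            path, record_defaults=COLUMN_DEFAULTS, header=True),
        cycle_length=4,
        num_parallel_calls=tf.data.experimental.AUTOTUNE,
    )
    return dataset.batch(BATCH_SIZE).prefetch(tf.data.experimental.AUTOTUNE)

train_ds = make_dataset(CSV_PATTERN)
```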
However, I still have some questions about the inner workings of AutoKeras before I dive in.
For instance, I currently use the tf.feature_column API to construct my features and pass them to the Keras DenseFeatures layer.
Because the dataset is streamed in batches, I query my BigQuery training dataset at model-creation time via utility functions.
This lets me calculate the min/max, means and standard deviations of my numeric features and pass them to scaling functions. Similarly, I retrieve the list of classes for each categorical feature.
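Simplified, the wiring looks like the sketch below; the statistics and vocabulary are placeholder values standing in for the results of those BigQuery queries:

```python
import tensorflow as tf

# Placeholder statistics, pretending they came back from the BigQuery utilities.
PAGEVIEWS_MEAN, PAGEVIEWS_STD = 12.3, 4.5
DEVICE_CLASSES = ["desktop", "mobile", "tablet"]

def zscore(mean, std):
    # Returns the scaling function handed to the numeric column.
    return lambda x: (x - mean) / std

feature_columns = [
    tf.feature_column.numeric_column(
        "pageviews", normalizer_fn=zscore(PAGEVIEWS_MEAN, PAGEVIEWS_STD)),
    tf.feature_column.indicator_column(
        tf.feature_column.categorical_column_with_vocabulary_list(
            "device_category", vocabulary_list=DEVICE_CLASSES)),
]

dense_features = tf.keras.layers.DenseFeatures(feature_columns)
```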
How does AutoKeras handle these kinds of transformations when the data is being streamed in batches?
The nice part of the tf.feature_column API is that you can specify default values that apply both during training and at inference.
This helps in production, when values might be missing, without having to write extensive checks.
For instance, like this:
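(A simplified sketch; the column names, vocabulary and defaults are placeholders rather than my actual schema.)

```python
import tensorflow as tf

feature_columns = [
    # default_value fills in the numeric feature when it is missing from the
    # parsed input, so a dropped field at inference time does not break serving.
    tf.feature_column.numeric_column("sessions", default_value=0.0),
    tf.feature_column.indicator_column(
        # Missing or unseen categories are mapped to the "unknown" entry
        # (index 0) instead of raising an error.
        tf.feature_column.categorical_column_with_vocabulary_list(
            "traffic_source",
            vocabulary_list=["unknown", "organic", "paid", "referral"],
            default_value=0)),
]
```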
Thanks!